Skip to main content

Investigating the characteristics of genes and variants associated with self-reported hearing difficulty in older adults in the UK Biobank



Age-related hearing loss is a common, heterogeneous disease with a strong genetic component. More than 100 loci have been reported to be involved in human hearing impairment to date, but most of the genes underlying human adult-onset hearing loss remain unknown. Most genetic studies have focussed on very rare variants (such as family studies and patient cohort screens) or very common variants (genome-wide association studies). However, the contribution of variants present in the human population at intermediate frequencies is hard to quantify using these methods, and as a result, the landscape of variation associated with adult-onset hearing loss remains largely unknown.


Here we present a study based on exome sequencing and self-reported hearing difficulty in the UK Biobank, a large-scale biomedical database. We have carried out variant load analyses using different minor allele frequency and impact filters, and compared the resulting gene lists to a manually curated list of nearly 700 genes known to be involved in hearing in humans and/or mice. An allele frequency cutoff of 0.1, combined with a high predicted variant impact, was found to be the most effective filter setting for our analysis. We also found that separating the participants by sex produced markedly different gene lists. The gene lists obtained were investigated using gene ontology annotation, functional prioritisation and expression analysis, and this identified good candidates for further study.


Our results suggest that relatively common as well as rare variants with a high predicted impact contribute to age-related hearing impairment and that the genetic contributions to adult hearing difficulty may differ between the sexes. Our manually curated list of deafness genes is a useful resource for candidate gene prioritisation in hearing loss.


Hearing impairment is one of the most common sensory deficits in the human population and has a strong genetic component. However, the auditory system is a complex system with many interacting parts, which offers many routes to loss of function. Accordingly, although over 150 genes have been identified as contributing to non-syndromic human hearing loss [1], the majority of genes involved in hearing remain unknown. Moreover, most of the genes identified so far are those where mutations result in early-onset, severe hearing loss. While age-related hearing loss (ARHL) is very common, it is also very heterogeneous, and the associated landscape of genetic variation remains unclear, both at the gene and at the variant level. Even when analysing rare variants in known deafness genes, a wide mutational spectrum can be observed, with a range of allele frequencies and predicted impacts which differ on a gene-by-gene basis [2].

As early as 1997, it was noted that single-gene mutations can lead to early postnatal or adult-onset progressive hearing loss [3]. This remains the case 25 years later; 45 out of the 51 known human autosomal dominant deafness genes result in progressive hearing loss when mutated [1]. These mutations are rare, high-impact variants which have been identified through family studies and candidate gene screening of patient cohorts, for example [4,5,6,7]. However, most such variants are ultra-rare or even private [5], and while they fully explain the hearing loss seen in the affected individual or family, they cannot explain all the ARHL seen in the population. On the other end of the scale, looking at common variants, very large genome-wide association studies (GWAS) have recently uncovered several new loci [8, 9], but because GWAS work by identifying markers linked to disease loci, they cannot detect recent mutations or those which are not widespread throughout the population. A recent GWAS on hearing loss (made available as a preprint), which reports both common and rare variant association analyses, found that the rare variant association signals were mostly independent of the common variant associations nearby, confirming that it is important to include consideration of rare variants in the genetic landscape of ARHL [10].

Alternative approaches are therefore required to identify novel variants and genes associated with age-related hearing loss. Here we have investigated variants associated with self-reported hearing difficulty in 94,312 UK Biobank participants with available exome sequence data. We have assessed variant load in self-reported hearing difficulty at a range of variant minor allele frequencies, from rare variants (minor allele frequency (MAF) < 0.005) to very common variants, and compared the resulting gene lists to a much larger list of known deafness genes that we have curated and present here, based on work in mice as well as in humans. We found the optimal MAF cutoff to be 0.1, which is an intermediate frequency, neither rare nor common. Our results suggest that intermediate frequency variants with a high predicted impact contribute to hearing difficulty, and also that the genetic contributions to hearing impairment may differ between the sexes.


After filtering, in the normal hearing group there were 18,235 male (average age = 62.37 years) and 30,496 female (average age = 62.20 years) participants (48,731 people in total, overall average age 62.29). In the hearing difficulty group, there were 24,237 male (average age = 63.60 years) and 21,344 female (average age = 62.96 years) participants (45,581 people in total, overall average age 63.28). It is notable that while the overall group sizes are similar, there are many more female participants than male in the normal hearing group, reflecting the better hearing that women have later in life [11], although the average age of the participants (62–63 years) is later than the average onset of menopause, after which hearing tends to decline rapidly [12]. When we plotted the distribution of each broad ethnic grouping within each category (Additional file 2: Table S1, Fig S1), we found that there were many more Black people in the normal hearing group than in the hearing loss group (especially Black female participants) (Additional file 1: Fig S1). Similar results have been noted in previous studies [13]. The distribution of other self-reported ethnicities were broadly similar across the sex-separated groups, but it is notable that the largest difference in self-reported hearing phenotype between the sexes is in the White ethnic grouping (Additional file 1: Fig S1).

Outlier analysis of variant load

Outlier analyses of the genomic variant loads per gene in each group of people were carried out. Briefly, for each gene, the number of variants in people with hearing difficulty was compared to the number of variants in people with normal hearing using a linear regression. Each regression analysis resulted in two lists of outlier genes; those with a much higher variant load than expected (high variant load in hearing difficulty) and those with a much lower variant load than expected (which means a high variant load in normal hearing) (Fig 1., Additional file 2: Table S2). These lists were analysed to assess the effect of allele frequency and impact setting.

Fig. 1
figure 1

Comparison of variant load per gene for high-impact variants (MAF < 0.1). Each point represents a gene. Outliers are marked in orange (for higher load in participants with hearing difficulty) or blue (for higher load in participants with normal hearing). A shows all the data, including TTN and FBLN7, genes with a much higher variant count than all the others, and B shows the data without those two genes

We tested different minor allele frequency limits to determine the optimal cutoff. We carried out regression analyses for six MAF cutoffs (0.005, 0.01, 0.05, 0.1, 0.2 and 1), and obtained the lists of outlier genes, those genes with more variants than expected in hearing difficulty or in normal hearing (Table 1, A). To assess the potential biological relevance of these high variant load gene lists to hearing impairment, genes associated with deafness in humans and/or mice were compared with genes in the two outlier lists using our own manually curated list of known deafness genes (using the human orthologues of deafness genes known only in mice where possible, resulting in 720 genes in total) (Additional file 2: Table S3). Our assumption is that enrichment for known deafness genes supports biological relevance of the gene lists derived from the outlier analysis. We also compared the high variant load lists to our list of highly variable genes (Additional file 2: Table S4), genes which are often reported to have a high number of variants in sequencing projects. Our assumption is that enrichment for highly variable genes in the outlier gene lists is likely to reflect features unrelated to hearing. Hypergeometric tests were carried out to assess the significance of the number of deafness and highly variable genes in each outlier gene list.

Table 1 The number of genes, known deafness genes and highly variable genes in the high variant load lists at different minor allele frequencies and impacts

A, Gene counts for high variant load lists when using high-impact variants at different MAF cutoffs. B, Gene counts for high variant load lists when using low impact variants, MAF<0.1. C, Gene counts for high variant load lists with sex segregation, MAF<0.1. D, Gene counts for high variant load lists using the MSC cutoff, showing the counts for all participants and separated by sex, MAF<0.1. The p values for the deafness gene counts are the probability of observing at least that many deafness genes in the high variant load list given the overall gene list and the total number of deafness genes it contains. The same calculations were carried out with the highly variable gene list to obtain the probabilities of observing the highly variable gene counts. Hypergeometric p values were calculated using R and adjusted with a Bonferroni correction. Significant p values (adj. p<0.05) are indicated with a *.

As the MAF limit increased, the number of genes in the outlier lists also increased, as did the number of deafness and variable genes in each outlier list (Table 1, A). However, while the number of deafness genes in the high variant load in hearing difficulty outlier list was significant at every MAF limit (Table 1, A), the number of variable genes in the high variant load in hearing difficulty outlier lists did not reach significance until the MAF limit was set to 0.2 (20%). This suggests that a MAF of 0.1 is a good choice in order to obtain an outlier gene list which is relevant to hearing impairment and does not include too many highly variable genes. For the outlier genes with a high variant load in normal hearing, the MAF cutoffs where the number of deafness genes was significant (MAF < 0.1, MAF < 0.2) also resulted in a significant number of highly variable genes (Table 1, A).

The effect of including variants with a low impact (Additional file 2: Table S5) was then tested, using a MAF cutoff of 0.1, and we found that we did not obtain as many genes in the outlier lists, and only the number of highly variable genes in each outlier list was significant (Table 1), suggesting that relaxing the restriction on variant impact is likely to result in detecting naturally variable genes as outliers rather than genes linked to the phenotype under study. We therefore proceeded with analysing variants with a high impact and a minor allele frequency below 0.1. From these settings, we obtained 156 outlier genes with a high variant load in hearing difficulty, and 222 outlier genes with a high variant load in normal hearing (Fig. 1, Table 1, A).

Because a much higher proportion of the hearing difficulty group was male, the same analysis was carried out on participants separated by sex. The numbers of outlier genes in each case were similar (Table 1, B), but the gene lists were markedly different. Twenty-five genes were present in both male and female hearing difficulty high variant load lists, including seven deafness genes (CLIC5, MYH14, COL9A3, ELMO3, FSCN2, GJB2, SLC26A5). Twenty-four genes were present in both normal hearing high variant load lists, including three deafness genes (POLG, GLI3, MYO3A) (Fig. 2, Additional file 2: Table S2). There was a significant enrichment in deafness genes in the normal hearing outlier lists in both sexes, and in the hearing difficulty outlier list in female participants (Table 1, C). There were no significant overlaps with the highly variable gene list (Table 1, C).

Fig. 2
figure 2

Venn diagrams showing the overlap of the outlier gene lists when looking at only male, only female or all participants (outliers with intermediate frequency variants with high impact). The known deafness genes in the intersection (7 in the outlier genes in hearing difficulty, 3 in the outlier genes in normal hearing) are labelled

We chose a stringent fixed cutoff for the CADD score of 25, but it is unlikely that a single cutoff will be uniformly accurate for every gene. The mutation significance cutoff (MSC) is a gene-specific cutoff value which uses data from HGMD and ClinVar [14]. Because most genes do not have sufficient high-quality mutations described in these databases, the outlier analysis was repeated on the 2947 genes which did (MAF<0.1). We found 27 genes with a high variant load in normal hearing and 11 with a high variant load in hearing difficulty in all participants (Table 1, D). Numbers in the sex-separated analyses were lower, and only the highly variable genes showed significant enrichment, in a subset of the lists (Table 1, D).

Characteristics of the variants in the high variant load lists

The large numbers of outlier genes in people with normal hearing was unexpected, so we asked if there might be different types of variants common in hearing difficulty compared with normal hearing. We investigated the characteristics of the variants in the high variant load lists, taking the most deleterious consequence for each variant in each gene (defined in order in Additional file 2: Table S5). Variant counts were normalised per person and per gene. We did not see any large differences in variant type (Additional file 1: Fig S2). In all analyses, missense variants made up a large proportion of the total variant counts per person per gene.

Weighted burden tests

Weighted burden analyses were carried out on the variants with MAF < 0.1 and a high predicted impact, using the geneVarAssoc and scoreassoc tools, which have been used before on the ethnically heterogenous UK Biobank dataset [15]. Scoreassoc assigns a score per subject, per gene, and tests for the difference in average scores of cases vs controls, obtaining a p value for each gene. Variants were weighted by minor allele frequency (the lower the MAF, the higher the weight), but not by impact, since all variants included in this analysis were high impact. After correcting for multiple tests, none of the genes retained significant p values. We therefore ranked the gene list by signed log P value (SLP) [16]. The SLP is the log10 of the p value, with a positive sign indicating that cases have more variants than controls, and a negative sign indicating the opposite. Thus, ranking the genes by SLP would result in one extreme of the list being genes with more variants in the people who did not report hearing difficulty (n = 270 genes with SLP < −2), and the other extreme being genes with more variants in the people who did report difficulty hearing (n = 362 genes with SLP > 2) (Additional file 2: Table S2). We carried out hypergeometric tests on these gene lists, comparing the total number of genes with the number of deafness and highly variable genes (Additional file 2: Tables S3 and S4), and found no significant enrichment of either deafness or variable genes.

Gene ontology enrichment analysis

In order to look for any clues to pathological mechanisms, we carried out a gene ontology (GO) enrichment analysis using gProfiler [17] on the high variant load lists from the outlier analysis (Additional file 2: Table S2; high-impact variants with MAF<0.1). We restricted the output to GO terms with between 5 and 200 genes, since terms with more genes than that are overly general, and those with fewer genes are too specific. We found 33 GO terms enriched in the lists, including multiple terms specific to hearing (e.g. GO:0007605; sensory perception of sound) (Additional file 2: Table S6). The largest list of GO terms came from the genes with a high variant load in male participants with normal hearing (Additional file 2: Table S6), mostly because of a set of genes identified as being involved in stereocilium structure and function. There were 7 genes annotated with the term “stereocilium bundle”; USH1C, USH2A, MYO3A, TMC2, ADGRV1, PDZD7 and PKHD1L1. Most are known human deafness genes, but PKHD1L1 and TMC2 have only been identified as mouse deafness genes to date [18, 19]. Far fewer specific GO terms were identified from the genes with a high variant load in female participants with hearing difficulty or with normal hearing (Additional file 2: Table S6), even though there was a similar number of known deafness genes in the lists (Table 1). The term “sensory perception of sound” was identified as enriched in the hearing difficulty outlier genes in female and all participants, and in the normal hearing outlier genes in male participants (Additional file 2: Table S6). Genes annotated with this term which had a high variant load in female participants with hearing impairment were MYO3B, MYH14, CDH23, CLIC5, CHRNA10, FBXO11, TMC1, GJB2, NAV2, LOXHD1, SLC26A5 and MYO6. Genes annotated with this term which had a high variant load in all participants with hearing loss were COL11A1, MYH14, CDH23, CLIC5, CHRNA10, FBXO11, TMC1, GJB2, LOXHD1, SLC26A5 and MYO6. Most of these are known human deafness genes, but to date, FBXO11 has only been identified in the mouse, not in humans (Additional file 2: Table S3), and MYO3B and CHRNA10 are not in our compiled list of known deafness genes (they are included in the GO term annotation through orthologous similarity rather than published evidence). In summary, the GO term analysis showed enrichment for terms relating to sensory hair cells or cytoskeletal elements known to be important to hair cell function, and this appears to be driven by the enrichment for known deafness genes in the lists analysed. Many more genes were included in the high variant load lists than were described by GO annotation terms, reflecting the limitations in current GO annotations of many genes.

Gene prioritisation

The lists of genes of interest from the outlier analyses contained many genes not previously associated with hearing impairment, too many to follow up in detail. Therefore, ToppGene [20] was used to prioritise the genes from the high variant load lists, using our manually curated deafness gene list (Additional file 2: Table S3) as a training list. The remaining genes in each high variant load list were scored, ranked and assigned p values; after correction for multiple testing, we obtained eight genes from the hearing difficulty lists (NTRK1, TGFBR1, CACNA1S, P2RX7, MYLK, TTN, CACNB3 and ITGB1) and three from the normal hearing lists (NRG1, CACNA1H and FLNA) (Additional file 2: Table S7).

Using expression analysis to highlight new candidate genes

An analysis of the expression of candidate genes from our outlier analyses was carried out using single-cell RNAseq datasets from the gEAR database of mouse inner ear tissue analyses [21]. We reasoned that if a gene shows strong, specific expression in certain cell types in the inner ear, that suggests a potential functional role for the gene in those cell types and would make it a good candidate for further investigation. Mouse datasets were chosen to cover as many inner ear cell types as possible between embryonic day (E) 16 and postnatal day (P) 35. We selected those candidate genes from the outlier analysis high variant load lists (Additional file 2: Table S2; high-impact variants with MAF<0.1) which had a high-quality one-to-one mouse orthologue (n = 564), and we also plotted data for the genes known to underlie deafness in both mice and humans which were not already included (an additional 99 genes) as useful markers of cochlear cell types. After selecting those genes showing high variance in expression across the cell types, there were 234 genes from the high variant load lists and 78 more from the known deafness gene list. From the resulting heatmap, clusters of genes were annotated according to their expression, defining high expression levels (red) as 2–3, middle expression levels (orange) as 1–2 and low expression levels as 0–1 (yellow) (Additional file 1: Fig S3, Additional file 2: Table S8). Gene clusters were linked to specific cell types if they showed high or middle expression specific to those cell types, and the clusters were classified by known marker genes for specific cell types where these were present (Additional file 1: Fig S3, Additional file 2: Table S8). This allowed us to identify good candidate genes from those outlier genes which were not already known deafness genes. For example, there were three genes which appeared to be strongly and specifically expressed in pillar cells, Col4a4 and Col4a3, which are known deafness genes [22], and Thsd7a, which is a gene with high variant loads in female and all participants with self-reported hearing difficulty (cluster 2K, Additional file 1: Fig S3). There were multiple clusters of genes strongly expressed in hair cells (clusters 1C, 1D, 1F, 1H, 1N, 1O, 1P, 1Q, 2E, 2F, 2G, 2I, 3B, 3H, Additional file 1: Fig S3), and candidate genes from this list included Strip2 and Brd4, which had high variant loads in male participants reporting hearing difficulty, and Vwa8 and Chrna10, which had high variant loads in female and all participants reporting hearing difficulty. Fewer genes were observed that were strongly and specifically expressed in the spiral ganglion neurons (SGN) and strial cell types, possibly because there were fewer datasets available in the gEAR database for these cell types compared to hair cells, pillar cells and supporting cells, but cluster 2D did show consistent SGN expression (Additional file 1: FigS3). Candidate genes from the SGN cluster include the mouse deafness gene Ercc6 [23], which has a high variant load in female participants with normal hearing, and Abr, which has a high variant load in all three groups with hearing difficulty. Some genes were found which clustered in strial cell types (marginal, intermediate and basal cells, clusters 1J, 1M, 3I, 4A, Additional file 1: Fig S3). All the genes plotted on the heatmap are listed with their classifications in Additional file 2: Table S8, and a summary of the clusters and their outlier genes is in Table 2. It is notable that outlier genes are found in all but two of the clusters, suggesting that there is no one cell type or cochlear location wholly or largely responsible for adult hearing difficulty in either sex (Fig. 3, Table 2).

Table 2. All the heatmap clusters which include outlier genes are listed with their expression description, their classification (if one had been assigned) and the outlier genes separated by whether they were from hearing difficulty outlier lists or normal hearing outlier lists. Thirteen genes are present in both a hearing difficulty outlier list and a normal hearing outlier list. Known deafness genes are in bold. RM: Reissner's membrane; OC: Organ of Corti; LW: lateral wall; SGN: spiral ganglion neurons; SC: supporting cell; HC: hair cells; OHC: outer hair cells; IHC: inner hair cells; IPhC: inner phalangeal cells; DC: Deiters' cells; PC: pillar cells
Fig. 3
figure 3

Schematic of the cochlear duct showing cell types (top left) and expression patterns based on the scRNAseq data downloaded from the gEAR database. The numbers show how many outlier genes were present in the cluster; “HD” for the number of outlier genes in hearing difficulty lists, “NH” for the number of outlier genes in normal hearing lists, and “Both” for where an outlier gene was present in a hearing difficulty list and a normal hearing list. See Table 2 for clusters and for gene names. RM=Reissner’s membrane; MC=marginal cells; IC=intermediate cells; BC=basal cells; RC=root cells; SpC = spindle cells; SGN=spiral ganglion neurons; IBC=inner border cells; IphC=inner phalangeal cells; IHC=inner hair cells; OHC=outer hair cells; HeC=Hensen cells; CC=cells of Claudius; IPC=inner pillar cells; OPC=outer pillar cells; DC=Deiters’ cells


Our data suggest that the genetic contributions to hearing difficulty in later life may differ between the sexes. That is, there may be some genetic impacts which have less effect on women than on men, and vice versa. The differences observed in the prevalence, severity and onset of ARHL in men and women have been widely reported (for example [11, 13, 24,25,26,27], reviewed in [28]). Sex differences in complex traits and disease phenotypes may be attributed to environmental factors (in this case, noise exposure would be relevant, and drug exposure, which can affect hearing in a sex-specific manner [29]), and comorbidities which display sex-related variance may also play a role, for example cardiovascular disease [30,31,32]. Endogenous factors are also likely to contribute, such as hormone differences, epigenetic and regulatory differences and, of course, the different genetics involved in the XX and XY genomes. There are many studies linking estrogen to hearing sensitivity [33,34,35], and several genes in the estrogen pathway have been linked to hearing loss [36,37,38]. However, the average age of our participants was 62–63 years old, which is later than the average age of onset of menopause, so it is unlikely that hormones alone account for the observed effect. A sex-protective effect has been observed in other diseases, such that one sex requires a greater number of risk alleles to develop the disease. This was originally described by Carter et al [39, 40], who noted that women are less likely to suffer from pyloric stenosis but more likely to have children affected by the disease, but the phenomenon can apply to either sex. From our data, we found a similar number of genes bore a high load of variants in each sex, but the gene lists themselves were very different. Hearing impairment, including age-related hearing loss, while referred to as one condition, is actually the end result of a wide range of inner ear pathologies, so it is plausible that different sets of risk alleles contribute to overall hearing impairment in different sexes.

From our outlier analyses, we found the most useful MAF cutoff to be 0.1. At this level, there were 156 genes with a high variant load in people who reported hearing difficulty, and this list was significantly enriched in deafness genes but not in highly variable genes (Table 1). Increasing the MAF cutoff resulted in a significant enrichment in highly variable genes, and reducing it reduced the number of genes in the list, although the enrichment in deafness genes remained significant at all the cutoffs we tested (Table 1). This suggests that relatively common variants (MAF < 0.1) with a high predicted impact do contribute to hearing impairment, which correlates with the findings of another recent UK Biobank study [10], which reports that 16.8% of SNP heritability is contributed by “low-frequency variants” (0.001 < MAF ≤ 0.05). This is lower than our chosen cutoff (MAF < 0.1), but still higher than the standard cutoff of 0.001 recommended for autosomal dominant hearing loss [41]. Similarly, a recent report based on a very large GWAS meta-analysis also concluded that it is likely that a burden of common and rare impactful variants drives the risk of hearing loss [42]. Since ARHL is a complex disease rather than a Mendelian one, it is unsurprising that a different approach is needed when filtering for causative variants.

The burden tests we carried out did not identify any genes with a significant burden in cases (people reporting difficulty hearing) vs controls (people who did not report difficulty hearing). Burden tests compare the average of individual scores in cases and controls, while our outlier approach is an aggregated one, simply summing all the variants in cases and comparing them to the sum in controls. That has proved to be a useful approach for the UK Biobank cohort, which lacks all but the most basic auditory phenotyping data. In a cohort with more detailed auditory phenotyping, a burden test may prove to be a better approach. For example, Ivarsdottir et al. recently reported identifying the candidate gene AP1M2 using a loss-of-function gene-based burden test on data from a well-phenotyped Icelandic cohort [43].

Previous studies have concentrated on human hearing loss genes [2, 5], but we have compiled a larger list of nearly 700 genes based on human and mouse studies, and from this we have identified multiple candidate genes among our outliers, including FSCN2, SYNJ2, FBXO11, NAV2, TMC2, ERCC6 and PKHD1L1. Of the 185 known human deafness genes, 118 are also mouse deafness genes (Additional file 2: Table S3), suggesting that mouse deafness genes are indeed good candidate human deafness genes. This is supported by the report from Praveen et al. [10], who identified rare variant gene burdens in the mouse deafness genes KLHDC7B, FSCN2 and SYNJ2, the latter two of which were also identified in our analyses (Table 2, Additional file 2: Table S8).

We took three approaches to explore the outlier gene lists, GO analysis, ToppGene prioritisation and expression analysis. The GO analyses largely reiterated the comparisons with the deafness gene list (Additional file 2: Table S6). The lack of GO annotations linking the genes bearing a high variant load in hearing difficulty in the sex-separated analyses suggests that more pathways underlying hearing loss remain to be discovered and annotated. The ToppGene method is less constrained, because it uses more data sources as well as a training list to prioritise novel candidate genes, but it still relies on existing data and annotation to calculate scores and rankings. From our ToppGene prioritisation of all six high variant load lists, we obtained eleven candidate genes (Additional file 2: Table S7).

Our approach using the gEAR expression data is not limited by annotation, but is restricted to genes which have a high-quality one-to-one mouse orthologue, of which there were 564 (out of 674 outlier genes in total). It is also subject to ascertainment bias due to the relative lack of data on inner ear cell types which are not hair cells or supporting cells. We obtained 6 datasets from hair cells and supporting cells at different stages from E16 to P35, but only 2 datasets from cochlear lateral wall cell types at 2 adult ages (P20 and P30), and only one dataset from SGNs (P17-33). This means that any gene expressed during development in the lateral wall or SGNs, but not expressed in adult stages, will have been missed out of our heatmap. Additionally, most of the known deafness genes which we plotted on the heatmap are hair cell or supporting cell genes, and this may have biased the clustering. This may be why there are more outlier genes assigned to clusters with expression in hair cells (Fig. 3, most notably clusters 1D, 1E, 1N, 1O, 1P, 1Q, 2E, 3B and 3H). Despite that, we did observe several clusters with expression in the lateral wall and spiral ganglion (Fig. 3, Additional file 1: Fig S3). We have identified multiple potential candidate genes based on their presence in the outlier gene lists and their expression in specific cell types within the cochlea (Table 2), such as THSD7A, which is expressed in pillar cells, and PRUNE2, a gene with expression in the spiral ganglion neurons, both of which have a high variant load in hearing difficulty (female and all participants). Three of the ToppGene candidates were included in the gEAR heatmap; TGFBR1 and FLNA, which are mainly expressed in the organ of Corti, and MYLK, which is expressed in fibrocytes (Additional file 2: Table S8).

The nature of the regression analysis means that we detected outlier genes associated with normal hearing as well as with hearing impairment, and our enrichment analyses of the outlier gene lists (high-impact variants with MAF < 0.1) showed almost all of them were significantly enriched in deafness genes (outliers with a high variant load in male participants with hearing difficulty was the only exception) (Table 1). This suggests that the high variant loads are driven by the association with the self-reported hearing phenotype, not just statistical noise, sequencing error and the natural genetic variability observed in some genes, particularly large genes like TTN and USH2A. This includes the high variant loads associated with normal hearing as well as those associated with hearing impairment. It is possible that there may be protective variants in some of these genes, for example, variants which result in protection against noise trauma or ototoxic drug exposure, or which simply improve the maintenance of the inner ear machinery. Such a variant has recently been reported in GJB6 in mice; homozygotes for the deleterious Ala88Val mutation displayed better hearing at older ages, better neural output from the inner ear, and reduced hair cell loss [44]. This is not the only precedent for deleterious mutations having a beneficial impact on a phenotype, and such mutations may be attractive targets for drug development. For example, Akbari et al. recently reported that multiple rare protein-truncating variants in the gene GPR75 were associated with protection from obesity, and mice lacking the orthologue, Gpr75, were resistant to weight gain on a high-fat diet [45]. Similarly, rare deleterious variants in B4GALT1 have been linked to decreased coronary artery disease via reduction of fibrinogen and low-density lipoprotein cholesterol [46]. Further investigation of the genes and variants linked to normal hearing and sex differences in hearing loss is needed.

Previously reported GWAS of the UK Biobank which also used the self-reported hearing phenotype identified multiple overlapping loci; 71 in total, 19 of which were shared between all three studies (Fig. 4) [8, 9, 43]. Four of those 19 were also identified in our study (CHMP4C, NID2, SYNJ2 and CDH23). Five further genes from our outlier lists were shared between a subset of the GWAS lists; TMPRSS9, LOXHD1 and TUB were identified by Ivarsdottir et al. and Kalra et al., and SLC26A5 and FSCN2 by Ivarsdottir et al. alone. Those genes identified by multiple studies are obvious candidates for involvement in ARHL, but the differences in loci identified using the GWAS approach suggest there are many more to investigate, and the results of our exome sequencing analysis support that as well as suggesting further candidates (Table 2). It has recently been observed that rare variants do not account for the GWAS hits of common markers [10], so it is unsurprising to find different variants and different genes associated with ARHL in GWAS compared with exome/genome sequence analysis studies.

Fig. 4
figure 4

Comparison of gene lists from recent UK Biobank GWAS on self-reported hearing. Labelled genes are those also identified in this study and are not included in the numbers for those sections. Known deafness genes are in bold. The Wells et al. [9] and Kalra et al. [8] analyses used the UK Biobank data only while the Ivarsdottir et al. analysis [43] included other populations from Iceland

The biggest limitation of this study is the lack of measured hearing impairment (such as an audiogram) and detailed auditory phenotyping. Self-reported hearing difficulty has been shown to be sufficiently informative for general hearing capacity [9, 47], but there is more to hearing loss than just an average threshold shift. It is likely that we have missed mild or even moderate hearing loss, and also unilateral hearing loss. Most notably, we were not able to exclude participants who had experienced hearing impairment from a young age (with the exception of cochlear implant users, who were excluded). Being able to compare specific subtypes of true age-related hearing loss (for example, using a classification system such as the one described in [48,49,50]) offers the potential to link genes with a high variant load to specific inner ear pathologies, an important step for stratifying patient populations and developing therapeutics.


From this study, we have established that it is useful to include more common variants when investigating a heterogeneous disease such as adult hearing difficulty. In this case, we found the most useful MAF cutoff to be 0.1, but it is likely that this varies by condition. We also found that the genetic contributions to self-reported hearing difficulty differ between the sexes, suggesting that in future studies, it would be useful to separate study participants by sex, as well as analysing all participants together. Future studies would also benefit from more detailed auditory phenotyping data.

While these points are based on a study of adult self-reported hearing difficulties, it is likely that they apply to many other conditions. As the availability of large-scale exome and genome sequencing studies grows, it is important to explore questions which could not be asked using earlier paradigms such as genome-wide association studies. This work highlights several such avenues of exploration.


UK Biobank participant selection

UK Biobank (RRID:SCR_012815) is a large-scale biomedical database and research resource containing genetic, lifestyle and health information from half a million UK participants, aged between 40 and 69 years in 2006–2010, who were recruited from across the UK. Participants have consented to provide their data to approved researchers who are undertaking health-related research that is in the public interest. Participants were selected who were ≥55 years of age who had exome sequencing data available (200,619 exomes available in total, September 2020) and could be classified as having normal hearing or hearing difficulty, based on their self-report of hearing difficulty, hearing difficulty in noise, or use of a hearing aid. If people reported no hearing difficulties or hearing aid use at any assessment and had been asked about their hearing at least once when they were ≥55, we included them in the “normal hearing” group. If people reported consistent or worsening hearing impairment, or that they had at any point been a hearing aid user and had been asked at least once about their hearing when they were ≥55, we included them in the “hearing difficulty” group. Participants who reported otologic disorders (e.g. Meniere’s disease) were excluded. People who reported high levels of noise exposure or moderate/severe tinnitus were also excluded from the normal hearing group (Additional file 1: Figure S4). This resulted in a total of 48,731 people in the normal hearing group (18,235 male and 30,496 female participants), and 45,581 people in the hearing loss group (24,237 male and 21,344 female participants). Overall, 106,307 participants with exomes were excluded based on the above criteria. We did not filter by self-reported ethnicity. The vast majority (96%) of participants described themselves as “British,” “Irish,” “White,” or “any other White background,” or some combination thereof (hereafter referred to as White). In most of the broad ethnic groupings (Additional file 2: Table S1), there were more female than male volunteers (Additional file 1: Figure S1). Participants included in this study were compared to the entire UK Biobank, to the UK Biobank participants who had had exome sequencing, and to the data from the UK census 2011 [51], and we found that while the proportion of self-reported minority ethnicities was smaller in the UK Biobank than in the 2011 census [51], it was smaller still in the participants included in this study (Additional file 1: Fig S1). However, the distribution of self-reported ethnicities in the participants with exome sequencing reflected that of the entire Biobank (Additional file 1: Figure S1). The “healthy volunteer” effect, meaning that participants tend to be healthier in terms of lifestyle and health conditions, has been previously noted in the UK Biobank when compared to the UK 2011 census data, as has the greater proportion of people reporting their ethnicity as White [51]. It is not clear why the subset of the UK Biobank selected for this study, on the basis of their answers to questions about hearing and related issues, has an even greater proportion of participants who report their ethnicity as White.

Variant annotation and filtering

UK Biobank variant calls were made available following processing, variant calling and joint genotyping [52, 53], but without any filters applied at the sample or variant level. We annotated the variants using the Ensembl Variant Effect Predictor (RRID:SCR_007931) [54], including data from ReMM, which provides a measure of pathogenicity for regulatory variants [55], SpliceAI, which scores variants based on their predicted effect on splicing [56], Sutr, which provides annotations for 5′ UTR variants, including a predicted effect on translation efficiency [57], and the deleteriousness predictor CADD (RRID:SCR_018393) [58]. Minor allele frequencies were obtained from gnomAD (African, admixed American, Ashkenazi Jewish, East Asian, Finnish, Non-Finnish European, Other) (RRID:SCR_014964) [59], the 1000 Genomes project (African, admixed American, East Asian, European, South Asian) (RRID:SCR_006828) [60], TopMed (not divided by population) (RRID:SCR_015677) [61] or ESP6500 (African, European) [62], and the maximum reported minor allele frequency (MAF) was used. Variants were then filtered based on the overall quality of the variant call (QUAL, minimum 20) and the read depth (DP, minimum 10) and genotype quality (GQ, minimum 10) of individual calls. Variants with more than 10% of calls missing were also excluded, as were those which had a high private allele frequency within the UKBB cohort (defined as the recorded minor allele frequency + 0.4) [63]. In order to exclude variants exhibiting excess heterozygosity, we excluded variants which failed the Hardy-Weinberg equilibrium test and which had excess heterozygosity >0.1 (excess heterozygosity was calculated by (O − E)/E, where O is the observed heterozygote count and E is the expected heterozygote count).

Variant classification filters

Variants were filtered based on their minor allele frequency and a combination of pathogenicity and consequence filters (Table 3). We defined two levels of impact upon a gene product, low impact and high impact. High-impact variants were those in coding regions, intronic splice sites or mature miRNAs with a CADD score > 25 or a SpliceAI score > 0.5, and those in 5′ UTRs with a Sutr score > 1. Low impact variants were all those in coding regions, intronic splice sites, mature miRNAs and 5′ UTR regions, those in 3′ UTR regions, and any variants with other classifications (e.g. regulatory region variants) which had a ReMM score > 0.95 (see Additional file 2: Table S5 for the exact variant classification terms and filters). This is an inclusive classification, so the list of low-impact variants includes the high-impact variants.

Table 3 Classification criteria for variants by impact and minor allele frequency

Regression outlier analysis

For each analysis, we assessed 21,841 protein-coding genes and microRNAs (Additional file 2: Table S9). We summed the total number of variants in each gene in participants from the normal hearing group and compared them to the total number of variants in that gene from the hearing difficulty group using a linear regression. The residuals were obtained for each gene (the difference between the observed and predicted variant load in hearing impairment) and the first (Q1) and third (Q3) quartiles, and the interquartile distance (D, Q3-Q1), were calculated. Outlier genes with a high variant load in hearing difficulty were defined as those with residuals > Q3 + 6D, and outlier genes with a high variant load in normal hearing were defined as those with residuals < Q1 – 6D [64]. All participants were subjected to these comparisons, and we also carried out sex-separated analyses. Hypergeometric distribution tests were carried out using R, and gProfiler [17] was used to carry out a GO enrichment analysis of the outlier gene lists. The Bonferroni correction was used to adjust for multiple testing.

Burden tests

Weighted burden tests were carried out using the geneVarAssoc/scoreassoc software [16, 65], which has been shown to be capable of handling heterogeneous datasets [15]. Population principal components were derived from common variants using plink v2.0, following reading in of variants from vcf files using plink v1.9 (RRID:SCR_001757) [66]. For each gene, scoreassoc assigns scores to subjects according to the variants carried, assigning weights according to minor allele frequency, such that rarer variants are assigned a higher weight. The software then tests whether the average score for cases is higher than the score for controls [16]. The Bonferroni correction was used to adjust for multiple testing.

Compilation of the list of deafness genes

We compiled a manually curated list of known deafness genes in humans and mice, including all genes listed in the Hereditary Hearing Loss Homepage (RRID:SCR_006469) [1], and genes which, when mutated, result in altered hearing thresholds in mutant mice, reported by the International Mouse Phenotyping Consortium ( (RRID:SCR_006158) [67, 68]; average thresholds were individually checked for shifts >10 dB with small standard deviations). We also included mouse and human deafness genes described in the literature (for example [69, 70]; for full reference list see Additional file 2: Table S3). There were 118 genes shown to underlie hearing in mice and humans, 67 human deafness genes (with 66 mouse orthologues) and 506 mouse deafness genes (with 535 human orthologues) (Fig. 5, Additional file 2: Table S3). Although many of these known deafness genes have only been linked to early-onset, severe hearing impairment, they are still good candidates for involvement in milder hearing impairment, since different variants can result in very different phenotypes. For example, different variants in TMC1 have been shown to result in either prelingual profound hearing loss or postlingual progressive hearing loss [71, 72], and several recent large-scale studies looking at adult-onset hearing loss have found multiple missense variants in Mendelian deafness genes with milder effects than previously reported [5, 10, 43].

Fig. 5
figure 5

Deafness gene counts in mice and humans. Brackets indicate orthologues (e.g. there are 66 mouse orthologues of the 67 human deafness genes)

Compilation of the list of highly variable genes

Some genes are often reported as having a high number of variants in multiple exome sequencing projects. This can be because they are large genes (e.g. TTN), or because they belong to groups of paralogues such as olfactory receptors, which are sufficiently similar to make correct alignment difficult, resulting in incorrect variant calls. Two such lists were compiled by Adams et al. [73] and Fuentes Fajardo et al. [74] and consist of genes which contributed many variant calls to multiple exomes as well as human leukocyte antigen (HLA), taste receptor (TAS), olfactory receptor and mucin family genes. Additionally, some genes have been identified as prone to recurrent false positive calls, which are variants that did not validate with further genotyping and were not heritable [75]. We combined all three lists, resulting in 1213 genes in total (Additional file 2: Table S4).

Gene expression analysis using the gEAR

To assess the expression of lists of genes of interest in the inner ear, including the list of known mouse and human deafness genes, we used single-cell RNAseq data from the mouse inner ear, accessed via the gEAR portal ( (RRID:SCR_017467) [21]. We chose datasets from mice aged between embryonic day (E) 16 and postnatal day (P) 35. The datasets we used came from E16 cochlea, P1 cochlea, P7 cochlea [76, 77], P15 cochlea [78, 79], P20 inner ear [80, 81], P28-35 cochlea [82,83,84], P30 stria vascularis [85, 86] and P17-33 spiral ganglion neurons [87, 88]. Expression levels were normalised to Hprt expression; where Hprt was not present in the dataset, or had an expression level of 0, we did not use the data. We then summarised the data, taking the maximum level per cell type without accounting for age. Because the expression levels ranged from 0 to 70.6 (Ceacam16, in outer hair cells), we transformed the data such that levels between 10 and 100 were scaled to between 2 and 3, and levels between 1 and 10 were scaled to between 1 and 2. We used R to plot a heatmap of the genes that showed the most variability between cell types, suggestive of specific expression patterns rather than non-specific expression (n = 312, variance across datasets > 0.15), and to cluster cell types and genes. We further defined gene clusters first based on the R dendrograms and then on the gene expression levels within specific cell types or groups of cell types.

Availability of data and materials

All UK Biobank data, including the exome sequence data and questionnaire data used in this study, are publicly available to registered researchers through the UK Biobank data access protocol. Further information about registration may be found at The mouse single-cell RNAseq data used in this study are publicly available from the gEAR database at, and also from the GEO repository (,,,, GSE117055, GSE111347, GSE1113478) [76, 79, 81, 83,84,85, 88].



Age-related hearing loss


Minor allele frequency


Gene ontology


Genome-wide association study


  1. Van Camp G, Smith RJH. Hereditary Hearing Loss Homepage. Accessed Dec 2021.

  2. Azaiez H, Booth KT, Ephraim SS, Crone B, Black-Ziegelbein EA, Marini RJ, et al. Genomic landscape and mutational signatures of deafness-associated genes. Am J Hum Genet. 2018;103(4):484–97.

    CAS  PubMed  PubMed Central  Google Scholar 

  3. Van Camp G, Willems PJ, Smith RJ. Nonsyndromic hearing impairment: unparalleled heterogeneity. Am J Hum Genet. 1997;60(4):758–64.

    PubMed  PubMed Central  Google Scholar 

  4. Mencia A, Modamio-Hoybjor S, Redshaw N, Morin M, Mayo-Merino F, Olavarrieta L, et al. Mutations in the seed region of human miR-96 are responsible for nonsyndromic progressive hearing loss. Nat Genet. 2009;41(5):609–13.

    CAS  PubMed  Google Scholar 

  5. Boucher S, Tai FWJ, Delmaghani S, Lelli A, Singh-Estivalet A, Dupont T, et al. Ultrarare heterozygous pathogenic variants of genes causing dominant forms of early-onset deafness underlie severe presbycusis. Proc Natl Acad Sci U S A. 2020;117(49):31278–89.

    CAS  PubMed  PubMed Central  Google Scholar 

  6. Walsh T, Walsh V, Vreugde S, Hertzano R, Shahin H, Haika S, et al. From flies’ eyes to our ears: mutations in a human class III myosin cause progressive nonsyndromic hearing loss DFNB30. Proc Natl Acad Sci U S A. 2002;99(11):7518–23.

    CAS  PubMed  PubMed Central  Google Scholar 

  7. Baek JI, Oh SK, Kim DB, Choi SY, Kim UK, Lee KY, et al. Targeted massive parallel sequencing: the effective detection of novel causative mutations associated with hearing loss in small families. Orphanet J Rare Dis. 2012;7:60.

    PubMed  PubMed Central  Google Scholar 

  8. Kalra G, Milon B, Casella AM, Herb BR, Humphries E, Song Y, et al. Biological insights from multi-omic analysis of 31 genomic risk loci for adult hearing difficulty. Plos Genet. 2020;16(9):e1009025.

    CAS  PubMed  PubMed Central  Google Scholar 

  9. Wells HRR, Freidin MB, Zainul Abidin FN, Payton A, Dawes P, Munro KJ, et al. GWAS identifies 44 independent associated genomic loci for self-reported adult hearing difficulty in UK Biobank. Am J Hum Genet. 2019;105(4):788–802.

    CAS  PubMed  PubMed Central  Google Scholar 

  10. Praveen K, Dobbyn L, Gurski L, Ayer AH, Staples J, Mishra S, et al. Population-scale analysis of common and rare genetic variation associated with hearing loss in adults. medRxiv. 2009;2021(2021):2027–21264091.

    Google Scholar 

  11. Cruickshanks KJ, Wiley TL, Tweed TS, Klein BE, Klein R, Mares-Perlman JA, et al. Prevalence of hearing loss in older adults in Beaver Dam, Wisconsin. The Epidemiology of Hearing Loss Study. Am J Epidemiol. 1998;148(9):879–86.

    CAS  PubMed  Google Scholar 

  12. Hederstierna C, Hultcrantz M, Collins A, Rosenhall U. The menopause triggers hearing decline in healthy women. Hear Res. 2010;259(1-2):31–5.

    CAS  PubMed  Google Scholar 

  13. Helzner EP, Cauley JA, Pratt SR, Wisniewski SR, Zmuda JM, Talbott EO, et al. Race and sex differences in age-related hearing loss: the Health, Aging and Body Composition Study. J Am Geriatr Soc. 2005;53(12):2119–27.

    PubMed  Google Scholar 

  14. Itan Y, Shang L, Boisson B, Ciancanelli MJ, Markle JG, Martinez-Barricarte R, et al. The mutation significance cutoff: gene-level thresholds for variant predictions. Nat Methods. 2016;13(2):109–10.

    CAS  PubMed  PubMed Central  Google Scholar 

  15. Curtis D. Multiple linear regression allows weighted burden analysis of rare coding variants in an ethnically heterogeneous population. Hum Hered. 2020;85(1):1–10.

    CAS  PubMed  Google Scholar 

  16. Curtis D, Consortium UK. Practical experience of the application of a weighted burden test to whole exome sequence data for obesity and schizophrenia. Ann Hum Genet. 2016;80(1):38–49.

    Google Scholar 

  17. Raudvere U, Kolberg L, Kuzmin I, Arak T, Adler P, Peterson H, et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019;47(W1):W191–8.

    CAS  PubMed  PubMed Central  Google Scholar 

  18. Wu X, Ivanchenko MV, Al Jandal H, Cicconet M, Indzhykulian AA, Corey DP. PKHD1L1 is a coat protein of hair-cell stereocilia and is required for normal hearing. Nat Commun. 2019;10(1):3801.

    CAS  PubMed  PubMed Central  Google Scholar 

  19. Kawashima Y, Geleoc GS, Kurima K, Labay V, Lelli A, Asai Y, et al. Mechanotransduction in mouse inner ear hair cells requires transmembrane channel-like genes. J Clin Invest. 2011;121(12):4796–809.

    CAS  PubMed  PubMed Central  Google Scholar 

  20. Chen J, Bardes EE, Aronow BJ, Jegga AG. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009;37(Web Server issue):W305–11.

    CAS  PubMed  PubMed Central  Google Scholar 

  21. Orvis J, Gottfried B, Kancherla J, Adkins RS, Song Y, Dror AA, et al. gEAR: Gene Expression Analysis Resource portal for community-driven, multi-omic data exploration. Nat Methods. 2021;18(8):843–4.

    CAS  PubMed  Google Scholar 

  22. Mochizuki T, Lemmink HH, Mariyama M, Antignac C, Gubler MC, Pirson Y, et al. Identification of mutations in the alpha 3(IV) and alpha 4(IV) collagen genes in autosomal recessive Alport syndrome. Nat Genet. 1994;8(1):77–81.

    CAS  PubMed  Google Scholar 

  23. Nagtegaal AP, Rainey RN, van der Pluijm I, Brandt RM, van der Horst GT, Borst JG, et al. Cockayne syndrome group B (Csb) and group a (Csa) deficiencies predispose to hearing loss and cochlear hair cell degeneration in mice. J Neurosci. 2015;35(10):4280–6.

    CAS  PubMed  PubMed Central  Google Scholar 

  24. Dubno JR, Lee FS, Matthews LJ, Ahlstrom JB, Horwitz AR, Mills JH. Longitudinal changes in speech recognition in older persons. J Acoust Soc Am. 2008;123(1):462–75.

    PubMed  Google Scholar 

  25. Dubno JR, Lee FS, Matthews LJ, Mills JH. Age-related and gender-related changes in monaural speech recognition. J Speech Lang Hear Res. 1997;40(2):444–52.

    CAS  PubMed  Google Scholar 

  26. Lee FS, Matthews LJ, Dubno JR, Mills JH. Longitudinal study of pure-tone thresholds in older persons. Ear Hear. 2005;26(1):1–11.

    PubMed  Google Scholar 

  27. Pearson JD, Morrell CH, Gordon-Salant S, Brant LJ, Metter EJ, Klein LL, et al. Gender differences in a longitudinal study of age-associated hearing loss. J Acoust Soc Am. 1995;97(2):1196–205.

    CAS  PubMed  Google Scholar 

  28. Nolan LS. Age-related hearing loss: why we need to think about sex as a biological variable. J Neurosci Res. 2020;98(9):1705–20.

    CAS  PubMed  Google Scholar 

  29. Mills JH, Matthews LJ, Lee FS, Dubno JR, Schulte BA, Weber PC. Gender-specific effects of drugs on hearing levels of older persons. Ann N Y Acad Sci. 1999;884:381–8.

    CAS  PubMed  Google Scholar 

  30. Gates GA, Cobb JL, D'Agostino RB, Wolf PA. The relation of hearing in the elderly to the presence of cardiovascular disease and cardiovascular risk factors. Arch Otolaryngol Head Neck Surg. 1993;119(2):156–61.

    CAS  PubMed  Google Scholar 

  31. Tan HE, Lan NSR, Knuiman MW, Divitini ML, Swanepoel DW, Hunter M, et al. Associations between cardiovascular disease and its risk factors with hearing loss-a cross-sectional analysis. Clin Otolaryngol. 2018;43(1):172–81.

    CAS  PubMed  Google Scholar 

  32. Regitz-Zagrosek V, Kararigas G. Mechanistic pathways of sex differences in cardiovascular disease. Physiol Rev. 2017;97(1):1–37.

    PubMed  Google Scholar 

  33. Al-Mana D, Ceranic B, Djahanbakhch O, Luxon LM. Alteration in auditory function during the ovarian cycle. Hear Res. 2010;268(1-2):114–22.

    PubMed  Google Scholar 

  34. Coleman JR, Campbell D, Cooper WA, Welsh MG, Moyer J. Auditory brainstem responses after ovariectomy and estrogen replacement in rat. Hear Res. 1994;80(2):209–15.

    CAS  PubMed  Google Scholar 

  35. Souza DDS, Luckwu B, Andrade WTL, Pessoa LSF, Nascimento JAD, Rosa M. Variation in the hearing threshold in women during the menstrual cycle. Int Arch Otorhinolaryngol. 2017;21(4):323–8.

    PubMed  PubMed Central  Google Scholar 

  36. Buniello A, Ingham NJ, Lewis MA, Huma AC, Martinez-Vega R, Varela-Nieto I, et al. Wbp2 is required for normal glutamatergic synapses in the cochlea and is crucial for hearing. EMBO Mol Med. 2016;8(3):191–207.

    CAS  PubMed  PubMed Central  Google Scholar 

  37. Konig O, Ruttiger L, Muller M, Zimmermann U, Erdmann B, Kalbacher H, et al. Estrogen and the inner ear: megalin knockout mice suffer progressive hearing loss. FASEB J. 2008;22(2):410–7.

    PubMed  Google Scholar 

  38. Nolan LS, Maier H, Hermans-Borgmeyer I, Girotto G, Ecob R, Pirastu N, et al. Estrogen-related receptor gamma and hearing function: evidence of a role in humans and mice. Neurobiol Aging. 2013;34(8):2077 e2071–9.

    Google Scholar 

  39. Carter CO. The inheritance of congenital pyloric stenosis. Br Med Bull. 1961;17:251–4.

    CAS  PubMed  Google Scholar 

  40. Carter CO, Evans KA. Inheritance of congenital pyloric stenosis. J Med Genet. 1969;6(3):233–54.

    CAS  PubMed  PubMed Central  Google Scholar 

  41. Oza AM, DiStefano MT, Hemphill SE, Cushman BJ, Grant AR, Siegert RK, et al. Expert specification of the ACMG/AMP variant interpretation guidelines for genetic hearing loss. Hum Mutat. 2018;39(11):1593–613.

    PubMed  PubMed Central  Google Scholar 

  42. Trpchevska N, Freidin MB, Broer L, Oosterloo BC, Yao S, Zhou Y, et al. Genome-wide association meta-analysis identifies 48 risk variants and highlights the role of the stria vascularis in hearing loss. Am J Hum Genet. 2022;109(6):1077–91.

  43. Ivarsdottir EV, Holm H, Benonisdottir S, Olafsdottir T, Sveinbjornsson G, Thorleifsson G, et al. The genetic architecture of age-related hearing impairment revealed by genome-wide association analysis. Commun Biol. 2021;4(1):706.

    CAS  PubMed  PubMed Central  Google Scholar 

  44. Kelly JJ, Abitbol JM, Hulme S, Press ER, Laird DW, Allman BL. The connexin 30 A88V mutant reduces cochlear gap junction expression and confers long-term protection against hearing loss. J Cell Sci. 2019;132(2);jcs224097.

  45. Akbari P, Gilani A, Sosina O, Kosmicki JA, Khrimian L, Fang YY, et al. Sequencing of 640,000 exomes identifies GPR75 variants associated with protection from obesity. Science. 2021;373(6550).

  46. Montasser ME, Van Hout CV, Miloscio L, Howard AD, Rosenberg A, Callaway M, et al. Genetic and functional evidence links a missense variant in B4GALT1 to lower LDL and fibrinogen. Science. 2021;374(6572):1221–7.

    CAS  PubMed  Google Scholar 

  47. Cherny SS, Livshits G, Wells HRR, Freidin MB, Malkin I, Dawson SJ, et al. Self-reported hearing loss questions provide a good measure for genetic studies: a polygenic risk score analysis from UK Biobank. Eur J Hum Genet. 2020;28(8):1056–65.

    PubMed  PubMed Central  Google Scholar 

  48. Dubno JR, Eckert MA, Lee FS, Matthews LJ, Schmiedt RA. Classifying human audiometric phenotypes of age-related hearing loss from animal models. J Assoc Res Otolaryngol. 2013;14(5):687–701.

    PubMed  PubMed Central  Google Scholar 

  49. Vaden KI Jr, Matthews LJ, Eckert MA, Dubno JR. Longitudinal changes in audiometric phenotypes of age-related hearing loss. J Assoc Res Otolaryngol. 2017;18(2):371–85.

    PubMed  Google Scholar 

  50. Vaden KI, Eckert MA, Matthews LJ, Schmiedt RA, Dubno JR. Metabolic and sensory components of age-related hearing loss. J Assoc Res Otolaryngol. 2022.

  51. Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of sociodemographic and health-related characteristics of uk biobank participants with those of the general population. Am J Epidemiol. 2017;186(9):1026–34.

    PubMed  PubMed Central  Google Scholar 

  52. Szustakowski JD, Balasubramanian S, Kvikstad E, Khalid S, Bronson PG, Sasson A, et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat Genet. 2021;53(7):942–8.

    CAS  PubMed  Google Scholar 

  53. Van Hout CV, Tachmazidou I, Backman JD, Hoffman JD, Liu D, Pandey AK, et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature. 2020;586(7831):749–56.

    PubMed  PubMed Central  Google Scholar 

  54. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl variant effect predictor. Genome Biol. 2016;17(1):122.

    PubMed  PubMed Central  Google Scholar 

  55. Smedley D, Schubach M, Jacobsen JOB, Kohler S, Zemojtel T, Spielmann M, et al. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in mendelian disease. Am J Hum Genet. 2016;99(3):595–606.

    CAS  PubMed  PubMed Central  Google Scholar 

  56. Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, et al. Predicting splicing from primary sequence with deep learning. Cell. 2019;176(3):535–548 e524.

    CAS  PubMed  Google Scholar 

  57. Pajusalu S. 5utr. 2021.

  58. Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 2019;47(D1):D886–94.

    CAS  PubMed  Google Scholar 

  59. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581(7809):434–43.

    CAS  PubMed  PubMed Central  Google Scholar 

  60. Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74.

    Google Scholar 

  61. Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature. 2021;590(7845):290–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  62. Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013;493(7431):216–20.

    CAS  PubMed  Google Scholar 

  63. Maffucci P, Bigio B, Rapaport F, Cobat A, Borghesi A, Lopez M, et al. Blacklisting variants common in private cohorts but not in public databases optimizes human exome analysis. Proc Natl Acad Sci U S A. 2019;116(3):950–9.

    CAS  PubMed  Google Scholar 

  64. Vuckovic D, Mezzavilla M, Cocca M, Morgan A, Brumat M, Catamo E, et al. Whole-genome sequencing reveals new insights into age-related hearing loss: cumulative effects, pleiotropy and the role of selection. Eur J Hum Genet. 2018;26(8):1167–79.

    CAS  PubMed  PubMed Central  Google Scholar 

  65. Curtis D. A weighted burden test using logistic regression for integrated analysis of sequence variants, copy number variants and polygenic risk score. Eur J Hum Genet. 2019;27(1):114–24.

    CAS  PubMed  Google Scholar 

  66. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.

    PubMed  PubMed Central  Google Scholar 

  67. Birling MC, Yoshiki A, Adams DJ, Ayabe S, Beaudet AL, Bottomley J, et al. A resource of targeted mutant mouse lines for 5,061 genes. Nat Genet. 2021;53(4):416–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  68. Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK, et al. High-throughput discovery of novel developmental phenotypes. Nature. 2016;537(7621):508–14.

    CAS  PubMed  PubMed Central  Google Scholar 

  69. Ingham NJ, Pearson SA, Vancollie VE, Rook V, Lewis MA, Chen J, et al. Mouse screen reveals multiple new genes underlying mouse and human hearing loss. Plos Biol. 2019;17(4):e3000194.

    CAS  PubMed  PubMed Central  Google Scholar 

  70. Ohlemiller KK, Jones SM, Johnson KR. Application of mouse models to research in hearing and balance. J Assoc Res Otolaryngol. 2016;17(6):493–523.

    PubMed  PubMed Central  Google Scholar 

  71. Kitajiri S, Makishima T, Friedman TB, Griffith AJ. A novel mutation at the DFNA36 hearing loss locus reveals a critical function and potential genotype-phenotype correlation for amino acid-572 of TMC1. Clin Genet. 2007;71(2):148–52.

    CAS  PubMed  Google Scholar 

  72. Kurima K, Peters LM, Yang Y, Riazuddin S, Ahmed ZM, Naz S, et al. Dominant and recessive deafness caused by mutations of a novel gene, TMC1, required for cochlear hair-cell function. Nat Genet. 2002;30(3):277–84.

    PubMed  Google Scholar 

  73. Adams DR, Sincan M, Fuentes Fajardo K, Mullikin JC, Pierson TM, Toro C, et al. Analysis of DNA sequence variants detected by high-throughput sequencing. Hum Mutat. 2012;33(4):599–608.

    CAS  PubMed  PubMed Central  Google Scholar 

  74. Fuentes Fajardo KV, Adams D, Program NCS, Mason CE, Sincan M, Tifft C, et al. Detecting false-positive signals in exome sequencing. Hum Mutat. 2012;33(4):609–13.

    CAS  PubMed  Google Scholar 

  75. Field MA, Burgio G, Chuah A, Al Shekaili J, Hassan B, Al Sukaiti N, et al. Recurrent miscalling of missense variation from short-read genome sequence data. BMC Genomics. 2019;20(Suppl 8):546.

    PubMed  PubMed Central  Google Scholar 

  76. Kelly MC, Kolla L, Kelley LA, Morell RJ. Characterization of cochlear development at the single cell level: GEO; 2020. GSE137299. Accessed Dec 2021

    Google Scholar 

  77. Kolla L, Kelly MC, Mann ZF, Anaya-Rocha A, Ellis K, Lemons A, et al. Characterization of the development of the mouse cochlear epithelium at the single cell level. Nat Commun. 2020;11(1):2389.

    PubMed  PubMed Central  Google Scholar 

  78. Ranum PT, Goodwin AT, Yoshimura H, Kolbe DL, Walls WD, Koh JY, et al. Insights into the biology of hearing and deafness revealed by single-cell RNA sequencing. Cell Rep. 2019;26(11):3160–3171 e3163.

    CAS  PubMed  PubMed Central  Google Scholar 

  79. Ranum PT, Smith RJ. Insights into the biology of hearing and deafness revealed by single-cell RNA sequencing: GEO; 2019. GSE114157. Accessed Dec 2021

    Google Scholar 

  80. Xue N, Song L, Song Q, Santos-Sacchi J, Wu H, Navaratnam D. Genes related to SNPs identified by genome-wide association studies of age-related hearing loss show restriction to specific cell types in the adult mouse cochlea. Hear Res. 2021;410:108347.

    PubMed  Google Scholar 

  81. Xue N, Song L, Wu H, Navaratnam D. scRNA-seq of P20 mouse cochlea: GEO; 2021. GSE181454. Accessed Dec 2021.

    Google Scholar 

  82. Liu H, Chen L, Giffen KP, Stringham ST, Li Y, Judge PD, et al. Cell-specific transcriptome analysis shows that adult pillar and Deiters’ cells express genes encoding machinery for specializations of cochlear hair cells. Front Mol Neurosci. 2018;11:356.

    CAS  PubMed  PubMed Central  Google Scholar 

  83. Liu H, Chen L, Giffen KP, Stringham ST, Li Y, Judge PD, et al. RNA-sequencing of adult mouse inner hair cells and outer hair cells of the organ of Corti: GEO; 2018. GSE111348. Accessed Dec 2021

    Google Scholar 

  84. Liu H, Chen L, Giffen KP, Stringham ST, Li Y, Judge PD, et al. RNA-sequencing of adult mouse pillar and Deiters’ supporting cells of the organ of Cort: GEO; 2018. GSE111347. Accessed Dec 2021

    Google Scholar 

  85. Hoa M. Characterizing cellular heterogeneity and homeostatic gene regulatory networks in the adult mammalian stria vascularis using single cell and single nucleus transcrivariable genes showed significant enrichment, in a subset of the listsptional profiling: GEO; 2019. GSE136196. Accessed Dec 2021

    Google Scholar 

  86. Korrapati S, Taukulis I, Olszewski R, Pyle M, Gu S, Singh R, et al. Single cell and single nucleus RNA-Seq reveal cellular heterogeneity and homeostatic regulatory networks in adult mouse stria vascularis. Front Mol Neurosci. 2019;12:316.

    CAS  PubMed  PubMed Central  Google Scholar 

  87. Petitpre C, Wu H, Sharma A, Tokarska A, Fontanet P, Wang Y, et al. Neuronal heterogeneity and stereotyped connectivity in the auditory afferent system. Nat Commun. 2018;9(1):3691.

    PubMed  PubMed Central  Google Scholar 

  88. Petitpre C, Wu H, Sharma A, Tokarska A, Fontanet P, Wang Y, et al. Neuronal heterogeneity and stereotyped connectivity in the auditory afferent system. GEO. 2018; GSE117055. Accessed Dec 2021.

Download references


This research has been conducted using data from UK Biobank, a major biomedical database (, project ID 49593). The authors are grateful to everyone involved in the UK Biobank, especially the participants, without whom there would be no data. We thank Maria Lachgar-Ruiz for helpful comments on the manuscript text.


This study was supported by the National Institutes of Health/National Institute on Deafness and Other Communication Disorders (NIH/NIDCD, grant number P50 DC 000422) and the National Institute for Health Research (NIHR) Biomedical Research Centre, King’s College London.

Author information

Authors and Affiliations



The participant filtering pipeline was devised by MAL, KPS, JRD and BAS. MAL carried out the data processing, outlier analysis, burden analysis and gene expression analysis. MAL and KPS wrote the paper, and JRD and BAS edited and revised it. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Morag A. Lewis.

Ethics declarations

Ethics approval and consent to participate

UK Biobank has ethics approval from the North West Multi-centre Research Ethics Committee (MREC), as a Research Tissue Bank (RTB) approval (number 21/NW/0157). Informed consent was obtained from all participants. Use of the relevant data for this study has been approved by the UK Biobank (ID 49593).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Fig S1.

Population characteristics of chosen participants. Fig S2. Variant types in the outlier gene lists. Fig S3. Outlier and deafness gene heatmap. Fig S4. Participant selection.

Additional file 2: Table S1.

Self-reported ethnicity in the chosen participants. Table S2. Outlier genes with a high variant loads and genes with extreme SLP values from the burden analysis. Table S3. Genes reported to underlie hearing loss in humans and/or mice. Table S4. Highly variable genes. Table S5. Variant classifications in order of impact (high to low). Table S6. GO term analysis. Table S7. ToppGene analysis. Table S8. Expression and classification of clustered genes from the heatmap. Table S9. Protein-coding genes and microRNAs used in this study.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Lewis, M.A., Schulte, B.A., Dubno, J.R. et al. Investigating the characteristics of genes and variants associated with self-reported hearing difficulty in older adults in the UK Biobank. BMC Biol 20, 150 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Adult hearing difficulty
  • UK Biobank
  • Exome sequencing
  • Hearing impairment
  • Predicted variant impact