Genomic characterization of the world’s longest selection experiment in mouse reveals the complexity of polygenic traits
BMC Biology volume 20, Article number: 52 (2022)
Long-term selection experiments are a powerful tool to understand the genetic background of complex traits. The longest of such experiments has been conducted in the Research Institute for Farm Animal Biology (FBN), generating extreme mouse lines with increased fertility, body mass, protein mass and endurance. For >140 generations, these lines have been maintained alongside an unselected control line, representing a valuable resource for understanding the genetic basis of polygenic traits. However, their history and genomes have not been reported in a comprehensive manner yet. Therefore, the aim of this study is to provide a summary of the breeding history and phenotypic traits of these lines along with their genomic characteristics. We further attempt to decipher the effects of the observed line-specific patterns of genetic variation on each of the selected traits.
Over the course of >140 generations, selection on the control line has given rise to two extremely fertile lines (>20 pups per litter each), two giant growth lines (one lean, one obese) and one long-distance running line. Whole genome sequencing analysis on 25 animals per line revealed line-specific patterns of genetic variation among lines, as well as high levels of homozygosity within lines. This high degree of distinctiveness results from the combined effects of long-term continuous selection, genetic drift, population bottleneck and isolation. Detection of line-specific patterns of genetic differentiation and structural variation revealed multiple candidate genes behind the improvement of the selected traits.
The genomes of the Dummerstorf trait-selected mouse lines display distinct patterns of genomic variation harbouring multiple trait-relevant genes. Low levels of within-line genetic diversity indicate that many of the beneficial alleles have arrived to fixation alongside with neutral alleles. This study represents the first step in deciphering the influence of selection and neutral evolutionary forces on the genomes of these extreme mouse lines and depicts the genetic complexity underlying polygenic traits.
Artificial selection is the selective breeding of organisms by which desired phenotypic traits evolve in a population . Farm animals are the result of this selective breeding process to achieve efficient food production. However, artificial selection can also be applied experimentally in other species in order to connect genes and other genomic elements to selection response for complex traits such as behaviour  and limb elongation . More generally, experimental evolution, which includes artificial selection experiments, is a powerful approach to understand response to selection across multiple traits and organisms .
The worldwide longest selection experiment on mice began in 1969 at the former Forschungszentrum für Tierproduktion (FZT), nowadays called Research Institute for Farm Animal Biology (FBN) located in Dummerstorf, Germany [5, 6]. Starting from a single founder line developed from four outbred and four inbred mouse strains [5, 6], selection lines for different complex traits were bred with population sizes of 60–100 breeding pairs per line. An unselected control line from the same founder line was maintained over the entire selection period with a larger population size (125–200 breeding pairs) [5, 6]. Over the course of >140 generations, selection has shaped the genomes of the Dummerstorf trait-selected mouse lines, and led to extreme phenotypes that include increased litter size (approx. double the litter size of the unselected mouse line) , body mass (approx. 90g body weight at 6 weeks of age)  and endurance (more than 3× higher untrained running capacity) [9, 10]. Therefore, in order to elucidate the unpredictable polygenic background of these complex traits, where multiple genes, regulatory elements and pathways act in conjunction, the Dummerstorf trait-selected mouse lines represent a valuable resource.
Other selection experiments have generated mice with increased litter size [11,12,13,14], as well as mice with enhanced body weight (see [15, 16] for a list of body weight mouse lines) and exercise performance , yet few studies have examined the polygenic background of these traits through genomic analysis. For example, a genome-wide association study of the high-fertility inbred strain QSi5 corroborated multiple previously reported loci associated with reproductive performance . Likewise, a multi-line approach detected shared loci controlling body weight across seven high body weight selection lines, including an inbred subline of the Dummerstorf’s body mass line . Finally, a comprehensive genomic analysis of mice from the “High Runner” selection experiment found widespread regions with significant genetic differentiation between selected and unselected replicate lines (4 per group) .
The Dummerstorf mouse lines expand the repertoire of polygenic mouse models to understand the genetic basis of fertility, body weight and endurance. Each of these lines arose from almost the same genetic diversity and has been maintained to this day for about half a century. Here we describe the selection history of this unique selection experiment, characterize line-specific patterns of genetic variation and identify genes that are likely associated to each selection trait.
Results and discussion
Phenotypic impact of selection
Over the course of more than 140 generations (Table 1), the selected traits (Table 2) have shown remarkable increments in each line (Table 1, Fig. 1, Additional file 2: Figure S1). The span and number of generations makes the present study the longest selection experiment ever reported in mice. Relative to the unselected control line FZTDU (exposed to genetic drift only), reproductive performance has doubled in DUK (Fertility mouse line 1) and DUC (Fertility mouse line 2) (Fig. 1A,B, F,G, Additional file 2: Figure S1). Even though these two trait-selected lines have achieved comparable litter sizes at first delivery (>20 offspring) , their reproductive lifespan differs, with 5.8 and 2.7 litters in average per lifetime for DUK and DUC, respectively . A remarkable level of divergence has been achieved by the increased body size lines (Fig. 1C,D, Additional file 2: Figure S1). Individuals of the body mass line (DU6) have almost tripled their weight compared to FZTDU (Fig. 1H, Additional file 2: Figure S1), whereas mice of the protein mass line (DU6P) not only have become larger and heavier than FZTDU mice, but their level of muscularity is also considerably higher (Fig. 1D,I, Additional file 2: Figure S1). In terms of running distance capacity, the treadmill performance line (DUhLB) can on average cover three times more distance than FZTDU (Fig. 1J, Additional file 2: Figure S1).
With the exception of the obese line DU6 , each one of the trait-selected mouse lines has developed an extreme phenotype without obvious detrimental effects on their general health, well-being, and longevity. All these lines are still maintained, but selection only continues for DUK, DUC and DU6. Due to the long span of this selection experiment, lines have been given alternative names (Table 1, Additional file 3 [6, 8, 10, 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]: Table S1) and selected at variable intensities (Additional file 2: Figure S2).
Whole genome sequencing (WGS) analysis and short variant detection
After quality filtering and trimming, >90% of the raw reads were mapped to the genome as pairs, with a mean insert size of ~380 bp. For samples sequenced at a target coverage of 30×, mean genome-wide coverage averaged ~24×, with ~95% of genome territory covered at least 5×; samples sequenced at a target coverage of 5× averaged ~8× and ~72%, respectively (for a summary across all samples see Table 3 and for details, Additional file 4: Data S1).
The final variant call set contained 5,099,945 single-nucleotide polymorphisms (SNPs) and 766,655 insertions-deletions (INDELs) (374,604 insertions; 392,051 deletions, Additional file 2: Figure S3B). The trait-selected lines had much fewer variants than FZTDU and these variants were mostly fixed, whereas FZTDU variants were mostly polymorphic (Fig. 2, Additional file 2: Figure S4, Additional file 3: Table S2). This reduction in genetic diversity could be explained by the fact that the trait-selected lines have been maintained at smaller population sizes and were relocated with fewer founders (Table 1). In fact, it has been shown that artificial selection for complex traits does not affect the number of segregating sites , nor the number of SNP sites and heterozygosity . Interestingly, more than 89% of the variants observed in the trait-selected lines were also detected in the control line FZTDU (Fig. 2A, Additional file 3: Table S2), indicating that despite genetic drift, the control line preserves most of the alleles underlying each selected trait and that it still is a proxy of the original founder population.
Most (~90%) INDELs were no longer than 10 bp (Additional file 2: Figure S3A, Additional File 3: Table S4), with slightly more deletions than inversions (Additional file 2: Figure S3B). The proportion of SNPs and INDELs overlapping dbSNP was 95% and 55%, respectively. This discrepancy is not necessarily due to a high number of artefacts in the INDEL set, but rather by the fact that INDELs are a much less characterized type of genetic variant in comparison .
The number of alleles present in all six lines was ~1M, but very few alleles were shared by the trait-selected lines exclusively (~3.3K) (Fig. 2B). The lines DU6P and DUhLB were the most polymorphic of the trait-selected lines, followed by DU6. The two fertility lines (DUK, DUC) were the least polymorphic ones (Fig. 2B, Additional file 3: Table S2).
Almost all SNPs and INDELs (~97%) occurred in non-coding regions (introns ~56%; intergenic ~41%). This is not an unexpected outcome considering that only ~2% of the genome codes for proteins and genetic variation is widespread. Inter-genic variants could affect regulatory elements of gene expression, as well as transcripts not yet described , whereas intronic variants could affect gene splicing .
Based on assessment of variant annotations, a very small number of variants (20,236 SNPs and 1,801 INDELs) were classified as high-impact and moderate-impact mutations, and could interfere with gene transcription or translation. These “impact variants” were screened for (i) being private for any trait-selected line (Additional file 3: Table S3) and (ii) the functional categories their affected genes belonged to. The number of genes affected by these private “impact variants” was twice as large in DUhLB (1027 genes) than in the other trait-selected lines (465–546 genes). However, there was no obvious coherence between significantly enriched functions and the selected traits (Additional file 4: Data S2).
Runs of homozygosity (RoH) and linkage disequilibrium (LD)
While for the five trait-selected lines, most of the SNP loci (57.5–81.5%) were already fixed for either the reference or the alternative allele, in the control line FZTDU alleles were mostly (>75%) polymorphic (Additional file 2: Figure S5). This disparity was also reflected by the distribution of frequencies for the alternative allele, displaying a “U” shape that is much more pronounced in the trait-selected lines than in the control line (Additional file 2: Figure S6). Genomes of mice from the control line FZTDU also had higher nucleotide diversity (Additional file 2: Figure S7 and S8). Accordingly, RoH covered between ~65 and ~78% (~50% as 1–8 Mb tracts) of the genome length of the trait-selected lines, but only ~45% (~23% as 1–8 Mb tracts) of the genome length of FZTDU (Fig. 3A). Analysing RoH shared among individuals of a population can aid to detect past selection events ; however, this is applicable as long as RoH events are rare in the genome (RoH islands), which is not the case here, where RoH are widespread, indicating that the observed degree of homozygosity is the result of a combination of multiple evolutionary forces.
Linkage disequilibrium decay, represented by the genotype correlation (r2) between pairs of SNP sites within min. 0.1 Mb and max. 5 Mb, can be classified into three patterns with decreasing decay strength; one for the three most homozygous trait-selected lines (DUK, DUC and DU6; upper three lines Fig. 3B), a second for the two least homozygous trait-selected lines (DU6P and DUhLB; middle two lines Fig. 3B) and a third for the unselected line FZTDU (bottom line Fig. 3B). Overall, r2 clearly differs between trait-selected lines and FZTDU. Comparable levels of r2 have been reported in mountain gorillas, in which population decline has led to high levels of inbreeding . Likewise, strong levels of LD have been observed in laboratory mice . However, other populations with high levels of inbreeding, such as dog  and horse breeds , do not display such strong genotypic correlations, highlighting the impact of the bottleneck in the genetic diversity of the Dummerstorf mouse lines.
Population structure of the Dummerstorf mouse lines
The genetic relationship among the 150 Dummerstorf mice was assessed by hierarchical clustering (HC) and admixture analysis using the 5,099,945 SNPs obtained after variant calling. Samples formed a hierarchical group structure that represented each of the Dummerstorf lines (Fig. 4A). There was no admixture present in the trait-selected lines, except for one DUC animal sharing ancestry with mice from DU6P (Fig. 4B). FZTDU is represented as an admixture of all the trait-selected lines with similarly large contributions of the four older lines and a significantly larger contribution of DUhLB (Fig. 4B). This is expected because this mouse line is the youngest and has had the least number of generations that underwent selection.
Genetic differentiation of the trait-selected lines
Mean genome-wide pairwise genetic differentiation among trait-selected lines estimated with the genetic differentiation index (FST) ranged from 0.44 to 0.61 (Fig. 5B). The highest level of differentiation was found between either one of the fertility lines and the body mass line DU6 (FST(DUK-DU6) = 0.61 and FST(DUC-DU6) = 0.59; Fig. 5B), followed by the differentiation between the two fertility lines themselves (FST(DUK-DUC) = 0.57; Fig. 5B). Although pairwise genetic differentiation between trait-selected lines and the control line was similar in all comparisons (FST ~ 0.3), it was lowest in the pairwise comparison between the two most polymorphic lines (FST(DUhLB-FZTDU) = 0.26; Fig. 5B). Such strong levels of differentiation occur mainly as a result of reproductive isolation and genetic drift ; however, it is expected that a subset of alleles that have arrived to fixation due to selection contribute to genetic differentiation as well. The challenge is thus to sort out which genomic regions contain such beneficial alleles.
Trait-specific regions of genetic differentiation
Genome-wide scans were conducted in order to detect genomic regions of consistent genetic differentiation between each trait-selected line and FZTDU. The pseudo-line of DUK and DUC combined (FERT) was also included, for a total of six FST contrasts. Overall, outstanding regions of particularly extreme genetic differentiation were not observed, but rather a uniform genome-wide level of high FST (Fig. 5A). Choosing genomic regions of interest by focusing on the most differentiated regions (95th or 99th percentile of the FST distribution) resulted in the detection of multiple loci in every chromosome (Fig. 5A). Because these regions were frequent and did not sufficiently depart from the global level of genetic differentiation to be considered genomic outliers (i.e. max. zFST: 2.89–3.47, Fig. 5C), a more stringent approach was applied to identify line-specific regions of high genetic differentiation (Fig. 6D and Fig. 7D), while reducing the influence of genetic drift. These regions of distinct genetic differentiation (hereafter referred to as RDDs) appeared simultaneously in the top 5% FST windows of the target contrast and in the bottom 10% of all the remaining contrasts, occurring close to each other in only a subset of chromosomes and containing multiple genes (Fig. 6A–C, Fig. 7A–C, Additional file 4: Data S3-S14), some of which were related to the selected traits (see below).
These thresholds were empirically determined based on a similar study comparing two extremely differentiated inbred maize lines . Neutrality simulations were not conducted due to the lack of genetic material from founders and incomplete pedigrees. This information is critical to identify discrete candidate targets of selection for complex traits, in which selection response occurs gradually and myriads of loci with small effects are expected to be involved .
Line-specific patterns of structural variation (SV)
Despite primarily thought to be deleterious and implicated in disease phenotypes [52, 53], large chromosomal rearrangements such as deletions, duplications and inversions have an important role in local adaptation and divergence of populations . These structural variants can lead to gene expression differences by disrupting genes and altering gene dosage . Because copy number variation often results in notable phenotypic differences, it is likely a subject to selection during domestication . For example, genes related to metabolic activity and production traits have been shown to be affected by copy number variation during artificial selection of cattle , goats  and pigs .
After calling and filtering, only duplications, deletions and inversions remained in the final SV data set. Insertions did not occur in enough samples to be included in the analysis. Also, because of the lower detectability in the low sequencing coverage samples, most SVs were found in high coverage samples (Additional file 3: Table S10). Nevertheless, the final SV call set contained the union of good-quality SVs detected in both coverage sets.
SVs were predominantly located in non-coding regions (98%) where they could affect gene expression. Also, SVs (Table 4) were more abundant in the trait-selected lines (deletions (DEL) 5560–4339; duplications (DUP) 48–20; inversions (INV) 1508–530) than in the control line (DEL 3902; DUP 14, INV 605) implying that large genomic rearrangements could contribute to the development of the selected traits. In order to associate SVs to each selected trait, line-specific SVs overlapping protein-coding genes were identified and characterized in greater detail (Additional file 4: Data S15). The total number of these line-specific SVs ranged from 9 (FZTDU) to 36 (DUC), comprising mostly deletions and inversions (Table 4). Most SVs were polymorphic and large length differences were observed between polymorphic and fixed SVs (Additional file 3: Table S6). Fixed line-specific deletions were detected in all lines, whereas duplications were found only in DU6P, and inversions only in DUC, DU6P and DUhLB (Additional file 3: Table S7).
The number of genes affected by fixed line-specific SVs varied from 1 (DUC, DU6P, FZTDU) to 5 (DUK), but went up to more than a thousand for genes affected by large polymorphic inversions (Additional file 3: Table S8). These genes were classified in functional groups based on the biological processes they are associated with (Additional file 3: Table S9). The most gene-rich functional groups are the ones associated with sensory perception, predominantly olfaction (found in the fertility lines DUK and DUC), followed by “cell cycle and nucleic acid transcription and translation” (in DUC), and “metabolism and energy conversion” (DUC, DU6P).
Genes associated with fertility
Genes detected in RDDs for DUK were enriched for “phospholipase D signalling pathway” (Additional file 3: Table S5). In granulosa cells, phospholipase D activity is stimulated by GnRH, thereby inducing or inhibiting cell differentiation depending on the maturation state of the ovarian follicle . Other genes encode for proteins involved in the ovarian development and maintenance of the primordial follicle reserve (Tsc1 , Trp63 ), in the vascularization of the placenta (Atoh8 ) and facilitate maternal supplied lipids and dietary fat digestion in neonatal mice (Cel [64, 65]). Furthermore, DUK shares a fecundity associated region (Sftpb, Usp39, Tmem150, Rnf181, Vamp5, Vamp8, Cgcx, Mat2a) with Qsi5 mice , an inbred mouse line known for its increased litter size, and candidate genes associated with birth rate and male fertility in humans (Ntm ) and litter size in cattle, goats and pigs (Dio3 [67,68,69]). Interestingly, analysis of private SVs detected a 317-bp deletion affecting Olfr279 (Additional file 4: Data S15). This gene has been associated to mouse male sub-fertility  and more generally, olfactory receptors could regulate fertilization [71, 72].
Significantly enriched terms for DUC included “intracellular steroid hormone receptor signalling pathway” (Additional file 3: Table S5), involving progesterone receptor (Pgr) carrying a missense mutation, which is fixed in and specific for DUC (Additional file 2: Figure S9B). Progesterone is one of the main steroid hormones regulating reproductive processes and critical for (i.a.) pregnancy maintenance and mammary gland development [73, 74]. It remains to be proven if a connection exists between this missense and potentially deleterious (Sorting Intolerant From Tolerant (SIFT) score = 0.04) mutation and the fact that DUC females display increased levels of progesterone . Interestingly, a Neanderthal missense mutation in Pgr associated with increased fertility was recently reported to segregate in human populations . Further candidates in DUC control ovarian follicle development, uterine growth and endometrial angiogenesis during pregnancy (Yap1 , Rxfp1 [77, 78]). In the context of preparation of the endometrium for implantation and pregnancy and progesterone signalling, the gene Rrm2  was identified by the structural variation analysis of the DUC genome.
The fertility lines DUK and DUC have been bred according to the same criteria, share the same evolutionary history, and both have been able to more than double the number of pups per litter since the beginning of selection. Despite these commonalities, improved fertility is achieved via different physiological pathways in each line . For example, females from both fertility lines have an increased ovulation rate, but only DUK exhibits follicles containing multiple oocytes; DUC on the other hand shows an increased progesterone level compared to DUK and FZTDU . The scarce number of RDDs in the combined FERT population also illustrates this discrepancy. Candidate RDD and line-specific SV overlapping genes in both fertility lines likely affect the reproductive process on multiple levels such as ovarian physiology, placentation, sex steroid signalling and milk composition.
Genes associated with body size and endurance
Two of the Dummerstorf trait-selected mouse lines have increased their body weight in response to selection. The “giant” DU6 line (selected for body mass at 6 weeks of age) exhibits an obese phenotype  while the protein-mass line DU6P (selected for protein mass in the carcass) is lean and muscular .
In line with the obese phenotype, DU6 candidate genes overlapping RDDs regulate energy metabolism and food intake (Hcrt ) and are linked to feed efficiency (Wdr27 ) and body composition in other species (Atp11b ). DU6 mice also exhibit larger bones , and the analysis of SVs detected Smad5, a modulator of bone formation , to be partially overlapped by a heterozygous deletion and a heterozygous inversion. Though DU6 gave origin to DUHi, one of the lines used to detect parallel selected regions (PSRs) for high body weight, none of the RDDs intersected with PSRs . This is partly explained by the fact that DUHi was established after sampling DU6 mice on generation 85 (well before bottleneck, see Table 1) and further maintenance of these animals under inbreeding .
Candidate genes in the RDDs for DU6P conform with growth-related major quantitative trait loci found in sheep and are known to influence stature and body size in cattle, pigs and human (Plag1 [83, 84], Chchd7 [83,84,85], Impad1 ). In line with this, an SV (deletion) was found overlapping Fam92a, a gene that is involved in limb development . Further candidates for lean body mass are the RDD overlapping genes Piezo1 (myotube formation [88, 89]) and Cdh13 (control of lipid content in developing adipocytes [90,91,92]).
Finally, genes specific for the endurance line DUhLB participate in lipid metabolism (these animals display faster mobilization of lipids during exercise). Only two DUhLB genes (Aldh3a1 and Aldh3a2, the later containing 3 missense SNPs (Additional file 2: Figure S10C)) caused the significant enrichment of the “Histidine metabolism” and “beta-Alanine metabolism” pathways (Additional file 3: Table S5). The “marathon mice” DUhLB have developed a striking metabolic phenotype characterized by accelerated browning of subcutaneous fat and altered mitochondrial biogenesis in response to selection for high treadmill performance . Likewise, detected RDD candidate genes are involved in the development of brown adipocytes (Srsf6 ), removal of toxic waste products from lipid metabolism (Aldh3a2 ), mobilization of fatty acids, mitochondria content and cristae complexity (Il15r ) and in the regulation of glycolysis associated to obesity and weight gain (Pfkfb3 [96, 97]). Moreover, SV analysis detected a ~2.8 kb inversion in Atp5j whose overexpression has been shown to counteract exercise-induced cardiac hypertrophy in mice . Interestingly, the genes identified here did not overlap with significantly differentiated genes of the “High Runner” selection experiment , highlighting the fact that these two studies produced phenotypically different mice (i.e. DUhLB shows lower running wheel activity compared to controls ).
There are five main weaknesses in this study. First, due to gaps in pedigree documentation over more than 140 generations, modelling neutrality was not feasible. In turn, the thresholds to evaluate line-specific genetic differentiation were chosen empirically by setting conservative limits that minimize the presence of false positives.
Second, at its origin in 1969, the study was not designed to conduct genomic analyses. Thus, genetic material from the founders is not available. Unfortunately, this and the incomplete pedigree information hamper the detection of signatures of selection. However, the genomic data generated here still allows deriving biological interpretations based on the line-specific patterns of genetic differentiation, which is the subject of this study.
Third, relocation of the mouse lines by embryo transfer resulted in a genetic bottleneck and random fixation events. This further obscures insight into the selection response mechanisms of these mouse lines. Still, the current strong phenotypic divergence of the lines is the result of long-term selection.
Fourth, except for the fertility lines DUK and DUC, trait-selected lines were not replicated in order to identify overlapping genomic signatures. Interestingly, these two lines are markedly different both physiologically and genetically, despite having the same selection criteria.
The genomes of the Dummerstorf trait-selected mouse lines have evolved in response to selective breeding and neutral forces, exhibit low genetic diversity and display distinct patterns of genetic variation. Distinguishing between selection and neutral evolution is a challenging task and will require further research. However, by focusing on regions of distinct genetic differentiation, we were able to identify genes with important functions associated to the selected traits.
Over the span of this selection experiment, traits have improved continuously and have not decayed despite the dramatic loss of genetic diversity within lines. This implies that many of the alleles that contribute to trait improvement have arrived to fixation and that these lines are highly enriched for such alleles. Therefore, a deeper understanding of the genomes of the trait-selected Dummerstorf mouse lines will provide valuable insights into the genetic basis of important polygenic traits and constitutes an unprecedented scientific resource for geneticists, physiologists and the wider biomedical research community.
Selection history of the Dummerstorf trait-selected mouse lines
The selection experiment started in 1969 (Tables 1 and 2, for more detail see Additional file 1: Supplementary Methods [5, 6, 22, 101,102,103,104,105,106,107,108,109,110,111,112,113,114]) with the establishment of a founder line FZTDU (Forschungszentrum für Tierproduktion Dummerstorf) [5, 6] by systematic crossing of four outbred strains (NMRI orig., Han:NMRI, CFW, CF1) and four inbred strains (CBA/Bln, AB/Bln, C57BL/Bln, XVII/Bln). From FZTDU, five lines were established through selective breeding: two lines were selected for increased litter size (DUK and DUC), one for increased body mass (DU6), and one each for protein mass (DU6P) and treadmill running endurance (DUhLB) (Table 2, Fig. 1, Additional file 2: Fig. S1).
Sample collection and whole genome sequencing
All animal procedures were performed in accordance with national and international guidelines and approved by the Animal Protection Board of the Institute for Farm Animal Biology. Genomic DNA was purified from tail biopsy samples using QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s recommendations. A total of 25 females per line (150 animals in total) were sampled at two different time-points (Table 1). For the first time-point, 10 females per line with the lowest kinship coefficient were chosen. Kinship was determined using the programme INBREED implemented in the software SAS/STAT® (v9.4, SAS Institute Inc., USA). For the second time-point, 15 females per line were chosen at random since the kinship coefficient is similar among subjects of the same line. The study was originally designed with 10 females per line, sequenced at high coverage (target 30×, time-point 1) to capture as much line-specific genetic variability as possible. Due to the low genetic variability in each line resulting from the preliminary data analysis, 15 additional females per line were sequenced. As this was intended to verify the low degree of genetic variability at the initially detected loci and to increase the number of total observations for each line, the samples of the second batch were sequenced with a lower sample coverage (target: 5×, time-point 2).
Library preparation and sequencing were carried out at the Competence Centre for Genomic Analysis (Kiel). Paired-end sequencing libraries were prepared using the TruSeq Nano DNA Library Prep kit following the manufacturer’s specifications (Illumina Inc., San Diego, CA, USA). Out of the 150 libraries, 60 were sequenced on a HiSeq 4000 platform (Illumina Inc.), and 90 samples were sequenced on a NovaSeq 6000 (Illumina Inc.) platform. The target coverage was 30× (high coverage set) and 5× (low coverage set), respectively. Read length was 151 nucleotides. Samples sequenced at 30× (n = 60) were distributed in 9 lanes for a total of 540 pairs of read files. Ten of those samples had to be supplemented with extra sequencing data due to not reaching the expected 30× coverage. Samples sequenced at 5× were not lane-distributed, amounting to 90 pairs of read files. In total, 640 pairs of read files were produced. Sample-wise WGS data is summarized in Additional file 4: Data S1.
Analysis of WGS data
Adapter removal and quality trimming were done using Trimmomatic v0.38  for HisSeq reads and FASTP v0.19.6  for NovaSeq reads. Read quality was evaluated before and after processing with FastQC v0.11.5 . Reads were aligned to the mouse genome build GRCm38.p6 [118, 119] from Ensembl version 93  using the Burrow-Wheeler Aligner software in MEM mode (BWA-MEM)  coupled with SAMtools v1.5  in order to store alignments as Binary Alignment Map (BAM) files. Per sample BAM files were processed sequentially with Picard tools  by adding read group information (AddOrReplaceReadGroups), merging alignments from different read groups (MergeSamFiles), and by sorting (SortSam) and marking duplicated (MarkDuplicates) reads.
Short variant calling and annotation
Short variants were detected according to GATK’s best practices for germline short variant discovery (GATK v 126.96.36.199) [124,125,126,127]. Systematic errors in base quality were corrected using BaseRecalibrator and dbSNP  version 150 for Mus musculus (Ensembl version 93 ). For each sample, variants were called with HaplotypeCaller and then combined with GenomicsDBImport. Joint genotyping was done with GenotypeGVCFs and then only bi-allelic variants (SNPs and INDELs) were retained. Filtering was applied separately for SNPs and INDELs. Site-level filtering was done following the Variant Quality Score Recalibration (VQSR) procedure. This comprised an internal variant set used as truth-training resource, created after stringent site-level filtering of the bi-allelic variants obtained from joint genotype calling, plus an external pre-filtered training variant set provided by the Mouse Genomes Project (MGP version 5 ). Variants were genotyped as missing if the depth of coverage (DP) was either too low (<4), too high (3 standard deviations higher than the sample mean depth) or if the genotype quality (GQ) was too low (<20). INDELs overlapping microsatellites  were excluded. The final set consisted of variants present in at least 15 samples per line (except for DU6 that had a lower coverage, so this threshold was lowered to 12 samples). Annotations were done using SnpEff v4.3t  and missense mutations were further evaluated with Ensembl Variant Effect Predictor (VEP) v.101.0  to obtain their corresponding SIFT scores  and to predict amino acid changes affecting protein function.
Structural variant calling and annotation
Processed BAM files used for short variant calling were also used to detect large structural variants (SVs). SVs correspond to deletions, duplications, insertions, inversions and translocations of at least 50 bp in size . Because of the considerable difference in coverage of the two sequence data sets, this was done independently for the high and the low coverage set.
Three SV callers (Manta v.1.6.0 , Whamg v.1.7.0  and Lumpy v.0.2.13 ) were applied per line and per coverage set yielding six call sets per line (see Additional file 1 for more detailed information). Specific filters were applied depending upon the call set. SVs detected by Manta were site-filtered by excluding SVs with poor mapping quality (Mapping Quality (MAPQ) < 30) or with excessive coverage (>3 × the median chromosome depth) that could be due to reads originated from low complexity regions. For each sample, only SVs with GQ ≥ 20 and read depth ≥5× were accepted. Whamg SV calls with sizes <50 bp and >2 Mb were filtered out to improve call accuracy. Here too, only calls with read depth ≥ 5× were accepted. Calls with GQ < 20 were filtered out. To reduce the number of false positive calls, high cross-chromosomal mappings were excluded, as Whamg is aware of but does not specifically call translocations. Likewise, SVs in poorly mapped regions were also removed. Lumpy SV calls for which supporting evidence (FORMAT/SU field) was below 5 (SU<5) were excluded, as well as SV calls with GQ<20. Since both Whamg and Lumpy do not have a built-in genotyping module, SV call sets were genotyped with Svtyper v0.7.1  prior filtering for genotype quality. For each line and coverage set, SVs called by at least two SV callers were merged using Survivor v.1.0.7  and kept if they were found in at least 10 samples. The final set consisted of the union of SVs detected in the high and low coverage read sets. We then intersected SV calls among all six mouse lines to obtain SVs private for each line (line-specific) and SVs shared among lines. SVs were annotated with Ensembl’s VEP  focusing on variants affecting protein-coding genes with the maximum SV size set to 200 Mb.
Functional classification was conducted after thorough literature and database search (OrthoDB v10 , Uniprot , NCBI Entrez gene ), plus Gene Ontology enrichment analysis (Shiny GO , false discovery rate [FDR] < 0.05). To further minimize false positives, SV calls overlapping gaps and high coverage regions (>80×) in the reference genome assembly were filtered out.
Population genetics analysis
Genetic structure among all 150 samples was assessed using HC analysis and genetic admixture. HC was computed using SNPRelate v1.22.0 . The ape v5.0 package was used for visualization of HC results . Genetic admixture was estimated with ADMIXTURE v1.3.0  after transforming the Variant Calling File (VCF) file into a BED file using PLINK v2.00a2LM [138, 139]. Linkage disequilibrium (LD) was evaluated after thinning the main VCF file with vcftools v0.1.13  retaining sites at least 100 kb apart and then calculating r2 within windows of 5 Mb using PLINK v2.00a2LM [138, 139]. Runs of homozygosity were estimated for each sample using the RoH extension  in SAMtools/BCFtools v1.5 . For this, allele frequencies at each SNP site and a constant recombination rate (average recombination rate mouse genome: 0.51 cM/Mb [142, 143]) were provided. These parameters, plus the genotype likelihoods stored in the VCF containing the sample, allow to identify RoHs using a hidden Markov model.
Genetic differentiation and diversity analysis
The genomes of the trait-selected lines were compared to the neutrally evolving control line (FZTDU). For this, genetic differentiation was estimated using the FST index  in sliding window mode (size = 50 kb, step = 25 kb, min 10 SNPs) using vcftools v0.1.13 . Since FST calculations are based on allele counts and not read counts, differences in depth between low and high coverage samples are not expected to have a direct effect in the estimation of genetic differentiation. The average number of SNP sites per window was ~ 125 (Additional file 3: Table S11). At each window, the arithmetic mean of the SNP-specific FST was calculated and then transformed into z-scores to represent its departure from the genomic mean. Additionally, all samples of the two fertility lines (DUK and DUC) were combined (pseudo-line: FERT) and compared to FZTDU as well. Since autosomes and the X-chromosomes have different effective population sizes, the X-chromosome was standardized individually. In order to identify RDDs, FST windows appearing simultaneously in the 95th percentile of a given contrast and in the bottom 10th percentile of all other contrast were identified. These thresholds are not derived from modelling neutrality, rather they were chosen empirically based on a previous study  and after testing multiple combinations of ≥95th percentiles and ≤10th percentiles, choosing the combination in which RDDs could be found in all contrasts. The upper threshold is suitable to evaluate genetic differentiation [49, 145, 146], while the bottom threshold ensures that there is practically no genetic differentiation between any of the other trait-selected lines and the control line (Fig. 6D and Fig. 7D). Genome-wide diversity patterns were assessed by measuring the nucleotide diversity (π)  in sliding windows of 50 kb size (step size = 25 kb) using vcftools v0.1.13 .
Gene annotation and enrichment analysis
Genes overlapping RDDs were identified using GenomicRanges  and Ensembl 93’s  Mus musculus gene set. In order to sort out the most relevant genes for each of the selected traits, thorough inspection of functional annotations, literature and SNP effects was conducted. This also included testing for enrichment of Gene Ontology Biological Processes (GOBP) [149, 150] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [151,152,153] using WebGestalt [154,155,156,157] using the whole genome as reference set. A FDR threshold of 10% was used as cutoff for significant enrichment of a term or pathway. Finally, genes in quantitative trait loci (QTLs) were identified by finding overlaps with QTL data compiled in the Mouse Genome Database [158, 159].
Data handling and visualization
Availability of data and materials
The datasets generated and analysed during the current study are available in the European Nucleotide Archive (raw sequencing data; accession: PRJEB44248 ) and in the European Variation Archive (variant calling files; accession: PRJEB45961 ). Scripts used to generate the results of this publication are available in .
Forschungszentrum für Tierproduktion Dummerstorf
Research Institute for Farm Animal Biology
Fertility mouse line 1
Fertility mouse line 2
Body mass mouse line
Protein mass mouse line
Treadmill performance mouse line
Control mouse line
Whole Genome Sequencing
Runs of homozygosity
- F ST :
Genetic differentiation index
Region of distinct genetic differentiation
Sorting Intolerant From Tolerant
Fertility pseudo-population composed of DUK and DUC
Best linear unbiased prediction
Variant Quality Score Recalibration
Mouse Genomes Project
Binary Alignment Map
Variant Calling File
Ensembl Variant Effect Predictor
False discovery rate
Gene Ontology Biological Processes
Kyoto Encyclopedia of Genes and Genomes
Quantitative trait loci
Conner JK. Artificial Selection. In: Kliman R, editor. Encyclopedia of Evolutionary Biology. Oxford: Academic Press; 2016. p. 107–13.
Kukekova AV, Johnson JL, Xiang X, Feng S, Liu S, Rando HM, et al. Red fox genome assembly identifies genomic regions associated with tame and aggressive behaviours. Nat Ecol Evol. 2018;2:1479–91.
Castro JP, Yancoskie MN, Marchini M, Belohlavy S, Hiramatsu L, Kučka M, et al. An integrative genomic analysis of the Longshanks selection experiment for longer limbs in mice. Elife. 2019;8:e42014.
Boulding EG. Experimental evolution: concepts, methods, and applications of selection experiments. 1st ed. Garland T, Rose MR, editors. Berkeley, CA: University of California Press; 2009.
Schueler L. Mouse strain Fzt:DU and its use as model in animal breeding research. Arch für Tierzucht (Archives Anim Breeding). 1985;28:357–63.
Dietl G, Langhammer M, Renne U. Model simulations for genetic random drift in the outbred strain Fzt: DU. Arch für Tierzucht (Archives Anim Breeding). 2004;47:595–604.
Langhammer M, Michaelis M, Hartmann MF, Wudy SA, Sobczak A, Nürnberg G, et al. Reproductive performance primarily depends on the female genotype in a two-factorial breeding experiment using high-fertility mouse lines. Reproduction. 2017;153:361–8.
Renne U, Langhammer M, Brenmoehl J, Walz C, Zeissler A, Tuchscherer A, et al. Lifelong obesity in a polygenic mouse model prevents age- and diet-induced glucose intolerance– obesity is no road to late-onset diabetes in mice. PLoS One. 2013;8:e79788.
Brenmoehl J, Walz C, Renne U, Ponsuksili S, Wolf C, Langhammer M, et al. Metabolic adaptations in the liver of born long-distance running mice. Med Sci Sport Exerc. 2013;45:841–50.
Ohde D, Moeller M, Brenmoehl J, Walz C, Ponsuksili S, Schwerin M, et al. Advanced running performance by genetic predisposition in male Dummerstorf marathon mice (DUhTP) reveals higher sterol regulatory element-binding protein (SREBP) related mRNA expression in the liver and higher serum levels of progesterone. PLoS One. 2016;11:e0146748.
Holt M, Nicholas FW, James JW, Moran C, Martin ICA. Development of a highly fecund inbred strain of mice. Mamm Genome. 2004;15:951–9.
Bayon Y, Fuente L, Primitivo FS. Selection for increased and decreased total number of young born in the first three parities in mice. Genet Sel Evol. 1988;20:259–66.
Joakimsen Ø, Baker RL. Selection for Litter Size in Mice. Acta Agric Scand. 1977;27:301–18.
Ribeiro EL, van Engelen MA, Nielsen MK. Embryonal survival to 6 days in mice selected on different criteria for litter size. J Anim Sci. 1996;74:610–5.
Bünger L, Laidlaw A, Bulfield G, Eisen EJ, Medrano JF, Bradford GE, et al. Inbred lines of mice derived from long-term growth selected lines: unique resources for mapping growth genes. Mamm Genome. 2001;12:678–86.
Chan YF, Jones FC, McConnell E, Bryk J, Bünger L, Tautz D. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol. 2012;22:794–800.
Schwartz NL, Patel BA, Garland T, Horner AM. Effects of selective breeding for high voluntary wheel-running behavior on femoral nutrient canal size and abundance in house mice. J Anat. 2018;233:193–203.
Wei J, Ramanathan P, Thomson PC, Martin IC, Moran C, Williamson P. An integrative genomic analysis of the superior fecundity phenotype in QSi5 mice. Mol Biotechnol. 2013;53:217–26.
Hillis DA, Yadgary L, Weinstock GM, Pardo-Manuel de Villena F, Pomp D, Fowler AS, et al. Genetic basis of aerobically supported voluntary exercise: results from a selection experiment with house mice. Genetics. 2020;216:781–804.
Langhammer M, Wytrwat E, Michaelis M, Schön J, Tuchscherer A, Reinsch N, et al. Two mouse lines selected for large litter size display different lifetime fecundities. Reproduction. 2021;161:721–30.
Müller-Eigner A, Sanz-Moreno A, De-Diego I, Venkatasubramani AV, Langhammer M, Gerlini R, et al. Dietary intervention improves health metrics and life expectancy of the genetically obese DU6 (Titan) mouse. bioRxiv. 2021. https://doi.org/10.1101/2020.05.11.088625.
Langhammer M, Michaelis M, Hoeflich A, Sobczak A, Schoen J, Weitzel JM. High-fertility phenotypes: two outbred mouse models exhibit substantially different molecular and physiological strategies warranting improved fertility. Reproduction. 2014;147:427–33.
Michaelis M, Sobczak A, Koczan D, Langhammer M, Reinsch N, Schön J, et al. Testicular transcriptional signatures associated with high fertility. Reproduction. 2018;155:219–31.
Meng J, Mayer M, Wytrwat E, Langhammer M, Reinsch N. Turning observed founder alleles into expected relationships in an intercross population. G3 Genes, Genomes. Genet. 2019;9:889–99.
Bünger L, Renne U, Dietl G, Kuhla S. Long-term selection for protein amount over 70 generations in mice. Genet Res. 1998;72:93–109.
Bünger L, Renne U, Buis RC. Body weight limits in mice - long term selection and single genes. In: Reeve ECR, editor. Chicago: Fitzroy Dearborn; 2001. p. 337–60.
Falkenberg H, Langhammer M, Renne U. Comparison of biochemical blood traits after long-term selection on high or low locomotory activity in mice. Arch Anim Breed. 2000;43:513–22.
Ohde D, Brenmoehl J, Walz C, Tuchscherer A, Wirthgen E, Hoeflich A. Comparative analysis of hepatic miRNA levels in male marathon mice reveals a link between obesity and endurance exercise capacities. J Comp Physiol B Biochem Syst Environ Physiol. 2016;186:1067–78.
Brenmoehl J, Ohde D, Albrecht E, Walz C, Tuchscherer A, Hoeflich A. Browning of subcutaneous fat and higher surface temperature in response to phenotype selection for advanced endurance exercise performance in male DUhTP mice. J Comp Physiol B Biochem Syst Environ Physiol. 2017;187:361–73.
Brenmoehl J, Walz C, Spitschak M, Wirthgen E, Walz M, Langhammer M, et al. Partial phenotype conversion and differential trait response to conditions of husbandry in mice. J Comp Physiol B Biochem Syst Environ Physiol. 2018;188:527–39.
Brenmoehl J, Ohde D, Walz C, Langhammer M, Schultz J, Hoeflich A. Analysis of activity-dependent energy metabolism in mice reveals regulation of mitochondrial fission and fusion mRNA by voluntary physical exercise in subcutaneous fat from male marathon mice (DUhTP). Cells. 2020;9:2697.
Walz C, Brenmoehl J, Trakooljul N, Noce A, Caffier C, Ohde D, et al. Control of protein and energy metabolism in the pituitary gland in response to three-week running training in adult male mice. Cells. 2021;10:736.
Walz M, Chau L, Walz C, Sawitzky M, Ohde D, Brenmoehl J, et al. Overlap of Peak Growth Activity and Peak IGF-1 to IGFBP Ratio: Delayed increase of IGFBPs versus IGF-1 in serum as a mechanism to speed up and down postnatal weight gain in mice. Cells. 2020;9:1516.
Vanselow J, Kucia M, Langhammer M, Koczan D, Rehfeldt C, Metges CC. Hepatic expression of the GH/JAK/STAT/IGF pathway, acute-phase response signalling and complement system are affected in mouse offspring by prenatal and early postnatal exposure to maternal high-protein diet. Eur J Nutr. 2011;50:611–23.
Kucia M, Langhammer M, Goers S, Albrecht E, Hammon HM, Nrnberg G, et al. High-protein diet during gestation and lactation affects mammary gland mRNA abundance, milk composition and pre-weaning litter growth in mice. Animal. 2011;5:268–77.
Vanselow J, Kucia M, Langhammer M, Koczan D, Metges CC. Maternal high-protein diet during pregnancy, but not during suckling, induced altered expression of an increasing number of hepatic genes in adult mouse offspring. Eur J Nutr. 2016;55:917–30.
Schüler L, Renne U, Bünger L. Selection for litter weight on the 21st day after long-term selection for first litter performance in laboratory mice. J Anim Breed Genet. 1990;107:161–8.
Spitschak M, Langhammer M, Schneider F, Renne U, Vanselow J. Two high-fertility mouse lines show differences in component fertility traits after long-term selection. Reprod Fertil Dev. 2007;19:815.
Vanselow J, Nurnberg G, Koczan D, Langhammer M, Thiesen H-JJ, Reinsch N, et al. Expression profiling of a high-fertility mouse line by microarray analysis and qPCR. BMC Genomics. 2008;9:307.
Alm H, Kuhlmann S, Langhammer M, Tuchscherer A, Torner H, Reinsch N. Occurrence of polyovular follicles in mouse lines selected for high fecundity. J Reprod Dev. 2010;56:449–53.
Michaelis M, Langhammer M, Höflich A, Reinsch N, Schön J, Weitzel JM, et al. Initial characterization of an outbreed mouse model for male factor (in)fertility. Andrology. 2013;1:772–8.
Hu J, Ng PC. Predicting the effects of frameshifting indels. Genome Biol. 2012;13:R9.
Bartonicek N, Clark MB, Quek XC, Torpy JR, Pritchard AL, Maag JLV, et al. Intergenic disease-associated regions are abundant in novel transcripts. Genome Biol. 2017;18:241.
Scotti MM, Swanson MS. RNA mis-splicing in disease. Nat Rev Genet. 2016;17:19–32.
Kim E-S, Cole JB, Huson H, Wiggans GR, Van Tassell CP, Crooker BA, et al. Effect of artificial selection on runs of homozygosity in US Holstein cattle. PLoS One. 2013;8:e80813.
Xue Y, Prado-Martinez J, Sudmant PH, Narasimhan V, Ayub Q, Szpak M, et al. Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding. Science (80- ). 2015;348:242–5.
Laurie CC, Nickerson DA, Anderson AD, Weir BS, Livingston RJ, Dean MD, et al. Linkage disequilibrium in wild mice. PLoS Genet. 2007;3:e144.
Davis BW, Williams FJ, Ostrander EA, Parker HG, Plassais J, Kim J, et al. Genetic selection of athletic success in sport-hunting dogs. Proc Natl Acad Sci. 2018;115:E7212–21.
Kim H, Lee T, Park W, Lee JW, Kim J, Lee B-Y, et al. Peeling back the evolutionary layers of molecular mechanisms responsive to exercise-stress in the skeletal muscle of the racing horse. DNA Res. 2013;20:287–98.
Foote AD, Vijay N, Ávila-Arcos MC, Baird RW, Durban JW, Fumagalli M, et al. Genome-culture coevolution promotes rapid divergence of killer whale ecotypes. Nat Commun. 2016;7:11693.
Gage JL, Jarquin D, Romay C, Lorenz A, Buckler ES, Kaeppler S, et al. The effect of artificial selection on phenotypic plasticity in maize. Nat Commun. 2017;8:1348.
Stankiewicz P, Lupski JR. Structural variation in the human genome and its role in disease. Annu Rev Med. 2010;61:437–55.
Weischenfeldt J, Symmons O, Spitz F, Korbel JO. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat Rev Genet. 2013;14:125–38.
Johnson ME, Viggiano L, Bailey JA, Abdul-Rauf M, Goodwin G, Rocchi M, et al. Positive selection of a gene family during the emergence of humans and African apes. Nature. 2001;413:514–9.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–54.
Paudel Y, Madsen O, Megens HJ, Frantz LAF, Bosse M, Bastiaansen JWM, et al. Evolutionary dynamics of copy number variation in pig genomes in the context of adaptation and domestication. BMC Genomics. 2013;14:449.
Gao Y, Jiang J, Yang S, Hou Y, Liu GE, Zhang S, et al. CNV discovery for milk composition traits in dairy cattle using whole genome resequencing. BMC Genomics. 2017;18:265.
Zhang RQ, Wang JJ, Zhang T, Zhai HL, Shen W. Copy-number variation in goat genome sequence: a comparative analysis of the different litter size trait groups. Gene. 2019;696:40–6.
Chen C, Qiao R, Wei R, Guo Y, Ai H, Ma J, et al. A comprehensive survey of copy number variation in 18 diverse pig populations and identification of candidate copy number variable genes associated with complex traits. BMC Genomics. 2012;13:733.
Amsterdam A, Dantes A, Liscovitch M. Role of phospholipase-D and phosphatidic acid in mediating gonadotropin-releasing hormone-induced inhibition of preantral granulosa cell differentiation. Endocrinology. 1994;135:1205–11.
Adhikari D, Zheng W, Shen Y, Gorre N, Hämäläinen T, Cooney AJ, et al. Tsc/mTORC1 signaling in oocytes governs the quiescence and activation of primordial follicles. Hum Mol Genet. 2009;19:397–410.
Tuppi M, Kehrloesser S, Coutandin DW, Rossi V, Luh LM, Strubel A, et al. Oocyte DNA damage quality control requires consecutive interplay of CHK2 and CK1 to activate p63. Nat Struct Mol Biol. 2018;25:261–9.
Böing M, Brand-Saberi B, Napirei M. Murine transcription factor Math6 is a regulator of placenta development. Sci Rep. 2018;8:14997.
Qiu Y, Sun S, Yu X, Zhou J, Cai W, Qian L. Carboxyl ester lipase is highly conserved in utilizing maternal supplied lipids during early development of zebrafish and human. Biochim Biophys Acta - Mol Cell Biol Lipids. 1865;2020:158663.
Miller R, Lowe ME. Carboxyl ester lipase from either mother’s milk or the pancreas is required for efficient dietary triglyceride digestion in suckling mice. J Nutr. 2008;138:927–30.
Kosova G, Scott NM, Niederberger C, Prins GS, Ober C. Genome-wide association study identifies candidate genes for male fertility traits in humans. Am J Hum Genet. 2012;90:950–61.
Coster A, Madsen O, Heuven HCM, Dibbits B, Groenen MAM, van Arendonk JAM, et al. The imprinted gene DIO3 is a candidate gene for litter size in pigs. PLoS One. 2012;7:e31825.
Magee DA, Berry DP, Berkowicz EW, Sikora KM, Howard DJ, Mullen MP, et al. Single nucleotide polymorphisms within the bovine DLK1-DIO3 imprinted domain are associated with economically important production traits in cattle. J Hered. 2011;102:94–101.
Tao L, He XY, Jiang YT, Lan R, Li M, Li ZM, et al. Combined approaches to reveal genes associated with litter size in Yunshang black goats. Anim Genet. 2020;51:924–34.
Morgan K, Harr B, White MA, Payseur BA, Turner LM. Disrupted gene networks in subfertile hybrid house mice. Mol Biol Evol. 2020;37:1547–62.
Flegel C, Vogel F, Hofreuter A, Schreiner BSP, Osthold S, Veitinger S, et al. Characterization of the Olfactory receptors expressed in human spermatozoa. Front Mol Biosci. 2016;2:73.
Daei-Farshbaf N, Aflatoonian R, Amjadi FS, Taleahmad S, Ashrafi M, Bakhtiyari M. Expression pattern of olfactory receptor genes in human cumulus cells as an indicator for competent oocyte selection. Turkish J Biol. 2020;44:371–80.
Arck P, Hansen PJ, Jericevic BM, Piccinni MP, Szekeres-Bartho J. Progesterone during pregnancy: endocrine-immune cross talk in mammalian species and the role of stress. Am J Reprod Immunol. 2007;58:268–79.
Taraborrelli S. Physiology, production and action of progesterone. Acta Obstet Gynecol Scand. 2015;94:8–16.
Zeberg H, Kelso J, Pääbo S. The Neandertal Progesterone Receptor. Mol Biol Evol. 2020;37:2655–60.
Lv X, He C, Huang C, Wang H, Hua G, Wang Z, et al. Timely expression and activation of YAP1 in granulosa cells is essential for ovarian follicle development. FASEB J. 2019;33:10049–64.
Anand-Ivell R, Ivell R. Regulation of the reproductive cycle and early pregnancy by relaxin family peptides. Mol Cell Endocrinol. 2014;382:472–9.
Lei W, Feng XH, Deng WB, Ni H, Zhang ZR, Jia B, et al. Progesterone and DNA damage encourage uterine cell proliferation and decidualization through up-regulating ribonucleotide reductase 2 expression during early pregnancy in mice. J Biol Chem. 2012;287:15174–92.
Tsuneki H, Wada T, Sasaoka T. Role of orexin in the regulation of glucose homeostasis. Acta Physiol. 2010;198:335–48.
Taussat S, Boussaha M, Ramayo-Caldas Y, Martin P, Venot E, Cantalapiedra-Hijar G, et al. Gene networks for three feed efficiency criteria reveal shared and specific biological processes. Genet Sel Evol. 2020;52:1–14.
Zhang Y, Kent JW, Olivier M, Ali O, Broeckel U, Abdou RM, et al. QTL-based association analyses reveal novel genes influencing pleiotropy of metabolic syndrome (MetS). Obesity. 2013;21:2099–111.
Liu B, Mao N. Smad5: Signaling roles in hematopoiesis and osteogenesis. Int J Biochem Cell Biol. 2004;36:766–70.
Taye M, Yoon J, Dessie T, Cho S, Oh SJ, Lee HK, et al. Deciphering signature of selection affecting beef quality traits in Angus cattle. Genes and Genomics. 2018;40:63–75.
Jiao S, Maltecca C, Gray KA, Cassady JP. Feed intake, average daily gain, feed efficiency, and real-time ultrasound traits in Duroc pigs: II. Genomewide association. J Anim Sci. 2014;92:2846–60.
Xu H, Li H, Wang Z, Abudureyimu A, Yang J, Cao X, et al. A deletion downstream of the CHCHD7 gene is associated with growth traits in sheep. Animals. 2020;10:1–10.
An B, Xia J, Chang T, Wang X, Xu L, Zhang L, et al. Genome-wide association study reveals candidate genes associated with body measurement traits in Chinese Wagyu beef cattle. Anim Genet. 2019;50:386–90.
Schrauwen I, Giese APJ, Aziz A, Lafont DT, Chakchouk I, Santos-Cortez RLP, et al. FAM92A underlies nonsyndromic postaxial polydactyly in humans and an abnormal limb and digit skeletal phenotype in mice. J Bone Miner Res. 2019;34:375–86.
Tsuchiya M, Hara Y, Okuda M, Itoh K, Nishioka R, Shiomi A, et al. Cell surface flip-flop of phosphatidylserine is critical for PIEZO1-mediated myotube formation. Nat Commun. 2018;9:1–15.
Rode B, Shi J, Endesh N, Drinkhill MJ, Webster PJ, Lotteau SJ, et al. Piezo1 channels sense whole body physical activity to reset cardiovascular homeostasis and enhance performance. Nat Commun. 2017;8:1–11.
Göddeke S, Knebel B, Fahlbusch P, Hörbelt T, Poschmann G, Van De Velde F, et al. CDH13 abundance interferes with adipocyte differentiation and is a novel biomarker for adipose tissue health. Int J Obes. 2018;42:1039–50.
Teng MS, Wu S, Hsu LA, Chou HH, Ko YL. Differential associations between CDH13 genotypes, adiponectin levels, and circulating levels of cellular adhesive molecules. Mediators Inflamm. 2015;2015:635751.
Philippova M, Joshi MB, Kyriakakis E, Pfaff D, Erne P, Resink TJ. A guide and guard: the many faces of T-cadherin. Cell Signal. 2009;21:1035–44.
Lin JC, Chi YL, Peng HY, Lu YH. RBM4–Nova1–SRSF6 splicing cascade modulates the development of brown adipocytes. Biochim Biophys Acta. 1859;2016:1368–79.
Keller MA, Zander U, Fuchs JE, Kreutz C, Watschinger K, Mueller T, et al. A gatekeeper helix determines the substrate specificity of Sjögren-Larsson Syndrome enzyme fatty aldehyde dehydrogenase. Nat Commun. 2014;5:1–12.
Loro E, Jang C, Quinn WJ, Baur JA, Arany ZP, Khurana TS. Effect of interleukin-15 receptor alpha ablation on the metabolic responses to moderate exercise simulated by in vivo isometric muscle contractions. Front Physiol. 2019;10:1439.
Jiao H, Kaaman M, Dungner E, Kere J, Arner P, Dahlman I. Association analysis of positional obesity candidate genes based on integrated data from transcriptomics and linkage analysis. Int J Obes. 2008;32:816–25.
Duran J, Navarro-Sabate A, Pujol A, Perales JC, Manzano A, Obach M, et al. Overexpression of ubiquitous 6-phosphofructo-2-kinase in the liver of transgenic mice results in weight gain. Biochem Biophys Res Commun. 2008;365:291–7.
Sagara S, Osanai T, Itoh T, Izumiyama K, Shibutani S, Hanada K, et al. Overexpression of coupling factor 6 attenuates exercise-induced physiological cardiac hypertrophy by inhibiting PI3K/Akt signaling in mice. J Hypertens. 2012;30:778–86.
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science (80- ). 2021;372:eabf7117.
Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526:75–81.
Chiang C, Layer RM, Faust GG, Lindberg MR, Rose DB, Garrison EP, et al. SpeedSeq: Ultra-fast personal genome analysis and interpretation. Nat Methods. 2015;12:966–8.
Jeffares DC, Jolly C, Hoti M, Speed D, Shaw L, Rallis C, et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat Commun. 2017;8:1–11.
McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122.
Kriventseva EV, Kuznetsov D, Tegenfeldt F, Manni M, Dias R, Simão FA, et al. OrthoDB v10: Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 2019;47:D807–11.
Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez gene: gene-centered information at NCBI. Nucleic Acids Res. 2011;39:D52–7.
Ge SX, Jung D, Jung D, Yao R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics. 2020;36:2628–9.
The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49:D480–9.
Walsh B, Lynch M. Evolution and selection of quantitative traits. 1st ed. Oxford: Oxford University Press; 2018.
Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31:423–47.
Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, et al. Manta: Rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016;32:1220–2.
Kronenberg ZN, Osborne EJ, Cone KR, Kennedy BJ, Domyan ET, Shapiro MD, et al. Wham: identifying structural variants of biological consequence. PLoS Comput Biol. 2015;11:e1004572.
Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 2014;15:R84.
Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol. 2019;20:1–18.
Pirooznia M, Goes F, Zandi PP. Whole-genome CNV analysis: advances in computational approaches. Front Genet. 2015;6:138.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–90.
Andrews S. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–62.
Church DM, Goodstadt L, Hillier LW, Zody MC, Goldstein S, She X, et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 2009;7:1000112.
Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Ridwan Amode M, et al. Ensembl 2021. Nucleic Acids Res. 2021;49:D884–91.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Broad Institute. Picard Toolkit. http://broadinstitute.github.io/picard/.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303.
Depristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–501.
Auwera GA, Carneiro MO, Hartl C, Poplin R, Angel G, Levy-Moonshine A. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinforma. 2013;43:11.10.1–11.10.33.
Poplin R, Ruano-Rubio V, DePristo MA, Fennell TJ, Carneiro MO, Auwera GA, et al. Scaling accurate genetic variant discovery to tens of thousands of samples. bioRxiv. 2017. https://doi.org/10.1101/201178.
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–11.
Yates AD, Achuthan P, Akanni W, Allen J, Allen J, Alvarez-Jarreta J, et al. Ensembl 2020. Nucleic Acids Res. 2020;48:D682–8.
Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477:289–94.
Avvaru AK, Sharma D, Verma A, Mishra RK, Sowpati DT. MSDB: a comprehensive, annotated database of microsatellites. Nucleic Acids Res. 2020;48:D155–9.
Cingolani P, Platts A, le Wang L, Coon M, Nguyen T, Wang LLL, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6:80–92.
Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31:3812–4.
Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011;12:363–76.
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28:3326–8.
Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2018;35:526–8.
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–64.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
Purcell S, Chang C. PLINK 2. http://www.cog-genomics.org/plink/2.0/. Accessed 2 Mai 2019.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–8.
Narasimhan V, Danecek P, Scally A, Xue Y, Tyler-Smith C, Durbin R. BCFtools/RoH: A hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics. 2016;32:1749–51.
Frayer ME, Payseur BA. Demographic history shapes genomic ancestry in hybrid zones. Ecol Evol. 2021;11:10290–302.
Cox A, Ackert-Bicknell CL, Dumont BL, Yueming D, Bell JT, Brockmann GA, et al. A new standard genetic map for the laboratory mouse. Genetics. 2009;182:1335–44.
Weir BS, Cockerham CC. Estimating F-Statistics for the analysis of population structure. Evolution (N Y). 1984;38:1358–70.
Lai FN, Zhai HL, Cheng M, Ma JY, Cheng SF, Ge W, et al. Whole-genome scanning for the litter size trait associated genes and SNPs under selection in dairy goat (Capra hircus). Sci Rep. 2016;6:1–12.
Wang GD, Zhai W, Yang HC, Fan RX, Cao X, Zhong L, et al. The genomics of selection in dogs and the parallel evolution between dogs and humans. Nat Commun. 2013;4:1860.
Nei M, Li WH. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci U S A. 1979;76:5269–73.
Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, et al. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9:e1003118.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–9.
The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019;47:D330.
Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27–30.
Kanehisa M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 2019;28:1947–51.
Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49:D545.
Liao Y, Wang J, Jaehnig EJ, Shi Z, Zhang B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 2019;47:W199–205.
Wang J, Vasaikar S, Shi Z, Greer M, Zhang B. WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res. 2017;45:W130–7.
Wang J, Duncan D, Shi Z, Zhang B. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res. 2013;41:W77–83.
Zhang B, Kirov S, Snoddy J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 2005;33:W741–8.
Mouse Genome Informatics. http://www.informatics.jax.org/downloads/reports/mgi_mrk_coord.rpt. Accessed 22 Feb 2021.
Bult CJ, Blake JA, Smith CL, Kadin JA, Richardson JE, Anagnostopoulos A, et al. Mouse Genome Database (MGD) 2019. Nucleic Acids Res. 2019;47:D801–6.
Core R. Team. R: a language and environment for statistical computing http://www.r-project.org/. Vienna, Austria: R Foundation for Statistical. Computing. 2020.
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, et al. Welcome to the tidyverse. J Open Source Softw. 2019;4:1686.
Whole Genome Sequencing outbred mouse lines selected for high fertility, body size and endurance. The European Nucleotide Archive. 2021. http://www.ebi.ac.uk/ena/browser/view/prjeb44248.
Genomic characterization of world’s longest selection experiment in mouse reveals the complexity of polygenic traits. The European Variation Archive. 2021. http://www.ebi.ac.uk/eva/?eva-study=prjeb45961.
WGS analysis of the Dummerstorf mouse lines. GitHub. 2021. http://www.github.com/sosfert/mmu_dummerstorf_wgs.
We particularly thank the staff of the FBN Service Group Lab Animal Facility for excellent animal care and cooperation: Benita Lucht, Karin Ullerich, Sabine Maibohm, Ines Müntzel, Hildburg Meyer. And furthermore we thank Erika Wytrwat for the distinguished data base management with the historical mouse data.
Open Access funding enabled and organized by Projekt DEAL. This study was funded by the Leibniz Collaborative Excellence programme (K52/2017, SOS-FERT). This work was also supported by the DFG Research Infrastructure NGS_CC (project 407495230) as part of the Next Generation Sequencing Competence Network (project 423957469).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Establishment of the Dummerstorf mouse lines, Structural Variant Calling.
Response to selection throughout the selection experiment. Figure S2. Proportion of litters supplying parents for the next generation. Figure S3. Distribution of INDEL lengths. Figure S4. Number of private and shared INDELs among lines. Figure S5. SNP allele frequency state classification. Figure S6. Alternative allele frequency distribution. Figure S7. Nucleotide diversity (π) distribution in the Dummerstorf mouse lines. Figure S8. Example of one chromosome representative of the level of genetic diversity observed in the Dummerstorf mouse lines. Figure S9. Allele frequency heatmap of non-synonymous mutations in RDD genes. Figure S10. Allele frequency heatmap of non-synonymous mutations in RDD genes.
Alternative names of the Dumemrstorf mouse lines. Table S2. Number of SNP and INDEL sites discovered in each line. Table S3. Number of private variants with predicted high/moderate effects according to SnpEff. Table S4. Counts per length up to the 90% most frequent INDELs sorted in decreasing order of frequency. Table S5. Significantly enriched terms based on RDD gene lists. Table S6. Proportion of line-specific fixed and polymorphic structural variants in genic regions. Table S7. Types and lengths of line-specific fixed structural variants in genic regions. Table S8. Number of genes affected by line-specific fixed and polymorphic structural variants. Table S9. Number of genes in functional groups affected by line-specific structural variants. Table S10. Summary of structural variants detected in low and high coverage variant calling sets for each mouse line. Table S11. Number of SNP sites per window analysed with FST.
WGS data overview. Data S2. Significantly enriched terms for genes affected by line-specific SNPs and/or INDELs with high/moderate impact. Data S3. Genomic information of regions of distinct genetic differentiation for DUK. Data S4. Genomic information of regions of distinct genetic differentiation for DUC. Data S5. Genomic information of regions of distinct genetic differentiation for DU6. Data S6. Genomic information of regions of distinct genetic differentiation for DU6P. Data S7. Genomic information of regions of distinct genetic differentiation for DUhLB. Data S8. Genomic information of regions of distinct genetic differentiation for FERT. Data S9. Genes in regions of line-specific genetic differentiation associated to increased fertility (DUK). Data S10. Genes in regions of line-specific genetic differentiation associated to increased fertility (DUC). Data S11. Genes in regions of line-specific genetic differentiation associated to increased body mass (DU6). Data S12. Genes in regions of line-specific genetic differentiation associated to increased lean body mass (DU6P). Data S13. Genes in regions of line-specific genetic differentiation associated to increased treadmill endurance (DUhLB). Data S14. Genes in regions of specific genetic differentiation for the pseudo-line FERT (DUK and DUC combined). Data S15. Gene-spanning line-specific structural variants.
About this article
Cite this article
Palma-Vera, S.E., Reyer, H., Langhammer, M. et al. Genomic characterization of the world’s longest selection experiment in mouse reveals the complexity of polygenic traits. BMC Biol 20, 52 (2022). https://doi.org/10.1186/s12915-022-01248-9