- Research article
- Open Access
Chromosome-level genome assembly and population genomics of Mongolian racerunner (Eremias argus) provide insights into high-altitude adaptation in lizards
BMC Biology volume 21, Article number: 40 (2023)
Although the extreme environmental adaptation of organisms is a hot topic in evolutionary biology, genetic adaptation to high-altitude environment remains poorly characterized in ectothermic animals. Squamates are among the most diverse terrestrial vertebrates, with tremendous ecological plasticity and karyotype diversity, and are a unique model system to investigate the genetic footprints of adaptation.
We report the first chromosome-level assembly of the Mongolian racerunner (Eremias argus) and our comparative genomics analyses found that multiple chromosome fissions/fusions events are unique to lizards. We further sequenced the genomes of 61 Mongolian racerunner individuals that were collected from altitudes ranging from ~ 80 to ~ 2600 m above sea level (m.a.s.l.). Population genomic analyses revealed many novel genomic regions under strong selective sweeps in populations endemic to high altitudes. Genes embedded in those genomic regions are mainly associated with energy metabolism and DNA damage repair pathways. Moreover, we identified and validated two substitutions of PHF14 that may enhance the lizards’ tolerance to hypoxia at high altitudes.
Our study reveals the molecular mechanism of high-altitude adaptation in ectothermic animal using lizard as a research subject and provides a high-quality lizard genomic resource for future research.
An important area in evolutionary biology is understanding the way in which animals have adapted to extreme environment. For example, high altitudes bring multiple challenges, including low oxygen, high ultraviolet radiation, low pressure, and cold temperatures for organisms. However, species that have adapted to such harsh environments usually exhibit numerous physiological capabilities, such as increased heart rates, augmentation of pulmonary ventilation, and increased affinity of hemoglobin for oxygen [1,2,3]. Previous studies have focused on mammals and birds [4,5,6,7,8,9,10,11,12,13,14] and several genes have been identified by genomic studies, such as EPAS1, MTHFR, and EGLN1 in humans [5, 6, 12], EPAS1 in dogs and horses , TRPC1, KCNMA1, and SHEEPV1R18 in pigs , SOCS2 in sheep , and SLC35F1 in Tibetan chickens . These genes are the targets of natural selection in high-altitude environments, however, we have a limited understanding of the way in which ectothermic animals such as amphibians and reptiles respond to high altitudes [15, 16].
Squamates, comprising lizards and snakes, have diverse morphological characters and lifestyles, and therefore, have been widely used as research models in evolution and adaptation studies [17,18,19,20,21]. While physiological ecologists have explained and described the biophysical and biochemical patterns of high-altitude adaptation in squamates, the genetic mechanisms underlying the high-altitude adaptive process are still not widely known [22, 23]. Among squamate species, Mongolian racerunner (Eremias argus) is the lizard that is extensively distributed in Russia, Korea, Mongolia, and China over a wide range of elevations. This species is currently the only known lizard species that is distributed from the Eastern Plain to the Qinghai-Tibet Plateau in China. Such species could serve as a model for the investigation of genetic adaptation to high altitude in lizards. Here, we first report a chromosome-level genome assembly of the Mongolian racerunner generated using a combination of the latest sequencing and scaffold technologies, including single-molecule real-time, linked-read, second-generation sequencing, and Hi-C scaffolding. We also sequenced genomes of 61 Mongolian racerunner individuals collected from altitudes ranging from ~ 80 to ~ 2600 m above sea level (m.a.s.l.). Using these genome resources, we sought to discover the genome variations that respond to high-altitude adaptation in Mongolian racerunners. Our study provides preliminary evidence for extreme environmental adaptation and expands our knowledge of the genome characteristics in lizards.
Genome assembly and chromosomes evolution
We generated a 1.63-Gb genome assembly of the Mongolian racerunner using a sample from the desert in Hohhot, Inner Mongolia Autonomous Region of China. We used a combination of the four latest sequencing technologies: single-molecule real-time sequencing (PacBio Sequel), paired sequencing (Illumina HiSeq), linked-read sequencing (10X Genomics), and Hi-C (Phase Genomics, Inc.). This new genome assembly produced 19 chromosomes with contig N50 of 3.07 Mb and scaffold N50 of 96.73 Mb (Fig. 1a; Additional file 1: Figure S1–4; Additional file 2: Table S1–2). Karyotypic analysis demonstrated that the Mongolian racerunner possesses a ZW/ZZ sex determination system, and the two sex chromosomes differ in size and shape [24, 25]. To identify the potential Z chromosome in our chromosome-scale assembly, we compared the sequence coverage between male and female individuals and determined chromosome 15 as the Z chromosome (Additional file 1: Figure S5).
To verify the completeness of our assembly, we checked the Core Eukaryotic Genes (CEGs) using CEGMA  and Benchmarking Universal Single-Copy Orthologs (BUSCOs) using BUSCO . This analysis recovered 226 (91.13%) complete CEGs and 2441 (94.39%) the complete single-copy orthologs (Additional file 2: Table S3-4), indicating that the Mongolian racerunner genome has a high coverage of the coding region. We also checked the accuracy of the assembly of this genome using Illumina short reads and obtained a mapping rate of 98.65% and a mapping coverage rate of 99.74% (Additional file 2: Table S5), suggesting that the assembled genome has good consistency. Tandem repeat sequences and interspersed repeat sequences account for approximately 47.37% of the Mongolian racerunner genome assembly, and 46.92% of them are transposable elements (Additional file 1: Figure S6; Additional file 2: Table S6). Long interspersed nuclear elements (LINEs) are the most abundant element, making up approximately 33.18% of the genome assembly. Using a comprehensive annotation strategy, 20,107 protein-coding genes were predicted (Additional file 2: Table S7), with an average length of 28.72 kb, an average CDS length of 1.43 kb, an average of 8.63 exons per gene, and an average intron length of 3.57 kb (Additional file 1: Figure S7). Most of the predicted genes (~ 99.68%) were functionally annotated against known protein databases (Additional file 2: Table S8).
In order to characterize genome synteny between the Mongolian racerunner genome and its relatives, pairwise synteny comparisons were conducted between the Mongolian racerunner, rattlesnake, European common wall lizard, and chicken genomes. Multiple chromosome fissions/fusions events were observed in autosomes and sex chromosomes (Additional file 1: Figure S8-9). For autosomes, genes located at chromosome breakpoint regions (e.g., RUNX2, TFAP2B, OSR1, WDPCP, TBX5, and TBX3) of the Mongolian racerunner genome assembly are enriched in “forelimb morphogenesis,” “embryonic forelimb morphogenesis,” “limb morphogenesis,” and “hindlimb morphogenesis” (Fig. 1b). These limb morphogenesis related genes are located on Chr1 in the snake genome, while their orthologs in the lizard genome are located at different chromosomes: i.e., four genes (RUNX2, TFAP2B, OSR1, and WDPCP) are located on Chr3, and two genes (TBX5 and TBX3) were transferred to Chr17 (Fig. 1c). Genes located at chromosome breakpoint regions of the rattlesnake were enriched in “cellular response to organic cyclic compound” (Additional file 1: Figure S10), and four of them (CRISP1, CRISP2, FCN2, and SPINT1) encode proteins that are an important part of rattlesnake venom (Fig. 1d–f). Especially, the FCN2 gene was gained in the snake genome at Chr 15. This gene encodes ficolin which is an essential snake venom component [29, 30]. For sex chromosomes, we found the chicken, snake, and lizard do not share homolog Z chromosome regions (Additional file 1: Figure S11), suggesting that the sex chromosomes of lizards, snakes, and birds might be derived from different autosomal pairs. To explore the origin of Z chromosomes in the Mongolian racerunner and its relatives, we reconstructed 21 ancestral chromosomes of these four species using GRIMM . We found that the chicken ChrZ evolved from one ancestral autosome, lizard ChrZ evolved from two ancestral autosomes, and ChrZ of the snake evolved from another two ancestral autosomes (Fig. 1g). We further checked the location of sex determination-related genes in the lizard genomes, which shows that DMRT1, DMRT2, and DMRT3, the male determining genes in birds , are located on the Chr18 of Mongolian racerunner or the Chr17 of European common wall lizard. SOX3 gene, which is involved in sex determination in mammals , is located on ChrZ of the Mongolian racerunner genome and European common wall lizard (Additional file 1: Figure S11). Together, these results suggest that chromosome fissions/fusions occurred frequently and could have contributed to critical evolutionary events in squamates.
Genome adaptation to high altitude
To explore the molecular adaptation to mid altitude and high altitude in the Mongolian racerunner, we collected and sequenced six Mongolian racerunner populations from low-altitude to high-altitude areas (Fig. 2a, b; Additional file 2: Table S9-11). We utilized a combination of four parameters (FST and θπ, XP-CLR, and Dxy) to identify genomic regions under selective sweeps. We initially used a top 1% FST value and a top 1% θπ ratio cutoff to screen the potential selected region in one mid-altitude population (NMG), and two high-altitude populations (GS and QH) compared to low-altitude group (HB, HEB, and HN), respectively (Additional file 1: Figure S12d-f). We identified a total of 179 (FST ≥ 0.44; θπ ratio ≥ 2.27), 115 (FST ≥ 0.44; θπ ratio ≥ 2.19), and 69 (FST ≥ 0.48; θπ ratio ≥ 6.97) genome segments in the NMG, GS, and QH populations relative to the low-altitude group (combined individuals from HB, HEB, and HN populations). These genome segments contained 78, 64, and 55 genes, respectively (Fig. 2c, d). Of these genes, 76.14% (150 out of 197) were also identified under selective sweep using XP-CLR, and 78.68% (155 out of 197) were identified using Dxy (Additional file 1: Figure S13–15; Additional file 2: Table S12–14). The candidate loci identified by FST and θπ ratio among NMG, GS, and QH contrasts showed a low overlapping rate (Additional file 1: Figure S16), revealing a uniqueness at molecular adaptation across populations, which also has been observed in amphibians and reptiles . Among these outlier genes, TMEM68, a candidate glycerolipid metabolism gene , overlapped in the NMG and GS selective sweep sets, and ABCA8B, a member of the ABCA transporter subfamily and associated with sphingolipid metabolism , overlapped in GS and QH. Genes under selective sweep in lizards endemic to high altitudes (> 2,000 m) are mainly associated with energy metabolism and DNA damage repair pathways (Fig. 2e–h; Additional file 2: Table S15–20). Interestingly, the allele frequency in 13 nonsynonymous mutations from nine genes increased with an increase in altitude, suggesting that these genes are important for the high-altitude adaptation of lizards (Fig. 3a). For example, SYT12, CLEC4M, and CLEC4C regulate calcium ion binding, may be involved in hypoxia-induced regulation [37,38,39]. Given that the QH population is under the highest environmental pressure, we further examined the 55 genes in the FST and θπ outlier regions between QH and the low-altitude populations. Several genes seem to have helped the Mongolian racerunner adapt to the high altitude based on previous in vivo or in vitro functional experiments: e.g., ASB2 is related to angiogenesis , and PITPNM2 and ARHGAP6 are associated with enzymatic activity binding oxygen. Three genes, namely, CAMKK2, CABP5, and MICU2, have important roles in calcium binding and are required for the expression of hypoxia-inducible genes . One gene, ESCO1, ensures the correct replication of DNA in hypoxic environments (Additional file 1: Figure S17) [42, 43].
To investigate how changes in gene expression profiles correlate with changes in altitude of Mongolian racerunner, we performed transcriptomic analysis of lung tissues between low and high populations. The genes with high FST and θπ values and high XP-CLR score included the upregulated P2RX4, FAM13A, and NARF genes, and the downregulated NCOA1, TMEM168, and USP2 genes, which have been suggested to associate with high-altitude adaptation [44,45,46,47]. This observation prompted us to explore other genes with expression changes and that may be under selection in Mongolian racerunners. As a result, we identified 493 downregulated and 845 upregulated differentially expressed genes (DEGs) between HB (the lowest altitude population) and QH (the highest altitude population) (Fig. 3b). The DEGs were functionally enriched in small molecule catabolic and organic hydroxy compound metabolic processes (Additional file 1: Figure S18). Of these genes, OGFOD2, GRIK2, ABCA5, ALL2124, and ADPRH also located in regions with high FST and θπ, and low Tajima’s D (Additional file 1: Figure S19). Although OGFOD2, ALL2124, and ADPRH were significantly upregulated (P < 0.01), GRIK2 and ABCA5 were significantly downregulated (P < 0.01) in the high-altitude population (Fig. 3c). Among these five genes, GRIK2 encodes a cold-sensing receptor , and a reduction in its expression may confer cold resistance in high-altitude Mongolian racerunner populations.
Functional validation of genetic variants in the PHF14 gene
Gene PHF14 has a high FST and log2(θπ), as well as a low Tajima’s D value, indicating strong signals of selection (Fig. 4a). PHF14 is reported to be a novel regulator of mesenchyme growth via platelet-derived growth factor (PDGF) receptor-alpha [49, 50] and a hypoxia-sensitive epigenetic regulator in cell cycle progression . PHF14 is highly expressed in lung cancer, and its high expression is associated with poor survival [51, 52]. Silencing of PHF14 suppresses cancer proliferation [51,52,53,54]. This gene harbored two exonic candidate SNPs in the high-altitude population (QH population). In order to test the potential functional differences of the PHF14 genotypes (I271F and E442D) in high-altitude lizards compared with the lowland species, we assessed the effects of the I271F and E442D mutations in human embryonic kidney cells (HEK293) under hypoxic conditions. Our functional experiments displayed that the wild-type (WT) PHF14, which comes from a low-altitude genotype, and its high-altitude variants exhibited comparable expression, as assessed by western blotting and RT-qPCR (Fig. 4b, c; Additional file 2: Table S21). After culturing the cells for 24 h under hypoxic conditions, the HEK293 cell lines with overexpressed PHF14 variants exhibited lower rates of apoptosis relative to wild-type PHF14, with a double mutant PHF14 (I271F and E442D) showing the lowest rate of apoptosis (P < 0.01) (Fig. 4d). Hypoxia-inducible factor 1 (HIF-1α) and vascular endothelial growth factor (VEGFA) are the master regulators of homeostasis under hypoxic conditions [55, 56]. Western blotting showed that the expression of the HIF-1α and VEGFA proteins was significantly increased after a hypoxic stimulus in WT, I271F, E442D, double mutant (I271F and E442D) PHF14, and EGFP overexpressing HEK293 cells (P < 0.05) (Fig. 4e). We further investigated whether the PHF14 mutant genotypes could alter the transcriptional response to cell apoptosis using RT-qPCR assays. The mRNA expression levels of three out of the nine downstream genes tested, namely, TP53, CDK6, and ATK1, were significantly increased in the double mutant (I271F and E442D) cells (P < 0.01) (Fig. 4f). The mRNA expression levels of CASP7 were significantly decreased in double mutant (I271F and E442D) PHF14 overexpressing HEK293 cells (P < 0.01). No significant changes in mRNA expression levels were observed in single mutants (I271F or E442D). These data suggest that the I271F and E442D may collectively amplify the transcriptional activity mediated by PHF14 in response to apoptosis under hypoxic conditions.
In this study, we investigated the genetic basis of high-altitude adaptation in lizards based on the high-quality reference genome and whole-genome resequencing. We conducted selective sweeping analysis for the populations from different altitudes using a combination method of FST, θπ, XP-CLR, and Dxy and identified candidate genomic regions with exceptionally high levels of population differentiation on a genome scale. A total of 197 genes were found, showing strong evidence of selection in mid- and high-altitude populations. The PHF14 gene, displaying reduced genetic diversity and Tajima’s D in a high-altitude population, was identified and validated as a target gene of natural selection for hypoxic adaptation in lizards. However, it is worth noting that, due to the limitations of the sample size and the genome scanning methods used in the present study, both additional samples and computational methods are required to evidence high-altitude adaptation in lizards. Furthermore, since mutations of the PHF14 gene are partially validated, further investigation into genes located in genomic regions under selective sweeps would provide greater insight into mechanisms of adaptation.
We sequenced and assembled a high-quality genome of the Mongolian racerunner, the lizard distributed from the central and eastern plain to the Qinghai–Tibet Plateau, to understand how lizards adapt to high-altitude environments. Using this genome as reference, we found that chromosome fissions/fusions might be related to limb degeneration, snake venom, and sex chromosome evolution. Subsequent population genomics of Mongolian racerunners from high and low-altitude populations revealed a variety of novel genes associated with local adaptations to high altitude. Taken together, our study provides a high-quality genome assembly of the lizard and furthers our understanding of the genetic basis of high-altitude adaptation using reptiles as a model system.
Samples collecting and preparing
For de novo sequencing and assembly, the muscle tissue of an adult female Mongolian racerunner, which was collected from the desert in Hohhot, Inner Mongolia Autonomous Region of China, was harvested for DNA extraction. For genome annotation, RNA libraries were constructed with six tissues (heart, kidney, gonad, lung, liver, and skin) from the same female adult lizard. A total of additional 61 Mongolian racerunner individuals were collected for whole genome resequencing, containing 10 individuals from Harbin of Heilongjiang province, 9 individuals from Xingtai of Hebei province, 10 individuals from Gongyi, Zhengzhou of Henan province, 11 individuals from Hohhot of Inner Mongolia Autonomous Region, 11 individuals from Jingtai, Baiyin of Gansu province, and 10 individuals from Gonghe, Tibetan Autonomous Prefecture of Hainan of Qinghai province. All the lizards used in this study were euthanized by CO2 and tissue samples were first flash frozen in liquid nitrogen and then stored at − 80℃ until use. Total genomic DNA was extracted from muscle tissue by using the stand Phenol–chloroform method.
Pacific Biosciences (PacBio) library
Genomic DNA was isolated using a Qiagen DNA Genomic kit (Qiagen, Valencia, CA, USA) and the integrity of DNA was checked using agarose gel electrophoresis. DNA purity was checked using the NanoPhotometer® spectrophotometer (IMPLEN, CA, USA). The qualified DNA samples were randomly broken into fragments by g-TUBE (Covaris) and large fragments (≥ 20 kb) of DNA were enriched and purified by magnetic beads. Fragments of DNA were repaired by damage repair and end repair. The stem ring-like adaptors were added at the two ends of DNA fragments, and the fragments that failed to be repaired were removed by exonuclease. Approximately 40-kb SMRTbell libraries were prepared according to the released protocol from PacBio. The single-molecule real-time (SMRT) sequencing was carried out using P6-C4 chemistry on the PacBio Sequel platform at Beijing Novogene Co. Ltd (Beijing, China). An approximate 115.3-fold genome coverage (187.94 Gb) long reads were generated for the assembly.
Illumina sequencing and quality control
A total of 1.5 µg DNA per sample was used for the DNA sample preparations. Sequencing libraries were generated using Truseq Nano DNA HT Sample preparation Kit (Illumina USA) following the manufacturer’s recommendations and index codes were added to attribute sequences to each sample. Briefly, the DNA sample was fragmented by sonication to a size of 350 bp, then DNA fragments were end polished, A-tailed, and ligated with the full-length adapter for Illumina sequencing with further PCR amplification. At last, PCR products were purified (AMPure XP system) and libraries were analyzed for size distribution by Agilent2100 Bioanalyzer and quantified using real-time PCR.
For Illumina HiSeq sequencing, paired-end libraries were constructed with insert sizes around 350 bp according to Illumina’s protocol. An approximate 65.30-fold genome coverage (106.44 GB) of Illumina paired-end reads was generated for the assembly.
For population genetic analysis, we generated 1399.362 Gb Illumina reads (~ 12.47-average fold per individual). All of the Illumina reads were controlled for quality and sequenced on an Illumina Hiseq sequencer at Beijing Novogene Co. Ltd (Beijing, China). We removed low-quality paired reads as follows: Reads with ≥ 10% unidentified nucleotides (N); > 10 nt aligned to the adaptor, allowing ≤ 10% mismatches; > 50% bases having phred quality < 5; putative PCR duplicates generated in the library construction process.
10X Genomics library
The Illumina Read1 sequencing primer, 16 bp barcode, and 6 bp random sequence primer were connected to the GelBeads sequence. The GelBeads passes through two injection ports. The input DNA is mixed with enzymes and dNTP and other reactants after entering the first sample inlet, and the mixture is wrapped by oil drops when passing through the second sample inlet, then it is collected and placed on a special 96-well plate for the barcode library preparation. The random primers were combined with the random position of genomic DNA fragments in the oil drop reaction system for constant temperature PCR amplification. After the library preparation by adding the Illumina P5 and P7 adaptors into the amplification products, PE150 was sequenced on the Illumina platform at Beijing Novogene Co. Ltd (Beijing, China). An approximate 119.10-fold genome coverage (194.14 Gb) 10X Genomics Linked-Reads were generated for the assembly.
First, chromatin was digested for 16 h with 400 U HindIII restriction enzyme (NEB) at 37 °C. DNA ends were labeled with biotin and incubated at 37 °C for 45 min, and the enzyme was inactivated with 20% SDS solution. DNA ligation was then performed by the addition of T4 DNA ligase (NEB) and incubation at 16 °C for 4 ~ 6 h. After ligation, proteinase K was added to reverse cross-linking during incubation at 65 °C overnight. DNA fragments were purified and dissolved in 86μL of water. Unligated ends were then removed. Purified DNA was next fragmented to a size of 300–500 bp, and DNA ends were then repaired. DNA fragments labeled by biotin were finally separated on Dynabeads® M-280 Streptavidin (Life Technologies). Hi-C libraries were finally controlled for quality and sequenced on an Illumina Hiseq X Ten sequencer. An approximate 143.25-fold genome coverage (233.50 Gb) Hi-C reads were generated for the assembly.
As for the assembly of the genome, the longest PacBio reads were selected as the seed reads, Daligner  was used to map all of the PacBio long reads to the seed reads. The pre-assembled reads were obtained by generating a consensus of mapped reads using LASort, LAMerge, and pbdacgon. Unitigs of the pre-assembled reads were used to generate a layout of overlapping reads and the contigs of assembly. The high-quality consensus was obtained after mapping the PacBio long reads to the de novo assembled reference by Quiver  (algorithm: arrow). Duplicated assembled haploid contigs were purged using purge_haplotigs  with short-insert Illumina reads.
The Linked-reads from the 10X Genomics platform were used to construct a scaffold genome by fragScaff . Illumina reads were used to correct the inaccurate bases in the genome introduced by fragScaff using Pilon .
We used Hi-C reads to obtain a chromosome-level genome by Lachesis , though fragScaff had improved assemblies and produced a scaffold genome. The frequency of long-range interactions decreases rapidly as the linear distance along a chromosome increases, which helps produce a number of high-quality and contiguous assemblies. The final assembly totaled 1.63 Gb of sequence with a contig N50 of 3.07 Mb and a scaffold N50 of 96.73 Mb.
To annotate the repeat sequences of the genome, we initially used LTR_FINDER (version 1.0.7) , RepeatModeler (version 2.1), and RepeatScount (version 1.0.5) to find repeats. Next, RepeatMasker (version 4.0.5)  was used to search for known and novel transposable elements (TEs) by mapping sequences against the de novo repeat library and Repbase TE library (v20140131). In addition, we used the ProteinMask software (version open-2.1, with parameters: -no LowSimple -p value 0.0001) to identify TE-relevant proteins. Overlapping transposable elements belonging to the same type of repeats were integrated together.
We used a homology-based method to annotate the protein-coding gene structure. First, we download genome and annotation information from online database and built a non-redundant protein database of 8 species: Anolis carolinensis (ftp://ftp.ensembl.org/pub/release-100/fasta/anolis_carolinensis/dna/Anolis_carolinensis.AnoCar2.0.dna.toplevel.fa.gz), Pogona vitticeps (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/900/067/755/GCF_900067755.1_pvi1.1/GCF_900067755.1_pvi1.1_genomic.fna.gz), Shinisaurus crocodilurus (ftp://parrot.genomics.cn/gigadb/pub/10.5524/100001_101000/100315/Shinisaurus_crocodilurus.fa.gz), Gekko japonicus (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/447/785/GCF_001447785.1_Gekko_japonicus_V1.1/GCF_001447785.1_Gekko_japonicus_V1.1_genomic.fna.gz), Python molurus bivittatus (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/186/305/GCF_000186305.1_Python_molurus_bivittatus-5.0.2/GCF_000186305.1_Python_molurus_bivittatus-5.0.2_genomic.fna.gz), Protobothrops mucrosquamatus (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/527/695/GCF_001527695.2_P.Mucros_1.0/GCF_001527695.2_P.Mucros_1.0_genomic.fna.gz), Thamnophis sirtalis (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/077/635/GCF_001077635.1_Thamnophis_sirtalis-6.0/GCF_001077635.1_Thamnophis_sirtalis-6.0_genomic.fna.gz), and Deinagkistrodon acutus (ftp://parrot.genomics.cn/gigadb/pub/10.5524/100001_101000/100196/Deinagkistrodon_acutus.fa.gz). Then the protein sequences were aligned to the genome by using TBlastN  with an E-value cutoff by 1E-5. The blast hits were conjoined by solar . For each blast hit, Genewise  was used to predict the exact gene structure in the corresponding genomic regions.
For de novo gene prediction, we utilized SNAP (version 2006–07-28) , GENSCAN (version 1.0) , GlimmerHMM (version 3.0.4) , and AUGUSTUS (version 3.1)  to analyze the repeat-masked genome. EVidenceModeler software (EVM, version 1.1.1)  was used to integrate the genes predicted by the homology and de novo approaches and generate a consensus gene set.
RNA-seq data from 6 issues as mentioned above were mapped to the genome using Tophat (version 2.0.13) . Then cufflinks (version 2.1.1)  was used to assemble transcripts to gene models. Transcripts were assembled to expressed sequence tags (ESTs) by Trinity  with parameters (-ss 0.5 -jc 0 -minkmercov 2 -minglue 2). We used the transcript information to revise the gene set.
Functional annotation of protein-coding genes was evaluated by BLASTP (e value 1E − 05) using two integrated protein sequence databases — SwissProt  and NR database. Protein domains were annotated by searching InterPro and Pfam databases, using InterProScan and Hmmer, respectively. Gene Ontology (GO)  terms for each gene were obtained from the corresponding InterPro or Pfam entry. The pathways, in which the gene might be involved, were assigned by blast against the KEGG database (release53) . The tRNA genes were identified by tRNAscan-SE  software. The rRNA fragments were predicted by aligning to the rRNA sequences database using BlastN at E-value of 1E − 10. The miRNA and snRNA genes were predicted by INFERNAL software against the Rfam database .
Phylogenetic tree construction
Gene families were constructed through a hierarchical clustering algorithm and ‘all against all’ BLASTP with a cutoff of 1E − 9 (blast-2.2.26). The alignments with high-scoring segment pairs were conjoined for each gene pair by solar. To identify homologous gene pairs, more than 30% coverage of the aligned regions in both homologous genes was required. Finally, homologous genes were clustered into gene families using the software hcluster_sg.
The phylogenetic tree was reconstructed by using shared single-copy genes. Protein sequences for these single-copy genes were aligned by MUSCLE (version 3.7) , then protein sequence alignment was transformed back to CDS alignments. We concatenated the CDS alignments of single-copy genes to a “supermatrix.” Using this supermatrix, we constructed the phylogenetic tree using the ML (maximum likelihood) algorithm as implemented in RAxML software (version 8.0.19) .
We used MCscan (version v0.8.12)  to get synteny blocks among five chromosome-level genomes: Mongolian racerunner, European common wall lizard (Podarcis muralis), chicken, green anole lizard, and rattlesnake (Crotalus viridis) genomes. We converted the gff files to BED format files, then the CDS sequences were extracted from five genomes and translated into protein sequences as input files. The parameters used for the genome alignment were “–minspan = 10.” The alignments of syntenic chromosomes were visualized by MCscan. The parameters used for visualizing were “–notex.” The chromosome breakpoint regions are defined as the interval between two syntenic blocks from different chromosomes with more than 10 genes in size, which were detected between the Mongolian racerunner and rattlesnake genomes. Ancestral genome reconstruction was performed using GRIMM-Synteny  and MGR .
SNP calling and filtering
The high-quality paired-end reads were mapped to the reference genome using BWA (Burrows-Wheeler Aligner) (Version: 0.7.8)  with the command “mem -t 4 -k 32 –M.” After alignment, we performed SNP calling on a population scale using a Bayesian approach as implemented in the package SAMtools . We then calculated genotype likelihoods from reads for each individual at each genomic location and the allele frequencies in the sample with a Bayesian approach. The “mpileup” command was used to identify SNPs with the parameters as “-q 1 -C 50 -t SP -t DP -m 2 -F 0.002.” Then, to exclude SNP calling errors caused by incorrect mapping or InDels, only high-quality SNPs (coverage depth ≥ 6 and ≤ 100, RMS mapping quality ≥ 20, maf ≥ 0.1, miss ≤ 0.1) were kept for subsequent analysis. Consequently, 6,303,245 SNPs were left after filtering from 78,513,343 raw SNPs (Additional file 2: Table S11-12). SNP annotation was performed according to the Mongolian racerunner genome using the package ANNOVAR (Version: 2013–05-20) .
Selective sweep and enrichment analysis
We calculated genome-wide distribution of FST values and θπ using Vcftools (version 0.1.14)  in 40-kb windows with a step size of 20 kb. The θπ ratios (θπ–group1/θπ–group2) were calculated for three extreme-control group pairs and log2-transformed. The windows with top 1% FST and -log2(θπ ratio), simultaneously were considered as candidate outliers under strong selective sweeps. We also used XP-CLR  as a supplement to verify the reliability of the above two selective sweep methods for this project. All outlier regions were assigned to corresponding SNPs and annotated genes. Except above relative divergence methods, we used the absolute divergence (Dxy) approach to further validate higher divergence regions.
For genes annotated in outlier regions, we used KOBA2.0  software to test the statistical enrichment in KEGG pathways. Gene Ontology (GO) enrichment analysis was implemented by the GOseq R package, in which gene length bias was corrected. GO terms with corrected P-value less than 0.05 were considered significant.
The divergent levels (FST) of between two subpopulations were simulated by the coalescent simulator ms . We performed the simulations under three isolation with migration (IWM) models. The between-population split time (-ej parameter) and migration rate (-I parameter) were adjusted to match the overall observed FST distributions. The FST values from simulations were calculated by the evo program (https://github.com/millanek/evo) with “fst –ms” parameter . The result showed that the top 1% of observed FST (approximately FST ≥ 0.42) are always higher than the corresponding neutral FST values, which is consistent with divergent selection (FST) acting on approximately this top 1% of selective sweeping regions.
Transcriptome sequencing and analysis
For the gene profile of lizard at high altitude, the lungs of four individuals from Qinghai (highest altitude) and four individuals from Hebei (lowest altitude) were harvested locally and preserved in RNA later immediately. A total of 3 μg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext® UltraTM RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer’s recommendations and index codes were added to attribute sequences to each sample. We generated 40.13 Gb RNA-seq reads of six different organ tissues for genome annotation and 55.02 Gb RNA-seq reads of eight lung tissues for transcriptomic analysis of high-altitude adaptation. Index of the reference genome was built using HISAT2-BUILD and the high-quality RNA-seq reads were aligned to the reference genome by the HISAT2 v2.0.4  program with default parameters. HTSeq v0.6.1  was used to count the reads numbers mapped to each gene. And then FPKM (Fragments Per Kilobase of transcript per Million mapped reads) of each gene was calculated based on the length of the gene and reads count numbers. FPKM considers the effect of sequencing depth and gene length for the reads count at the same time, and is currently the most commonly used method for estimating gene expression levels. Differential expression analysis of two conditions (four biological replicates per condition) was performed by using the edgeR .
Cell maintenance and plasmid construct
HEK293 cells were cultivated in DMEM (C11995500BT; Gibco) with 10% FBS (10099141; Gibco) and penicillin (0.2 U/ml)/streptomycin (0.2 lg/ml). The wild-type PHF14 gene was synthesized in Sangon Company, China. The mutagenesis of the PHF14 sequence (I271F and E442D) was achieved through site-directed mutagenesis and verified by Sanger sequencing. Rapid Recombinant Clone Kit (VI201, TIANGEN) was used to subclone the wildtype PHF14 cDNA, the SNP1 I271F sequence (A to T), or the SNP2 E442D sequence (A to T) into the pEGFP-N1 plasmid for the overexpression analysis.
Transfection, Western blotting, and qPCR quantification
Transfection of the recombinant plasmids into HEK293 cells was carried out by using Lipofectamine3000 (Invitrogen, America) according to the manufacturer’s instructions. First, 4 μL of Lipofectamine3000 Reagent (Invitrogen, America) and 5 μg of Endofree plasmid were diluted in 125 μL of DMEM (C11995500BT; Gibco). This solution was then mixed with Lipofectamine plasmids Reagent (1:1 ratio) and incubated for 15 min at room temperature, they were then added to 6-well plates (Costar, USA) with 50% cell confluence. These cells were incubated at 37℃ with 5% CO2 for 24 h.
Cells were hypoxic stimulus with 1% O2 for 24 h after 24 h after transfection and then were washed with cold PBS, harvested, and immediately lysed (RIPA Lysis Buffer, P0013B; Beyotime and PMSF, ST506; Beyotime) for 30 min on ice, cell lysates were centrifuged at 14,000 g for 20 min at 4℃, and supernatants were collected. The total amount of proteins was measured with the BCA method. 10ug of cell lysates supernatants were subjected to western blot with anti-GFP (AE012; Abclonal), anti-HIFα (a16873; ABclonal) anti-VEGFA (ab1316; Abcam), and anti-Tubulin (AC008; Abclonal) antibodies. Antigen–antibody complexes were visualized by enhanced chemiluminescence detection (mageQuantTM LAS 4000).
After the cells were hypoxic stimulus with 1% O2 for 24 h, the cells were washed in PBS, harvested, and total RNA was extracted using TransZol Up Plus RNA Kit (ER501-01, TRANS) and RNA was reverse transcribed with GOScriptTM Reverse Transcription System (A5001, Promega) according to the manufacturer’s instructions. Real-time qPCR was performed using SYBR™ Universal PCR Master Mix (439,155; Invitrogen) with first-strand cDNA to evaluate gene expression. Quantitative Real-Time PCR analysis of the TP53, CDK6, AKT1, and CASP7 was carried out using the ABI7500 sequence detection system (Applied Biosystems by Life Technologies, Darmstadt, Germany). The ubiquitous Tubulin gene served as a reference control. PCR primers are shown in Additional file 2: Table S21 and qPCR conditions were as follows: 50 °C for 2 min, initial denaturation at 95 °C for 2 min, 40 cycles of denaturation at 95 °C for 15 s, with combined annealing and extension at 58 °C for 1 min. Three biological replicates were performed per sample.
Cell apoptosis assays
The HEK293 cells overexpressed WT, I271F, E442D, and I271F, E442D, or the PEGFP-N1 were treated on hypoxic conditions for 24 h. The apoptosis rate was evaluated using the Annexin V-AF647/PI Apoptosis Detection kit (P-CA-203, Procell) according to the instructions from the manufacturer. The cells were seeded into 6-well plates (3 × 105 cells/well). Following hypoxic conditions, the cells were collected, washed with cold PBS, and resuspended in 500 μL binding buffer. Then, 5 μL Annexin V-AF647 and 5 μL PI were added to the buffer and incubated at room temperature for 15 min. Cells were analyzed by flow cytometry (BD FACSCanto) within 1 h.
Availability of data and materials
The raw sequencing data used for genome assembly and whole genome resequencing analysis have been deposited at the public NCBI’s BioProject under the accession number PRJNA659114 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA659114) . The genome assembly and annotation data have been deposited at the Figshare database (https://doi.org/10.6084/m9.figshare.21098470) .
Okuzaki Y, Sota T. Factors related to altitudinal body size variation in the earthworm-eating ground beetle carabus japonicus. Zoolog Sci. 2017;34(3):229–34.
Tufts DM, Revsbech IG, Cheviron ZA, Weber RE, Fago A, Storz JF. Phenotypic plasticity in blood-oxygen transport in highland and lowland deer mice. J Exp Biol. 2013;216(Pt 7):1167–73.
Naeije R. Physiological adaptation of the cardiovascular system to high altitude. Prog Cardiovasc Dis. 2010;52(6):456–66.
Witt KE, Huerta-Sanchez E. Convergent evolution in human and domesticate adaptation to high-altitude environments. Philos Trans Royal Soc B Biol Sci. 2019;374(1777):20180235.
Peng Y, Yang Z, Zhang H, Cui C, Qi X, Luo X, Tao X, Wu T, Ouzhuluobu, Basang, et al. Genetic variations in Tibetan populations and high-altitude adaptation at the Himalayas. Mol Biol Evol. 2011;28(2):1075–81.
Xu S, Li S, Yang Y, Tan J, Lou H, Jin W, Yang L, Pan X, Wang J, Shen Y, et al. A genome-wide search for signals of high-altitude adaptation in Tibetans. Mol Biol Evol. 2011;28(2):1003–11.
Hendrickson SL. A genome wide study of genetic adaptation to high altitude in feral Andean Horses of the paramo. BMC Evol Biol. 2013;13:273.
Li M, Tian S, Jin L, Zhou G, Li Y, Zhang Y, Wang T, Yeung CK, Chen L, Ma J, et al. Genomic analyses identify distinct patterns of selection in domesticated pigs and Tibetan wild boars. Nat Genet. 2013;45(12):1431–8.
Gou X, Wang Z, Li N, Qiu F, Xu Z, Yan D, Yang S, Jia J, Kong X, Wei Z, et al. Whole-genome sequencing of six dog breeds from continuous altitudes reveals adaptation to high-altitude hypoxia. Genome Res. 2014;24(8):1308–15.
Wang MS, Li Y, Peng MS, Zhong L, Wang ZJ, Li QY, Tu XL, Dong Y, Zhu CL, Wang L, et al. Genomic analyses reveal potential independent adaptation to high altitude in Tibetan chickens. Mol Biol Evol. 2015;32(7):1880–9.
Yang J, Li WR, Lv FH, He SG, Tian SL, Peng WF, Sun YW, Zhao YX, Tu XL, Zhang M, et al. Whole-genome sequencing of native sheep provides insights into rapid adaptations to extreme environments. Mol Biol Evol. 2016;33(10):2576–92.
Yang J, Jin ZB, Chen J, Huang XF, Li XM, Liang YB, Mao JY, Chen X, Zheng Z, Bakshi A, et al. Genetic signatures of high-altitude adaptation in Tibetans. Proc Natl Acad Sci. 2017;114(16):4189–94.
Lai YT, Yeung CKL, Omland KE, Pang EL, Hao Y, Liao BY, Cao HF, Zhang BW, Yeh CF, Hung CM, et al. Standing genetic variation as the predominant source for adaptation of a songbird. Proc Natl Acad Sci. 2019;116(6):2152–7.
Liu X, Zhang Y, Li Y, Pan J, Wang D, Chen W, Zheng Z, He X, Zhao Q, Pu Y et al. EPAS1 gain-of-function mutation contributes to high-altitude adaptation in Tibetan horses. Mol Biol Evol. 2019;36(11):2591–603.
Li JT, Gao YD, Xie L, Deng C, Shi P, Guan ML, Huang S, Ren JL, Wu DD, Ding L, et al. Comparative genomic investigation of high-elevation adaptation in ectothermic snakes. Proc Natl Acad Sci. 2018;115(33):8406–11.
Sun YB, Fu TT, Jin JQ, Murphy RW, Hillis DM, Zhang YP, Che J. Species groups distributed across elevational gradients reveal convergent and continuous genetic adaptation to high elevations. Proc Natl Acad Sci. 2018;115(45):E10634–41.
Pasquesi GIM, Adams RH, Card DC, Schield DR, Corbin AB, Perry BW, Reyes-Velasco J, Ruggiero RP, Vandewege MW, Shortt JA, et al. Squamate reptiles challenge paradigms of genomic repeat element evolution set by birds and mammals. Nat Commun. 2018;9(1):2774.
Kvon EZ, Kamneva OK, Melo US, Barozzi I, Osterwalder M, Mannion BJ, Tissieres V, Pickle CS, Plajzer-Frick I, Lee EA, et al. Progressive loss of function in a limb enhancer during snake evolution. Cell. 2016;167(3):633-642 e611.
Roscito JG, Sameith K, Parra G, Langer BE, Petzold A, Moebius C, Bickle M, Rodrigues MT, Hiller M. Phenotype loss is associated with widespread divergence of the gene regulatory landscape in evolution. Nat Commun. 2018;9(1):4737.
Vallender EJ, Lahn BT. Multiple independent origins of sex chromosomes in amniotes. Proc Natl Acad Sci. 2006;103(48):18031–2.
Pie MR, Campos LLF, Meyer ALS, Duran A. The evolution of climatic niches in squamate reptiles. Proc Biol Sci. 1858;2017:284.
Navas CA. Herpetological diversity along Andean elevational gradients: links with physiological ecology and evolutionary physiology. Comp Biochem Physiol A Mol Integr Physiol. 2002;133(3):469–85.
He J, Xiu M, Tang X, Yue F, Wang N, Yang S, Chen Q. The different mechanisms of hypoxic acclimatization and adaptation in Lizard Phrynocephalus vlangalii living on Qinghai-Tibet Plateau. J Exp Zool A Ecol Genet Physiol. 2013;319(3):117–23.
Dai X, Zeng XM, Chen B, Wang YZ. The research on the karyotypes of six species in the genus Eremias from China. Hereditas. 2004;26(5):669–75.
Qu A, Li Z. Study on the karyotype and C-banding pottern of chromosomes of Eremias Argus Argus. Journal of Xuzhou Normal University. 1992;4(1):30–33.
Uetz P, Stylianou A. The original descriptions of reptiles and their subspecies. Zootaxa. 2018;4375(2):257–64.
Parra G, Korf BI. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23(9):1061–7.
Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2.
Schield DR, Card DC, Hales NR, Perry BW, Pasquesi GM, Blackmon H, Adams RH, Corbin AB, Smith CF, Ramesh B, et al. The origins and evolution of chromosomes, dosage compensation, and mechanisms underlying venom regulation in snakes. Genome Res. 2019;29(4):590–601.
Suryamohan K, Krishnankutty SP, Guillory J, Jevit M, Schroder MS, Wu M, Kuriakose B, Mathew OK, Perumal RC, Koludarov I, et al. The Indian cobra reference genome and transcriptome enables comprehensive identification of venom toxins. Nat Genet. 2020;52(1):106–17.
Bourque G, Pevzner PA. Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Res. 2002;12(1):26–36.
Smith CA, Roeszler KN, Ohnesorg T, Cummins DM, Farlie PG, Doran TJ, Sinclair AH. The avian Z-linked gene DMRT1 is required for male sex determination in the chicken. Nature. 2009;461(7261):267–71.
Koopman P, Gubbay J, Vivian N, Goodfellow P, Lovell-Badge R. Male development of chromosomally female mice transgenic for Sry. Nature. 1991;351(6322):117–21.
Sun YB, Fu TT, Jin JQ, Murphy RW, Hillis DM, Zhang YP, Che J. Species groups distributed across elevational gradients reveal convergent and continuous genetic adaptation to high elevations. Proc Natl Acad Sci U S A. 2018;115(45):E10634–41.
Chang P, Heier C, Qin W, Han L, Huang F, Sun Q. Molecular identification of transmembrane protein 68 as an endoplasmic reticulum-anchored and brain-specific protein. PLoS One. 2017;12(5):e0176980.
Yang C, Yuan H, Gu J, Xu D, Wang M, Qiao J, Yang X, Zhang J, Yao M, Gu J, et al. ABCA8-mediated efflux of taurocholic acid contributes to gemcitabine insensitivity in human pancreatic cancer via the S1PR2-ERK pathway. Cell Death Discov. 2021;7(1):6.
Eizuka K, Nakashima D, Oka N, Wagai S, Takahara T, Saito T, Koike K, Kasamatsu A, Shiiba M, Tanzawa H, et al. SYT12 plays a critical role in oral cancer and may be a novel therapeutic target. J Cancer. 2019;10(20):4913–20.
Probert F, Mitchell DA, Dixon AM. NMR evidence for oligosaccharide release from the dendritic-cell specific intercellular adhesion molecule 3-grabbing non-integrin-related (CLEC4M) carbohydrate recognition domain at low pH. FEBS J. 2014;281(16):3739–50.
Seta KA, Yuan Y, Spicer Z, Lu G, Bedard J, Ferguson TK, Pathrose P, Cole-Strauss A, Kaufhold A, Millhorn DE. The role of calcium in hypoxia-induced signal transduction and gene expression. Cell Calcium. 2004;36(3–4):331–40.
Okumura F, Joo-Okumura A, Nakatsukasa K, Kamura T. The role of cullin 5-containing ubiquitin ligases. Cell Div. 2016;11:1.
Salnikow K, Kluz T, Costa M, Piquemal D, Demidenko ZN, Xie K, Blagosklonny MV. The regulation of hypoxic genes by calcium involves c-Jun/AP-1, which cooperates with hypoxia-inducible factor 1 in response to hypoxia. Mol Cell Biol. 2002;22(6):1734–41.
Lu Y, Li S, Cui Z, Dai X, Zhang M, Miao Y, Zhou C, Ou X, Xiong B. The cohesion establishment factor Esco1 acetylates alpha-tubulin to ensure proper spindle assembly in oocyte meiosis. Nucleic Acids Res. 2018;46(5):2335–46.
Alomer RM, da Silva EML, Chen J, Piekarz KM, McDonald K, Sansam CG, Sansam CL, Rankin S. Esco1 and Esco2 regulate distinct cohesin functions during cell cycle progression. Proc Natl Acad Sci. 2017;114(37):9906–11.
Smith SM, Mitchell GS, Friedle SA, Sibigtroth CM, Vinit S, Watters JJ. Hypoxia Attenuates Purinergic P2X Receptor-Induced Inflammatory Gene Expression in Brainstem Microglia. Hypoxia (Auckl). 2013;2013(1):1–11.
Ziółkowska-Suchanek I, Mosor M, Podralska M, Iżykowska K, Gabryel P, Dyszkiewicz W, Słomski R, Nowak J. FAM13A as a novel hypoxia-induced gene in non-small cell lung cancer. J Cancer. 2017;8(19):3933.
Ji S. Asahina N, Kitano S, Kino Y: Bioinformatics data mining approach indicates the expression of chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq)-based hypoxia-inducible factor-1α target genes in periplaque lesions of multiple sclerosis. Clin Exp Neuroimmunol. 2015;6(2):159–69.
Ghorbel MT, Cherif M, Jenkins E, Mokhtari A, Kenny D, Angelini GD, Caputo M. Transcriptomic analysis of patients with tetralogy of Fallot reveals the effect of chronic hypoxia on myocardial gene expression. J Thorac Cardiovasc Surg. 2010;140(2):337-345 e326.
Gong J, Liu J, Ronan EA, He F, Cai W, Fatima M, Zhang W, Lee H, Li Z, Kim GH, et al. A cold-sensing receptor encoded by a glutamate receptor gene. Cell. 2019;178(6):1375-13861 e311.
Kitagawa M, Takebe A, Ono Y, Imai T, Nakao K, Nishikawa S, Era T. Phf14, a novel regulator of mesenchyme growth via platelet-derived growth factor (PDGF) receptor-alpha. J Biol Chem. 2012;287(33):27983–96.
Huang Q, Zhang L, Wang Y, Zhang C, Zhou S, Yang G, Li Z, Gao X, Chen Z, Zhang Z. Depletion of PHF14, a novel histone-binding protein gene, causes neonatal lethality in mice due to respiratory failure. Acta Biochim Biophys Sin (Shanghai). 2013;45(8):622–33.
Park JE, Tse SW, Xue G, Assisi C, Maqueda AS, Ramon GPX, Low JK, Kon OL, Tay CY, Tam JP, et al. Pulsed SILAC-based proteomic analysis unveils hypoxia- and serum starvation-induced de novo protein synthesis with PHD finger protein 14 (PHF14) as a hypoxia sensitive epigenetic regulator in cell cycle progression. Oncotarget. 2019;10(22):2136–50.
Zhang L, Huang Q, Lou J, Zou L, Wang Y, Zhang P, Yang G, Zhang J, Yu L, Yan D, et al. A novel PHD-finger protein 14/KIF4A complex overexpressed in lung cancer is involved in cell mitosis regulation and tumorigenesis. Oncotarget. 2017;8(12):19684–98.
Cheng M, Michalski S, Kommagani R. Role for Growth Regulation by Estrogen in Breast Cancer 1 (GREB1) in Hormone-Dependent Cancers. Int J Mol Sci. 2018;19(9):2543.
Rae JM, Johnson MD, Cordero KE, Scheys JO, Larios JM, Gottardis MM, Pienta KJ, Lippman ME. GREB1 is a novel androgen-regulated gene required for prostate cancer growth. Prostate. 2006;66(8):886–94.
Mazure NM, Chen EY, Laderoute KR, Giaccia AJ. Induction of vascular endothelial growth factor by hypoxia is modulated by a phosphatidylinositol 3-kinase/Akt signaling pathway in Ha-ras-transformed cells through a hypoxia inducible factor-1 transcriptional element. Blood. 1997;90(9):3322–31.
Pugh CW, Ratcliffe PJ. Regulation of angiogenesis by hypoxia: role of the HIF system. Nat Med. 2003;9(6):677–84.
Myers G. Efficient Local Alignment Discovery amongst Noisy Long Reads. Berlin, Heidelberg: Springer Berlin Heidelberg; 2014. p. 52–67.
Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013;10(6):563–9.
Roach MJ, Schmidt SA, Borneman AR. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics. 2018;19(1):460.
Adey A, Kitzman JO, Burton JN, Daza R, Kumar A, Christiansen L, Ronaghi M, Amini S, Gunderson KL, Steemers FJ, et al. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. 2014;24(12):2041–9.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963.
Burton JN, Adey A, Patwardhan RP, Qiu R, Kitzman JO, Shendure J. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol. 2013;31(12):1119–25.
Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35(Web Server issue):W265-268.
Bergman CM, Quesneville H. Discovering and detecting transposable elements in genome sequences. Brief Bioinform. 2007;8(6):382–92.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.
Yu X-J, Zheng H-K, Wang J, Wang W, Su B. Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics. 2006;88(6):745–51.
Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14(5):988–95.
Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5:59.
Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997;268(1):78–94.
Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20(16):2878–9.
Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19(Suppl 2):ii215-225.
Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9(1):R7.
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36.
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7(3):562–78.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.
UniProt Consortium T. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018;46(5):2699.
The Gene Ontology C. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 2017;45(D1):D331–8.
Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 2014;42(Database issue):D199-205.
Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–64.
Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005;33(Database issue):D121-124.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.
Wang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, Lee TH, Jin H, Marler B, Guo H, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40(7):e49.
Pevzner P, Tesler G. Genome rearrangements in mammalian evolution: lessons from human and mouse genomes. Genome Res. 2003;13(1):37–45.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. Genome Project Data Processing S: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.
Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010;20(3):393–402.
Xie C, Mao X, Huang J, Ding Y, Wu J, Dong S, Kong L, Gao G, Li CY, Wei L. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res. 2011;39(Web Server issue):W316-322.
Hudson RR. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002;18(2):337–8.
Malinsky M, Challis RJ, Tyers AM, Schiffels S, Terai Y, Ngatunga BP, Miska EA, Durbin R, Genner MJ, Turner GF. Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake. Science. 2015;350(6267):1493–8.
Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11(9):1650–67.
Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31(2):166–9.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
Li WM, Du J, Yang LY, Liang QQ, Yang MY, Zhou XM, Du WG. Chromosome-level genome assembly and population genomics of Mongolian racerunner (Eremias argus) provide insights into high-altitude adaptation in lizards. 2022. NCBI https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA659114.
Li WM, Du J, Yang LY, Liang QQ, Yang MY, Zhou XM, Du WG. Chromosome-level genome assembly and population genomics of Mongolian racerunner (Eremias argus) provide insights into high-altitude adaptation in lizards. 2022. Figshare. https://doi.org/10.6084/m9.figshare.21098470.
We thank Baojun Sun, Xinhong Xie, Liang Ma and Xingzhi Han for collecting samples. We thank Chunrong Mi for help on use of ArcGIS. We thank Dr. Melissa Locke from University of Washington for help with editing.
This project was supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31000000) and the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (2019QZKK0501) and the Beijing Natural Sciences Foundation for Distinguished Young Scholars (JQ19022).
Ethics approval and consent to participate
Not applicable. All procedures for care and use of animals were approved by the Ethics Committee of the Institute of Zoology, Chinese Academy of Sciences, and all applicable institutional and governmental regulations concerning the ethical use of animals were followed.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1:
Figure S1. Distribution of 17-mer frequency in Mongolian racerunner genome. Figure S2. GC content against the sequencing depths of Mongolian racerunner genome. Figure S3. Hi-C interactome plot among Mongolian racerunner chromosomes (Chr1-Chr19). Figure S4. Circos plot of the Mongolian racerunner genome assembly showing (from outermost to innermost) Mongolian racerunner chromosomes (Mb), gene density, GC content (%) and TE density (%). Figure S5. The female to male depth ratio (log2) for each chromosome. Figure S6. Divergence distribution of classified families in Mongolian racerunner genome. Figure S7. A comparison of gene parameters among the five lizard genomes. Figure S8. Synteny between chicken and Mongolian racerunner genomes (left), and green anole lizard and Mongolian racerunner genomes (right). Figure S9. Comparative synteny of chromosomes among European wall common lizard (P. muralis), Mongolian racerunner and rattlesnake (C. viridis) genomes. Figure S10. The top 20 functional enrichment results of genes located at rattlesnake genome chromosome breakpoint regions using Metascape. Figure S11. Synteny of the Z chromosome among four species. Figure S12. The isolation with migration (IWM) models of species formation used for coalescent simulations. Figure S13. This figure showed the distribution of XP-CLR values which calculated in 40-kb window and 20-kb step between high and low altitude populations. Figure S14. This figure showed the distribution of absolute divergence (Dxy) values which calculated in 40-kb window and 20-kb step between high and low altitude populations. Figure S15. The absolute divergence (Dxy) comparison between the top 1% FST. Figure S16. Venn plots were shown the global overlaps of the top 1% candidate loci between NMG/GS/QH. Figure S17. A concise map of positively selected genes labeled with blue that may relate to hypoxia and UV light response. Figure S18. The functional enrichment results of differentially expressed genes (DEGs) using Metascape. Figure S19. The FST and θπ, and Tajima’D for the upstream and downstream 1Mb genomic regions of these five genes were calculated in 40kb window. Figure S20. The original and uncropped gels for cell experiments.
Additional file 2: Table S1.
Reads and libraries used in de novo Mongolian racerunner genome sequencing. Table S2. Assembly statistics of Mongolian racerunner genome. Table S3. The CEGMA (Core Eukaryotic Genes Mapping Approach) result of Mongolian racerunner genome. Table S4. The BUSCO (Benchmarking Universal Single-Copy Orthologs) result of Mongolian racerunner genome. Table S5. The statistics on genome coverage and depth of Mongolian racerunner genome. Table S6. The statistics of repeat sequences in Mongolian racerunner genome. Table S7. Gene prediction of Mongolian racerunner (Eremias argus) genome. Table S8. The statistics on functional gene annotation of Mongolian racerunner genome. Table S9. Geographical position and sex of sampled Mongolian racerunner individuals. Table S10. Mapping statistics of re-sequenced individuals. The genome-wide resequencing data for 61 samples were mapped to the de novo Mongolian racerunner genome. Table S11. Summary of SNPs in 61 re-sequenced individuals of Mongolian racerunner. Table S12. The top 1% genes under select sweep region in the Inner-Mongolia (NMG) population. Table S13. The top 1% genes under select sweep region in the GanSu (GS) population. Table S14. The top 1% genes under select sweep region in the Qinghai (QH) population. Table S15. Biological Process (BP) GO term enrichment results of selected genes in the NMG populations. Table S16. KEGG enrichment results of selected genes in the NMG population. Table S17. Biological Process (BP) GO term enrichment results of selected genes in the GS populations. Table S18. KEGG enrichment results of selected genes in the GS populations. Table S19. Biological Process (BP) GO term enrichment results of selected genes in the QH populations. Table S20. KEGG enrichment results of selected genes in the QH populations. Table S21. The primers for PCR.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Li, W., Du, J., Yang, L. et al. Chromosome-level genome assembly and population genomics of Mongolian racerunner (Eremias argus) provide insights into high-altitude adaptation in lizards. BMC Biol 21, 40 (2023). https://doi.org/10.1186/s12915-023-01535-z
- Whole-genome sequencing
- Population genomics
- High-altitude adaptation