Comparative gene retention analysis in barley, wild emmer, and bread wheat pangenome lines reveals factors affecting gene retention following gene duplication

Jia, Yong; Xu, Mingrui; Hu, Haifei; Chapman, Brett; Watt, Calum; Buerte, B.; Han, Ning; Zhu, Muyuan; Bian, Hongwu; Li, Chengdao; Zeng, Zhanghui

doi:10.1186/s12915-022-01503-z

Research article
Open access
Published: 06 February 2023

Comparative gene retention analysis in barley, wild emmer, and bread wheat pangenome lines reveals factors affecting gene retention following gene duplication

Yong Jia^1,2^na1,
Mingrui Xu³^na1,
Haifei Hu^1,2,
Brett Chapman^1,2,
Calum Watt^2,4,
B. Buerte⁵,
Ning Han⁵,
Muyuan Zhu⁵,
Hongwu Bian⁵,
Chengdao Li ORCID: orcid.org/0000-0002-9653-2700^1,2,6 &
…
Zhanghui Zeng^3,5,7

BMC Biology volume 21, Article number: 25 (2023) Cite this article

4447 Accesses
11 Citations
3 Altmetric
Metrics details

Abstract

Background

Gene duplication is a prevalent phenomenon and a major driving force underlying genome evolution. The process leading to the fixation of gene duplicates following duplication is critical to understand how genome evolves but remains fragmentally understood. Most previous studies on gene retention are based on gene duplicate analyses in single reference genome. No population-based comparative gene retention analysis has been performed to date.

Results

Taking advantage of recently published genomic data in Triticeae, we dissected a divergent homogentisate phytyltransferase (HPT2) lineage caught in the middle stage of gene fixation following duplication. The presence/absence of HPT2 in barley (diploid), wild emmer (tetraploid), and bread wheat (hexaploid) pangenome lines appears to be associated with gene dosage constraint and environmental adaption. Based on these observations, we adopted a phylogeny-based orthology inference approach and performed comparative gene retention analyses across barley, wild emmer, and bread wheat. This led to the identification of 326 HPT2-pattern-like genes at whole genome scale, representing a pool of gene duplicates in the middle stage of gene fixation. Majority of these HPT2-pattern-like genes were identified as small-scale duplicates, such as dispersed, tandem, and proximal duplications. Natural selection analyses showed that HPT2-pattern-like genes have experienced relaxed selection pressure, which is generally accompanied with partial positive selection and transcriptional divergence. Functional enrichment analyses showed that HPT2-pattern-like genes are over-represented with molecular-binding and defense response functions, supporting the potential role of environmental adaption during gene retention. We also observed that gene duplicates from larger gene family are more likely to be lost, implying a gene dosage constraint effect. Further comparative gene retention analysis in barley and bread wheat pangenome lines revealed combined effects of species-specific selection and gene dosage constraint.

Conclusions

Comparative gene retention analyses at the population level support gene dosage constraint, environmental adaption, and species-specific selection as three factors that may affect gene retention following gene duplication. Our findings shed light on the evolutionary process leading to the retention of newly formed gene duplicates and will greatly improve our understanding on genome evolution via duplication.

Background

Gene and gene evolution are of paramount importance to understand how life has evolved and functions on earth. The major mechanism of gene evolution is through gene duplication [1,2,3], which is a common evolutionary process in the genomes of all living organisms [4, 5]. Gene duplication provides raw genetic materials that mutation and selection can act on and plays a critical role in genome evolution [6], phenotypic diversification [7, 8], and environmental adaptation [9]. It has been reported that 38% of the annotated genes in human genome [10], 41% in fruit fly [11], 65% in Arabidopsis [12], 30% in yeast [11], and 17% in bacteria [11] are gene duplicates. Comparative genomics analyses across different life forms have revealed that gene families are generally conserved [11, 13], leading to the suggestion that gene duplication, instead of gene invention, is the major driving force underlying genome evolution [6, 14].

Following a duplication event, most of the newly formed gene duplicates experience relaxed selection constraints and are rapidly eliminated via gene pseudogenization or gene deletion [15,16,17]. The classical population genetic theory has predicted that only a small number of new duplicates can be retained and fixed in the population [17,18,19], mainly due to the genetic drift effect. On the other hand, several studies have suggested that gene duplication events may occur as frequently as common genetic variations, such as single-nucleotide polymorphism (SNP) [17, 20,21,22]. These studies suggest that gene duplication, gene loss, and gene retention occur concurrently in a dynamic manner. Given the extremely low chance of a single duplication event becoming fixed in a population [16, 19], it raises the question: what type of gene can escape the gene loss fate and be retained? Or what impacting factors may determine the gene retention process?

To date, this question has only been partially answered. By analyzing duplicated gene pairs formed by ancient polyploidy events, pioneering studies in Arabidopsis [23, 24] showed that genes retained in duplicate are not distributed evenly among Gene Ontology (GO) or gene functional categories. Generally, they observed that genes encoding transcription factors and signal transduction proteins tend to be preferentially retained. This observation was corroborated by later findings in other organisms including rice, poplar, tetraodon, and yeast [25,26,27]. In addition, it has also been noticed that the gene retention process varies for duplicates resulted from whole genome duplications (WGDs) and small-scale duplications (SSDs, such as tandem duplication) [28, 29] and also seems to be affected by various other factors such as protein properties, transcription level, and epigenetics [29,30,31,32]. Along with these findings, various models and hypotheses have been proposed to explain the biased process of gene retention following duplication [5, 33]. Most recently, a new “Structural and functional entanglement” theory was proposed to explain how gene duplicate can be retained in yeast [5, 34]. This highlights the complex nature of gene evolution via duplication, which remains fragmentally understood.

One notable limitation for these earlier studies [23,24,25,26, 31, 32, 35, 36] is that they were based on gene duplicate analyses in single or a couple reference genomes. Gene duplicates present in individual reference genome are mainly caused by ancient ploidy or other long-term duplication events [23]. Therefore, the observed gene retention pattern may not reflect directly how gene duplicates can be retained. In addition, it is also unclear how much role that natural selection has played during the gene retention process. Therefore, we ask the question: is it possible to analyze how gene duplicates are retained in a real-time manner? Since the fixation process of gene duplicates occurs dynamically in the context of a large population, we reason that analyzing the gene retention pattern in the pangenomes would answer this intriguing question. Luckily, the recent revelation of pangenome data in more and more species provides a valuable opportunity that is needed.

Compared to other organisms, plants are particularly prone to polyploidization, thus displaying a higher rate of gene duplication [37, 38], thereby providing a more suitable model to study how gene evolves via duplication. In this study, we aim to characterize the gene evolution process via duplication, with a particular focus on how gene duplicates are retained in the population context. We take advantage of recently published genomic data in barley [39], wild emmer [40, 41], and bread wheat [42], to dissect a convincing example of gene caught in the middle stage of gene fixation following duplication. Next, we expanded the search to the whole genome level and identified a set of candidate genes that have been retained in a biased manner across species. Critical gene functional and natural selection pressure profiles for these preferentially retained gene duplicates were obtained. We introduce a population-based gene retention analysis, as a novel genomic approach to take a snapshot of the gene evolution process via duplication. Our analyses provide novel insights into the long-debated process of gene retention after duplication.

Results

Identification of a divergent lineage of homogentisate phytyltransferase for vitamin E biosynthesis in Triticeae and Aveninae

Homogentisate phytyltransferase (HPT) is a critical gene responsible for tocopherol production in the vitamin E biosynthetic pathway (Fig. S1). We previously observed that HvHPT knock-out mutation cannot eliminate tocopherol production [43], suggesting multiple copies of HPT may be present in barley genome. Thus, we performed a genome-wide screening of UbiA prenyltransferase-encoding gene (PF01040) in 22 Poales species, including 8 core Pooideae (Hordeum vulgare, Triticum aestivum, Triticum dicoccoides, Triticum urartu, Thinopyrum intermedium, Aegilops tauschii, Secale cereal, Avena eriantha), 5 Brachypodieae (Brachypodium distachyon, Brachypodium hybridum, Brachypodium mexicanum, Brachypodium stacei, Brachypodium sylvaticum), 4 Panicoideae (Panicum hallii, Sorghum bicolor, Setaria italic, Zea mays), Oryza sativa, and Ananas comosus. Three additional Pooideae species Nardus stricta, Stipa lagascae, and Melica nutans were also searched to resolve HPT evolution. A total of 62 HPT homologous genes were identified (Additional file 1: Table S1).

A preliminary neighbor joining (NJ) phylogeny was developed to differentiate HPT from its close homolog homogentisate geranylgeranyl transferase (HGGT) (responsible for tocotrienol production, Additional file 1: Fig. S1). The target genes grouped into 2 major clusters, corresponding to HPT and HGGT, respectively (Fig. 1A). A more robust maximum likelihood (ML) phylogeny (Fig. 1B) was constructed for HPTs specifically. The overall topology was consistent with the species phylogeny (Fig. 1C). Consistent with our speculation, Pooideae species HPTs divided into two major subclades. The first HPT subclade (named as HPT1) covers all target species in this study and contains the previously characterized HPTs [43,44,45]. The second HPT subclade (named as HPT2, Fig. 1B) only covers 5 species from Triticeae (barley, bread wheat, wild emmer, T. intermedium) and Aveninae (A. eriantha) and represents a novel observation. The presence of HPT2 in Aveninae and Triticeae species indicated that HPT2 emerged in their common ancestor. None of the five Brachypoideae species contain HPT2 (Fig. 1B), supporting the emergence of HPT2 after the split of Brachypodieae, corresponding to ~ 25 million years ago (Fig. 1C). In the ML tree, HPT2 diverges earlier than N. stricta, S. lagascae, M. nutans, and Bracypoideae HPT1s, suggesting that HPT2s have very divergent sequences from HPT1s. Indeed, protein sequence identity analysis showed that HPT2s only have an average 78% identity with Pooideae HPT1s, lower than the 81% identity between Panicoideae HPT1s and Pooideae HPT1s (Additional file 1: Table S2).

HPT2 resulted from dispersed gene duplication and is partially retained in the pangenomes of barley, wild emmer, and bread wheat

Synteny analyses in 7 Poales reference genomes (Fig. 2A) showed that HPT1 is generally conserved in a collinear block, with the exception of OsHPT1 which is located in a different genetic region from HPT1s in other species. In contrast, HPT2 is only preserved in reference genomes of barley and wild emmer and displays a clear gene insertion pattern. In barley, HvHPT1 and HvHPT2 were found on two different chromosomes with no synteny with each other. These observations support HPT2 as a dispersed duplication from HPT1.

During gene screening, we noticed that HPT2 is absent in some barley germplasm lines. To obtain a complete presence/absence profile, we searched HPT2 in the pangenome lines of barley, wild emmer, and bread wheat. Results (Fig. 2B) showed that HPT2 is conserved in 13 out of the 20 barley pangenomes (65%), 8 out of the 20 representative wild emmer lines (40%), and 2 out of the 16 wheat pangenomes (12.5%). In particular, the presence of HPT2 only in wheat cultivars Lancer (TraesLAC2B03G00984720.1) and Julius (TraesJUL2B03G01032940.1) was validated by tblastn search against the wheat pangenome assemblies. Interestingly, the retention rates of HPT2 are negatively correlated (coefficient = −0.9996) with HPT1 copy numbers in barley (1), wild emmer (2), and bread wheat (3) (Fig. 2C), suggesting that the retention of HPT2 may be affected by a varied HPT1 gene dosage constraint. Due to the absence of HPT2 in bread wheat reference genome Chinese Spring, we designed a gene-specific marker for HPT2 in bread wheat and genotyped a collection of 113 bread wheat accessions (mainly Australian cultivars). Results (Additional file 1: Table S3) showed that 12 out of the 113 (10.6%) accessions contain the HPT2 gene, very close to the observation in the 16 wheat pangenome lines.

The Evolution Canyon (EC) in Israel is comprised of two abutting slopes (Fig. 2D, distanced by 250m) with contrasting micro-environments: north002Dfacing slope (NFS, cool and humid) and south-facing slope (SFS, hot and dry), and has been used as a model to study the environmental adaption of various organisms [46,47,48,49]. During HPT2 gene scanning, we observed that HPT2 is only present in wild barley lines from NFS but absent in SFS (Fig. 2D). This observation suggests that the retention of HPT2 may be related to environmental adaption. In plants, vitamin E synthesis plays a major role in cold tolerance and low-temperature response [50]. To investigate whether HPT2 may be preferentially retained in plants grown under cold environment, we scanned 109 wild barley lines from Tibet (low temperature and high UV). Results (Fig. 2E) showed that HvHPT2 is present in 108 out of the 109 Tibetan barley lines (99.1%), supporting HPT2’s role in environmental adaptation. Furthermore, a geographic distribution of the presence-absence of HvHPT2 in 300 barley resequencing lines was generated (Fig. 2E), which showed that the retention of HPT2 (237 of the 300 lines, ~ 79%) is geographically imbalanced, implying a potential local environmental selection pattern.

To deduce the origin of HvHPT2 in barley, a ML tree was constructed based on HvHPT2 SNPs. HPT2 (TRIDC2Bv2G208200) from wild emmer was used as outgroup. Results (Fig. 2F) showed that HvHPT2 in wild barley lines from Turkey (2), Syria (1), and Iraq (1) diverged first, suggesting that these wild barley lines may contain the earliest HvHPT2 haplotypes. In contrast, the 10 Tibetan wild barley lines formed a distant cluster, suggesting that the HvHPT2 haplotype in Tibetan barley may have evolved at a later stage.

HPT2 underwent an interplay of purifying, relaxed, and positive selection

The ratio (ω) of non-synonymous (Ka) to synonymous (Ks) substitutions is used to assess gene selection pressure, whereby ω < 1, ω = 1, and ω > 1 indicate purifying, neutral, and positive selections, respectively. Four phylogeny lineages: HPT2s, Triticeae-Aveninae HPT1s, Brachypodieae HPT1s, and Panicoideae HPT1s (highlighted in Fig. 1B) were specified for ω calculation, corresponding to ω_HPT2, ω_TA, ω_BR, and ω_PC, respectively.

Under the branch-specific method (Table 1), likelihood ratio tests (LRTs) showed that the two-ratio model (ω_HPT2≠ω_BR=ω_TA=ω_PC) fitted the data significantly better (p = 0.0014 with 2∆l = 10.22 and df = 1) than the one-ratio model (ω_HPT2=ω_BR=ω_TA=ω_PC). In contrast, the three-ratio model (ω_HPT2≠ω_BR=ω_TA≠ω_PC) and the four-ratio model (ω_HPT2≠ω_BR≠ω_TA≠ω_PC) were not better than the two-ratio model (p =0.2059 with 2∆l = 1.6 and df = 1, p = 0.6005 with 2∆l = 1.02 and df = 2, respectively). Thus, the two-ratio model (ω_HPT2≠ω_BR=ω_TA=ω_PC) was identified as the besting fitting model, which indicates that HPT2 was under different selection pressure (ω_HPT2=0.81568) from HPT1 lineages (ω_BR=ω_TA=ω_PC=0.28457). ω_BR=ω_TA=ω_PC=0.28457 showed that the HPT1 lineages were under strong purifying selection pressure, consistent with their strict conservation in all target species. In contrast, ω_HPT2=0.81568 indicates HPT2s has clearly relaxed selection pressure than HPT1s.

Table 1 Natural selection tests on HPT genes

Full size table

To test if HPT2 has been affected by positive selection, we applied the branch-site models, which allow ω to vary across both branches and amino acid sites. Results (Table 1, Model A) showed that 15 amino acid sites (9P, 79M, 89N, 100T, 103D, 149S, 164S, 167D, 204F, 212L, 217I, 246V, 270M, 271A, 356V; numbered according to HvHPT2) were found to be under positive selection (ω₂=4.92076). LRTs showed that Model A fitted the data significantly better (p = 0.0486 with 2∆l = 3.89 and df = 1) than its null hypothesis (Table 1, Model A Null, ω₂=1), suggesting these amino acid sites were indeed under positive selection. In contrast, when the Triticeae-Aveninae HPT1s were tested for positive selection, no amino acid site could be identified under positive selection.

HvHPT2 shares functional redundancy with HvHPT1 in vitamin E biosynthesis

To verify gene dosage constraint hypothesis on HPT2 retention, we investigated if HPT2 shares functional redundancy with HPT1. At the transcriptional level, semi-RT-PCR (Fig. 3A) showed that HvHPT2 was specifically expressed in spike. In contrast, HvHPT1 was universally transcribed in root, leaf, stem, and spike. Further qRT-PCR analyses (Fig. 3B) showed that HvHPT2 was specifically expressed in husk, whereas HvHPT1 and HvHGGT were highly transcribed in embryo and endosperm, respectively. Moderate transcription of HvHPT1 was also detected in husk. Transgenic overexpression in rice (Fig. 3C) showed that HvHPT2 was indeed transcribed in husk. Protein subcellular location analyses showed that HvHPT2 was targeted to chlorophyll specifically, similar with that observed for HvHPT1 and HvHGGT (Fig. 3D).

Next, HvHPT2 was overexpressed in barley “Golden Promise” which lacks HvHPT2. In the transgenic lines, qRT-PCR showed that the transcription of HvHPT2 in leaf was about 20~35 times that in grain (Fig. 3E). In contrast, no expression of HvHPT2 could be detected in the wild type (Fig. 3E). Vitamin E content measurement (Fig. 3F, G) showed that α-tocopherol was the major form of tocopherol in leaf tissue. Instead, comparable levels of α- and γ- tocopherol were accumulated in the grain. Compared to wild type, HvHPT2 transgenic lines accumulated much higher amount of α-tocopherol in both leaf and grain, suggesting that HvHPT2 is mainly functional in α-tocopherol production, similar as HvHPT1 [43].

In accordance with HPLC profile, quantitative analysis confirmed that α-tocopherol was the predominant form of tocopherols in both wild-type and transgenic lines, accounting for 95–97% and 65–76% of total tocopherols in leaf (Fig. 3H) and grain (Fig. 3I), respectively. In contrast, γ-tocopherol only accounted for 2–4% in leaf and 20–30% in grain. In addition, β- and δ-tocopherol contents were low in all target samples. Compared to wild-type barley, α-tocopherol in HvHPT2 transgenic lines increased significantly by 20–111% and 38–60% in leaf and grain, respectively. Furthermore, γ- and β-tocopherol also increased by about 2-fold in the leaf of transgenic lines, despite their low content. Noteworthy, no significant change for γ- and δ-tocopherol was observed in transgenic grains in comparison with wild type. In terms of total tocopherol content, the three HvHPT2 transgenic lines displayed on average 36% and 55% increase in grain and leaf, respectively (Fig. 3J, Additional file 1: Table S4).

Comparative gene retention analyses revealed a gene dosage constraint effect

The potential correlation of the retention rates of HPT2 with gene dosage constraint (functional redundancy) and environmental selection provided a valuable opportunity to investigate the gene retention process following gene duplication. To gain more insights into this important evolutionary process, we expanded gene retention analyses to whole genome scale and searched for genes with similar retention pattern with HPT2 (defined as HPT2-pattern-like genes). Orthologous genes (Additional file 6) across barley, wild emmer, and bread wheat were inferred using an accurate phylogeny-based algorithm implemented by OrthoFinder program [51] using B. distachyon as outgroup species. As shown in Fig. 4A, the program reconstructed the species tree correctly. A total of 31,697, 34,790, and 39,091 hierarchical orthologous groups (HOG) at nodes N0, N1, and N2. The HOGs identified at node N1 covering the three target species (barley, wild emmer, and bread wheat) were selected for gene retention analysis.

Gene retention rate (GRR) of each HOG genes in 20 barley pangenome lines, 20 representative wild emmer lines, and 15 bread wheat pangenome lines (excluding the Triticum spelta line PI190962 in Fig. 2B) were calculated (Additional file 7). Linear correlation analyses showed that GRR is negatively associated with orthologous group numbers in the target lines of barley (−0.563), wild emmer (−0.590), and bread wheat (−0.907) (Fig. 4B), suggesting that genes from more expanded families are more likely to be lost. This finding supports the effect of gene dosage constraint on gene retention. Moreover, the negative correlation slopes seem to increase from barley (−0.563) to wild emmer (−0.590) to bread wheat (−0.907), suggesting that the gene dosage constraint effect tends to increase along with ploidy level.

Genome-wide identification of HPT2-pattern-like genes and functional analyses

Based on GRR calculation, 315 orthologous HOGs containing 326 HPT2-pattern-like genes (barley as reference, Additional file 8) were identified to follow GRR_barley > GRR_emmer > GRR_bread wheat, representing a pool of candidate genes under ongoing gene fixation process and potentially affected by gene dosage constraints.

Depending on different genetic mechanisms, gene duplicates can be classified into WGD/segmental, tandem, proximal, and dispersed duplications. Gene duplication analyses (Fig. 4C, Additional file 9) showed that HPT2-pattern-like genes mainly result from small-scale duplication events, the highest being dispersed duplication (59.5%), followed by tandem duplication (21.5%) and proximal duplication (14.7%). In contrast, singleton genes and genes resulted from WGD event each account for 1.8% and 2.5%, respectively. These results suggest that small-scale duplications play a dominant role for genes under gene fixation.

To gain biological insights into the HPT2-pattern-like genes, GO enrichment analyses were performed (Fig. 4D, Additional file 10). At the molecular function (MF) level, the target genes were found significantly enriched with GO terms involved in various molecule binding functions (ADP-binding, nucleotide binding, ATP-binding, carbohydrate derivative binding, anion binding, small molecular binding), followed by kinase activity and receptor activity. Majority of these enriched molecular functions were related to molecular binding and interaction. At the biological pathway (BP) level, these enriched GO terms were found to be significantly related to defense response, protein phosphorylation, phosphorylation, response to stimulus, and response to stress suggesting that HPT2-pattern-like genes may be closely associated with plants’ environmental adaption. At the cellular component (CC) level, HPT2-pattern-like genes were found significantly enriched with GO terms related to membrane component, consistent with an active role for membrane protein in plants’ defense responses. Further gene network (Fig. 4E) analyses showed that HPT2-pattern-like genes are significantly enriched with two major biological processes: protein phosphorylation and defense responses, which are closely linked with each other.

HPT2-pattern-like genes displays relaxed selection and are under varying degrees of positive selection

To characterize the natural selection pressure of HPT2-pattern-like genes, Ka, Ks, and Ka/Ks for the HPT2-pattern-like genes (target) and their closest homologous genes (putative parental genes, defined as background) were calculated (Additional file 11). Results (Fig. 5A) showed that the target and background genes have similar Ks distribution profiles with overlapping peak values, indicating that they have experienced similar evolution time. In contrast, the Ka for the target genes peaked at a higher value than that for the background genes (Fig. 5B), implying an elevated selection pressure. Indeed, when Ka/Ks was calculated and compared, more target genes was found in the Ka/Ks > 2.2 range, indicating a higher level of positive natural selection (Fig. 5C). In addition, a higher percentage of target genes were also observed for the Ka/Ks < 1.0 range, which needs to be further examined on the basis of individual gene. Compared to the mean Ka/Ks of each orthologous group, the target genes (Fig. 5D) displayed relatively higher Ka/Ks values, supporting a relaxed selection pressure. In contrast, the background genes exhibited significantly (p = 0.03) lower Ka/Ks values, indicating a stronger purifying selection.

To detect potential positive selection, we randomly selected 8 HPT2-pattern-like genes (Fig. 5E–L) and drew the phylogenetic trees using B. distachyon gene as outgroup (Additional file 12). In terms of gene functions, most of these 8 genes can be related to environmental adaptation: transporter (Fig. 5E), disease resistance kinase (Fig. 5H, L), antioxidant biosynthesis (Fig. 5F, J), calcium sensor (Fig. 5G), regulating protein kinase, oxidoreductase (Fig. 5K), and oligopeptidase (Fig. 5I). Ka/Ks calculation for the target (brown) and background (blue) lineages (Fig. 5E–L) showed all target lineages display higher Ka/Ks than their parental genes, suggesting generally relaxed selection pressure. All 8 target genes have Ka/Ks value below 1.0, indicating a functional constraint. When branch-site selection model was applied, results (Fig. 5M) showed that positive selection was detected in 7 out of the 8 target genes, with the exception of gene J. The percentage of positively selected sites ranges from 0.16% in gene L to 5.94% in gene K, implying varied levels of positive selection.

For transcriptional divergence analysis, transcriptional data for target and background genes in six barley tissues (Caryopsis: 5DPA; Inflorescence: 1–1.5 cm; Lemma: 42 DPA; Palea: 42 DPA; Shoot: 10 cm stage; Root: 28 DPA) were retrieved and compared. The transcription of the 8 target genes clearly varied from that for the background genes (Fig. 5N). Overall, the background genes tend to be actively transcribed in multiple tissues, while the target genes generally have decreased expression in the six tissues and more likely to be transcribed specifically in some tissues.

Comparative gene retention analyses in barley and bread wheat reveals a species-specific selection pattern

Species-specific gene family evolution is a common observation in plants. To investigate if gene retention may be affected by species-specific selection pressure, we selected barley and bread wheat as example species and divided the partially retained genes into 3 categories: GRR_barley > GRR_wheat (barley preferential), GRR_barley = GRR_wheat (neutral), and GRR_barley < GRR_wheat (bread wheat preferential). To reduce potential confounding effect during GRR calculation, we applied a more stringent criteria: by defining R_bw = GRR_barley / GRR_wheat, the three gene categories were selected as R_bw ≥ 1.2 (barley preferential), 1.1 > R_bw > 0.9 (neutral), and R_bw ≤ 0.8 (bread wheat preferential) (Fig. 6A). Due to a higher ploidy of bread wheat (hexaploid) than that of barley (diploid), we expect a relaxed gene dosage constraint in bread wheat, thus a relatively lower GRR in bread wheat. Therefore, a lower number of gene should be identified for R_bw ≤ 0.8 than that for R_bw ≥ 1.2. Indeed, a total of 1621, 707, and 389 genes were obtained for the above 3 categories, respectively (Fig. 6A,B). The number of genes preferentially retained in barley is over 4 times of that preferentially retained in bread wheat. The higher number of barley preferential genes than wheat is consistent with a gene dosage constraint hypothesis: Since the bread wheat genome is hexaploid while the barley genome is diploid, for each orthologous gene group, wheat generally has a higher gene dosage than barley. The higher gene dosage/redundancy in wheat would result in a lower gene constraint on individual gene duplicate. Therefore, newly duplicated genes are more likely to be lost than that in barley due to a varied selection constraint. Moreover, the gene dosage effect on gene retention rates is also consistent with our results in Fig. 4B, which suggests that genes from a large gene family are more likely to be lost. Genomic data from additional species in Triticeae can be used to further verify our hypothesis. In addition to gene dosage effect, other unknown factors that may also contribute to biased gene retention rates between barley and bread wheat, which requires further investigation.

To explore if the gene duplication type may vary for the 3 gene categories, their duplication type profiles were determined. Results (Fig. 6B) showed that the 3 gene sets displayed a similar gene duplication type composition, which is dominated by dispersed duplicates (56.0~59.8%), followed by tandem duplicates (19.9~22.4%) and proximal duplicates (11.7~16.5%), sequentially. WGD and singleton duplicates account for 2.6~3.8% and 2.6~5.1%, respectively. These observations suggest that gene duplication type is independent from species. The partially retained genes in barley and bread wheat genomes have resulted from similar gene duplication mechanisms.

Next, functional enrichment analyses was performed for each gene category and compared. For the R_bw ≥ 1.2 (barley preferential) group, the highest enrichment significance (Fig. 6C, left) was observed. Overall, the enriched GO terms for R_bw ≥ 1.2 resemble that identified for the 326 HPT2-pattern-like genes at the MF, BP, and CC levels. Genes with various molecular binding and protein kinase functions, and related to protein phosphorylation and defense response were among the highest enriched GO terms. In contrast to the HPT2-pattern-like genes, a significant number of GO terms related to pollen recognition, cell recognition, pollen-pistill interaction, and poolination were identified. For the R_bw ≤ 0.8 (bread wheat preferential) group (Fig. 6B, right), a significant number of enriched GO terms were also identified. Compared to R_bw ≥ 1.2, R_bw ≤ 0.8 displayed some overlapped GO terms such as protein phosphorylation, plant defenses, and responses to stimulus. These results indicated that the bread wheat preferential genes (R_bw ≤ 0.8) were also under significant selection pressure, albeit at a relatively lower significance level. In contrast to R_bw ≥ 1.2 and R_bw ≤ 0.8, barely no GO term was found to be enriched for the 1.1 > R_bw > 0.9 (neutral) group at the significant level, except that ADP-binding and defense response were weakly enriched (Fig. 6B, middle), indicating that the 1.1 > R_bw > 0.9 group were more likely under neutral selection and may play a less significant role in plant-specific evolution. In summary, gene enrichment analyses revealed a stronger selection signal in the barley preferential group, followed by the wheat preferential group, and the least in the neutral group. This observation is consistent with a putative species-specific selection and gene dosage effect theory, whereby the wheat preferential gene group may represent the species-specific selection effect only, while the barley preferential gene group results from the combined species-specific selection and gene dosage effect.

Discussion

The process leading to gene fixation following duplication is critical to understand how genomes evolve and thus has attracted great interest from evolutionary biologists. Various models and hypotheses [5, 33] have been proposed to explain how gene duplicate is retained following duplication. One notable limitation for earlier studies [23, 25, 27, 31, 52] is that they were all based on gene duplicate analyses in single reference genome. Gene duplicates present in individual reference genome are mainly caused by ancient ploidy or other long-term duplication events [23]. Therefore, the observed gene retention pattern may not reflect directly how gene duplicates can be retained. Since the gene fixation process of gene duplicates occurs in the context of a population, we reason that analyzing the gene retention pattern at the population level would enable us to identify gene duplicates under gene fixation selection.

Population-based gene retention analysis captures gene evolution process via duplication and reveals the effect of gene dosage constraint on gene retention

Based on comparative gene retention analyses in barley, wild emmer, and wheat pangenome lines, we dissected a convincing example of an evolutionary lineage HPT2 caught in the middle stage of gene fixation. Clear evidences of gene duplication, partial gene retention, and functional constraints support HPT2 being in the middle stage of gene fixation can be drawn from. The loss of HPT2 in some lines of the pangenome lines is in accordance with classical population theory that the most likely fate of gene duplicate is gene loss [16, 17]. Our observation of HPT2 being duplicated in the common ancestor of Triticeae and Avenidae suggests that gene loss can persist over a long timescale, but is not restricted to the early stage following duplication [52, 53]. Although presence/absence variation (PAV) mutation is commonly observed in previous studies [39, 42, 54], what distinguishes HPT2 from a common PAV is that its retention is associated with HPT1 gene dosage, potentially due to their functional redundancy. Our assumption of functional redundancy between HPT1 and HPT2 is verified at both gene transcriptional and functional levels. A higher copy number of homologous genes in wild emmer and bread wheat would make HPT2-pattern-like genes functionally more redundant, thereby leading to a higher chance of gene loss. The gene dosage hypothesis agrees with previous studies [52, 53, 55, 56] based on single reference genome, which suggest gene dosage constraints as one potential factor affecting gene retention following gene duplication. In this study, the effect of gene dosage constraint on gene retention is further corroborated by the correlation analyses between GRR and gene family size, independently in barley, wild emmer, and bread wheat pangenome lines. The negative correlation between GRR and gene family size support that gene duplicates from more expanded gene family are more likely to be lost. Interestingly, we found that this gene dosage constraint effect on gene retention tends to increase with genome ploidy level, which also makes sense considering the potentially higher genetic redundancy in higher ploidy genomes.

Accurate detection of genuine orthologous gene groups across multiple species is critical for the comparative gene retention analyses in this study. Compared to traditional sequence similarity-based approaches, the phylogeny-based orthology inference by OrthoFinder has the advantages of higher accuracy and high throughput at the whole genome scale [51]. A recent study [57] reported a novel collinearity-incorporating method for homology inference in Triticeae. However, considering the large amount of transposable element-mediated gene duplication events in Triticeae [58], the collinearity-incorporating method is less preferred in this study where most of the target HPT2-pattern-like genes were found to be SSDs. The innovative combination of phylogeny-based orthologous gene inference and population-based gene retention analysis enables us to identify 326 HPT2-pattern-like genes at whole genome scale. We argue that these HPT2-pattern-like genes represent a pool of candidate genes under ongoing gene fixation and share two common features: firstly, duplication occurred in the common ancestor of barley, wild emmer, and bread wheat; secondly, biased retention across species. This gene pool provides a valuable opportunity to characterize the evolution profile of the gene retention process following duplication. For example, we observed apparently relaxed selection pressure among HPT2-pattern-like genes. This finding is consistent with previous theoretical prediction that duplicated genes will undergo immediate relaxed selection due to reduced functional constraints [17, 59, 60]. In particular, SSDs seem to exhibit a higher level of relaxed selection than WGDs [60]. Indeed, most of the HPT2-pattern-like genes were determined as SSDs in this study. There are increasing evidence showing that relaxed selection, coupled with positive selection, plays a critical role for the rapid evolution of phenotypic plasticity [61,62,63]. Our study is also corroborated by a recent study which reports that enhanced mutation rate and relaxed selection are associated with gene loss in microorganisms [64]. In addition to relaxed selection, we also found positive selection and transcriptional divergence are prevalent among HPT2-pattern-like genes, which are consistent with previous observations and predictions [4, 65,66,67,68,69]. The highly similar evolution profile for HPT2 and HPT2-pattern-like genes support HPT2-pattern-like genes being in the middle stage of gene fixation. Our analyses prove that population-based gene retention analyses may serve as a novel and effective approach to study gene evolution via duplication in a “real-time” manner.

Environmental selection plays a clear role on gene retention following duplication

In addition to gene dosage constraint, we found that environmental selection is another impacting factor affecting the retention of HPT2. We found additional cis-regulatory elements related to low-temperature responses in the putative promoter region of HvHPT2 (Additional file 1: Table S5). The elevated retention rate of HPT2 in Tibetan barley may be attributed to its association with low-temperature adaption. This type of allele frequency bias within a subpopulation has been widely identified as a signature of natural selection [70,71,72]. In addition, the presence and absence of HPT2 in wild barley lines collected from the two abutting slopes of EC [46], respectively, provides direct evidence that HPT2 may be involved in environmental adaption to microclimates. The retention of HPT2 in wild barley in the cool and humid NFS slope is consistent with the proven function of tocopherols in low-temperature responses [45, 50]. In addition to barley, the retention of HPT2 in bread wheat should also be subjected to environmental selection, which can be verified by comparing different wheat populations under different environmental conditions. However, the presence of 3 copies of HPT1 in bread wheat genome would make HPT2 less important in bread wheat than that in barley, resulting in lower selection pressure on bread wheat HPT2. Therefore, we predict that the variation of HPT2 retention rates in different wheat populations would be less prominent than that in barley. This prediction would also apply to HPT2 in wild emmer. Gene retention analyses in wild emmer and bread wheat population are hindered by the lack of proper wild emmer WGS data and by the lack of HPT2 in bread wheat reference genome Chinese Spring. It would be interesting for future studies to verify our prediction.

Environmental selection on gene retention is also supported by the detection of positive selection, which has been commonly related to plant-environmental interaction and stress responses [68, 69]. Positive selection was detected for HPT2 and 7 out of the 8 randomly selected HPT2-pattern-like genes, suggesting that environmental selection should be prevalent among genes under gene fixation. Consistently, gene functional enrichment analyses suggest that the HPT2-pattern-like genes are over-represented with plant defense responsive function. A recent study found that a cluster of SSDs genes shared by core Pooideae species coincided with their adaption to cooling conditions [73]. This is consistent with our findings with HPT2-pattern-like genes, which are dominated by SSDs and are shared by barley, wild emmer, and bread wheat. In this study, HPT2-pattern-like genes were identified to be in the middle stage of gene fixation, which are partially retained in the populations. Gene PAV mutations have also been shown to have important impact on phenotypic traits and stress tolerance [54, 74, 75]. Many of the HPT2-pattern-like genes have annotated gene functions that are closely related to plants’ stress resistance, such as those exemplified in Fig. 5: transporter proteins, disease resistance kinase, antioxidant biosynthesis, calcium sensor regulating protein kinase, and oxidoreductase. Given that the HPT2-pattern-like genes were selected based on their gene dosage effect, the enrichment of stress and defense-related genes in HPT2-pattern-like genes also suggests that these genes under stronger environmental selection may more likely display a gene dosage effect. Take HPT2 for example, firstly, the gene dosage (functional redundancy) effect results in the decreasing retention rates of HPT2 in barley (HPT1+HPT2), wild emmer (2HPT1+HPT2), and wheat (3HPT1+HPT2), caused by the relaxed selection constraint as gene dosage increases. Secondly, the association of HPT’s (vitamin E) with environmental adaption adds an additional selection pressure, which would make HPT2 in barley more important, while HPT2 in wheat is less affected due to the presence of 3 HPT1s. Thus, the gene dosage effect on HPT2 retention tends to be amplified. This theory can partly explain the enrichment of plant defense function in the 326 HPT2-pattern-like genes.

Biased gene retention pattern in terms of gene function

We found that HPT2-pattern-like genes are significantly over-represented with molecular-binding and interaction functions. Despite of using different data and approach, our findings are in striking agreement with previous reports that gene duplicates encoding transcription factors and protein kinase are preferentially preserved after WGD in Arabidopsis, rice, and yeast [23, 25, 31]. The biased retention of gene duplicates involved in molecular interaction (various molecular binding such as transcriptional factors) is consistent with previous gene dosage balance hypothesis [28, 31], which states that gene function in a molecular complex displays higher gene dosage effects. Indeed, the HPT2-pattern-like genes were selected being potentially affected by gene dosage constraint. In addition, it has been reported that genes resulted from SSDs were enriched for membrane proteins and functions in stress tolerance (and under-presented for transcription factors and DNA/RNA binding functions), displaying a reciprocal relationship with that observed for WGDs [27, 28]. We found that HPT2-pattern-like genes (mainly SSDs) were enriched for membrane proteins and defense responses functions, similar with the observation for tandem duplications in a previous study [27]. Our findings are corroborated by a recent comparative study in 141 plant genomes, which showed that SSDs such as tandem and proximal duplications have evolved toward biased functional roles in plant self-defense [37]. In this study, we found that HPT2-pattern-like genes are mainly resulted from SSDs, instead of WGDs. Our findings tend to support an inter-twined scenario of previous studies: on the one hand, HPT2-pattern-like genes (mainly SSDs) resemble previous WGDs in enrichment for molecular binding, transcription factors, and protein kinase [23, 25, 31]; on the other hand, HPT2-pattern-like genes are also enriched in membrane and defense responsive proteins, similar with the tandem duplications in previous studies in Arabidopsis and rice [27, 28]. The enrichment of transcription factor and molecular binding proteins in these SSDs may be attributed to HPT2-pattern-like genes being filtered for genes under potential gene dosage effect, which agrees with previous suggestion that genes involved in subunit-subunit interactions tend to be under higher gene dosage constraints [31]. Noteworthy, a recent study [58] supported a recent burst of gene duplications (RBGD) in Triticeae, mainly by transposable element-mediated SSDs. Functional enrichment analyses found that protein dimerization activity, xylan metabolic process, catalytic activity, and nucleobase-containing compound metabolic process were enriched among RBGD genes, which seem to vary from the functional profile for our identified HPT2-pattern-like genes [58]. The enrichment of plant defense responses and molecular binding functions in HPT2-pattern-like may reflect the gene dosage selection effect, which is likely to enhance among partially retained genes related to environmental adaption.

Species-specific selection affecting gene retention

In this study, we found that selection on gene retention follows a species-specific pattern. Although with some variations, both barley and bread wheat preferential genes overlap on protein phosphorylation, defense, and stress responsive functions, whereas the neutrally selected genes displayed no clear enrichment. The detection of overlapped functions in bread wheat preferential genes was not expected, which reflect a different type of selection pressure from gene dosage constraints. We define this unknown selection pressure as species-specific selection. The species-specific selection hypothesis is in agreement with an earlier study, which states that the evolution of gene duplicate depends on its duplication mechanisms (WGD and SSD) but, most importantly, on species [60]. Based on evolutionary dynamics analyses on gene paralogs in 8 plant species, the authors concluded that species-specific selection plays dominant role in determining gene retention [60]. We speculate that the preferentially retained genes in barley may be influenced by combined effects of gene dosage constraint and species-specific selection, which well-explains more genes being preferentially retained in barley. Instead, bread wheat preferential genes may be related to species-specific selection only. The species-specific selection hypothesis may contribute to species-specific gene duplication events in various plant genomes [7, 14, 37]. Indeed, species-specific gene family expansion via duplication, particularly SSDs have been shown to play an important role in plants’ stress responses and disease resistance [76,77,78].

HPT2-pattern-like genes provide new opportunity for crop improvement

Among monocot plants, the Poaceae family contains some of the most important crops such as bread wheat, rice, maize, barley, and oat, which are the major food sources for human being [79]. In these cereal species, genetic engineering and crossing have been used as an important breeding approach to improve grain yield and nutrient content under various stress and changing climate conditions [80, 81]. Here, we propose that the HPT2-pattern-like genes identified in this study represent an interesting list of candidate genes for future crop improvement in cereal plants. Compared to a common list of PAV or copy number variation genes as reported in previous studies [39, 54], the HPT2-pattern-like genes identified in this study were specifically selected for their biased retention in barley, wild emmer, and bread wheat, which reflects a long-term environmental selection effect. Thus, the introduction of HPT2-pattern-like genes via either genetic engineering or crossing may have great potential to improve crop performance under various environmental challenges.

The potential application of HPT2-pattern-like genes for crop improvement can be exemplified by the transgenic overexpression of HvHPT2 in barley. As the major lipid soluble antioxidant synthesized in green plants [82], vitamin E not only plays a crucial role in plant growth [83] and various stress tolerance [84], but also forms an essential component in human diet with numerous nutritional and medical values [85, 86]. A sufficient intake of vitamin E can help to prevent neurological disorders and chronic diseases, such as atherosclerosis, cataracts, and cancers, and maintain the healthy development of various tissues and organs, including brain, nerves, muscle and bone, skin, bone marrow, and blood [85, 86]. Due to the importance of vitamin E for both plants and humans, genetic engineering to improve vitamin E content has become an important target in crop breeding [87]. Flagship crop species like bread wheat [88], barley [43, 89], maize [44], sorghum [90], rice [91,92,93,94], and soybean [95,96,97,98,99,100] have been the focus of vitamin E genetic engineering. In this study, we showed that the overexpression of HvHPT2 can significantly increase α-tocopherol content in leaf and grain, proving HvHPT2 as an effective candidate gene for future genetic engineering and breeding to improve vitamin E content in barley. Future study is necessary to identify superior HvHPT2 allele with higher vitamin E production and cold-responsive transcription, particularly those conserved in Tibetan barley. In addition to barley, HPT2s from T. intermedium, wild emmer, bread wheat, and A. eriantha also deserve attention to explore their potential application for improved vitamin E content and stress tolerance in crop breeding. We speculate that HPT2 should also be present in other Triticeae and Avenideae plants such as S. cereale, T. urartu, T. tauschii, and A. atlantica despite its absence in their reference genomes. Genome sequencing in these species is needed to verify our hypothesis.

As exemplified by HPT2, the potential use of the identified 326 HPT2-pattern-like genes in crop improvement can also be investigated. Special attention should be given to those genes encoding transcription factors and related to plant defense responses. In addition, genes encoding membrane proteins, most of which involve molecular interactions and display higher gene dosage balance effect, also deserves special attention. Indeed, HPT2 itself also functions as a membrane-binding protein, critical for maintaining the integrity of cell membrane under various stresses [82]. In addition, the importance of membrane proteins such as leucine-rich repeat receptor kinase plant defense has been highlighted in many previous studies [101,102,103]. It should be pointed out that the biased gene retention rates for the 326 genes may be attributed to the gene dosage effect but may also result from other factors such as random occurrence. The general gene functional and selection signals were detected by the overall gene enrichment and natural selection analyses, but did not verify that all of these 326 genes resemble the case of HPT2. The value of the identified gene pool is that it provides an enriched pool to identify HPT2-pattern-like genes. The factors affecting the retention rate, function, and evolution pattern of each gene still needs to be examined individually.

Conclusions

Based on comparative gene retention analysis across barley, wild emmer, and bread wheat pangenome lines, we dissected a convincing example of HPT2 caught in the middle stage of gene fixation. We innovatively combined phylogeny-based orthology inference with population-based gene retention analysis, and identified 326 HPT2-pattern-like gene duplicates under gene fixation selection potentially affected by a gene dosage effect. Gene retention, natural selection, and functional enrichment analyses support gene dosage constraint, environmental adaption, and species-specific selection as three potential factors that determine gene retention process following duplication. Genes under stronger environmental selection more likely display a gene dosage effect in term of gene retention. Our analyses prove that population-based gene retention analyses may serve as a novel and effective approach to study gene evolution via duplication. Our findings will greatly improve our understanding of genome evolution in plants and other living organisms.

Methods

Identification of homologous HPT genes in Poales genome

UbiA domain profile (PF01040) was downloaded from Pfam database (http://pfam.xfam.org/). Putative HPT genes were identified using hmmscan in HMMER package (http://hmmer.org/). Annotated protein sequences for barley reference genome Morex (V2) and wild emmer (V2) were downloaded from IPK (https://doi.org/10.5447/IPK/2019/8) and GrainGenes (https://bread wheat.pw.usda.gov/GG3/), respectively. Genomic datasets for other species were retrieved from either Phytozome V12 (https://phytozome-next.jgi.doe.gov/) or EnsemblPlant release-54 (https://plants.ensembl.org/index.html) databases. The presence/absence of HPT2 in wheat pangenome assemblies were validated by tblastn using the amino acid sequence of TraesLAC2B03G00984720.1 from wheat cultivar Lancer. The tblastn results against 15 wheat genomes were accessible at https://plants.ensembl.org/Triticum_aestivum_lancer/Tools/Blast/Ticket?tl=d6dR15Gh5DUEuDRG.

Phylogenetic tree construction

Sequence alignment was performed using Muscle (8 iterations) [104]. Amino acid sequences were used for NJ phylogeny reconstruction in MEGA7.0 [105] using p-distance substitution model. CDS sequences were used for ML tree construction using IQ-TREE (V1.6.12) [106] with MG+F1X4+G4 substitution model (lowest BIC score). Branch support was calculated by 1000 times bootstrapping and SH-like approximate likelihood ratio test. Tree annotation was performed using Figtree software (v1.4.3, http://tree.bio.ed.ac.uk/software/figtree). Calibrated species phylogeny was obtained at http://www.timetree.org/ website. HvHPT2 SNP data for barley pangenome lines (https://doi.org/10.5447/ipk/2020/24) and Tibetan barley lines (https://edal.ipk-gatersleben.de/) were downloaded and used for ML tree construction using IQ-TREE with MGK+F1X4+G4 substitution model.

Gene presence and absence analyses

SNP data for barley pangenome lines (177 SNPs) [39] and Tibetan barley (136 SNPs) [107] was used to screen HPT2 presence/absence based on the missing genotype calls. For wild barley lines at EC in Israel, data at http://146.118.64.11/BarleyVar was searched. For orthology inference in barley and bread wheat pangenomes, Get_Homologues-EST was used following closely to the online manual (http://eead-csic-compbio.github.io/get_homologues/manual-est/). OrthoMCL (OMCL) algorithm was used for clustering with an inflation index of 1.5 as the cuttoff. Morex and Chinese Spring were used as the control for barley and bread wheat, respectively. For wild emmer, pangenome data were downloaded in the NCBI Sequence Read Archive database (accession IDs listed in Fig. 2B). Read adapters were removed using Trimmomatic v.0.36 [108]. Alignment was performed using bwa-mem [109] with default settings and the results were used to call gene deletions using CNVnator v.0.33 [110] based on read mapping depth with non-overlapping windows.

To validate the wheat genome annotation quality, BITACORA genome scan tool [111] was implemented on the OGs containing the selected HPT2-pattern-like genes. The hmmscan tool from the HMMER package (http://hmmer.org/) was used to identify conserved protein domain with E-value threshold of 1e−30. The corresponding hmm protein domain profile for each OGs and their protein sequences were used as input for BITACORA scanning using the same threshold of 1e−30. The results (Additional file 13) showed that no newly predicted gene for the target OGs could be identified, thereby confirming the reliability of the wheat genome annotation data.

Natural selection tests

Natural selection pressure was assessed using codeml in PAML4.7 package [112]. CDS sequence alignment of HPTs was performed using Muscle [104]. Branch pattern specification was implemented using Treeview1.6.6 (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html). P-value for likelihood ratio tests was calculated using Graphpad software (https://www.graphpad.com/quickcalcs/PValue1.cfm). For pair-wise Ka, Ks, and Ka/Ks calculation, the ParaAT tool [113] was used.

RT-PCR analyses

RNA was extracted using EZNA Plant RNA following the manufacturer’s instructions (Omega, Norcross, USA). One microgram RNA was digested with a gDNA Remover kit and reversely transcribed into cDNA in ReverTra Ace qPCR RT Master Mix (Toyobo, Kyoto, Japan) in a 20-μL reaction volume. HvACTIN was used as reference gene. Three biological replicates and three technical replicates were included. All primers involved in RT-PCR analyses are listed in Additional file 1: Table S6.

Vector construction, transformation, GUS staining, and subcellular localization

Morex cDNA of HvHPT2 was subcloned into pH7FWG2.0 vector with 35S promoter. Immature scutella (1.5–2 mm) of barley cultivar “Golden Promise” embryo were used for Agrobacterium-mediated transformation following previously described procedure [114].

ProHvHPT2:GUS containing 2.3-kb HvHPT2 promoter was subcloned into destination vector pHGWFS7 containing Egfp:GUS by LR clonase reaction (Invitrogen, USA). Rice transformation for ProHvHPT2:GUS was performed by Agrobacterium-mediated co-cultivation approach. Regenerated plants were grown in a growth chamber. All primers involved in gene constructs are listed in Additional file 1: Table S6.

35S:HvHGGT:GFP, 35S:HvHPT1:GFP, and 35S:HvHPT2:GFP constructs were generated using Gateway cloning (Invitrogen, USA) and transformed into barley mesophyll protoplasts via PEG-mediated transfection, followed by incubation at 25°C in dark for 14 h before observation under a LSM 710 NLO confocal microscope (Zeiss, Jane, Germany).

T₁ generation of HvHPT2-GUS transgenic lines were incubated in GUS staining solution and vacuum infiltrated for 10min, followed by incubation at 37°C for 2h. After staining, the samples were washed with 75% ethanol to remove chlorophyll and observed under a stereoscope. Original protoplast imaging files were deposited in Additional file 14.

Vitamin E measurement

Vitamin E was extracted according to previously described method [43]. Agilent 1200 HPLC (Agilent Technologies) equipped with Phenomenex Kinetex F5 100A column (2.6 μm, 150 × 4.6 mm; Phenomenex) and G1321A detector was used for tocopherols and tocotrienols quantification, based on external standard curves using authentic compounds (ChromaDex).

Phylogeny-based orthologous group inference

Protein sequences for barley (Morex V2), wild emmer (Zavitan WEWSeq v2.0), and bread wheat (Chinese Spring IWGSC refseqv2.1) were downloaded from https://doi.org/10.5447/IPK/2019/8 and https://bread wheat.pw.usda.gov/GG3/. Genomic data for B. distachyon (V2.1, https://phytozome-next.jgi.doe.gov/) was included as phylogenetic outgroup. Orthologous gene inference was performed using OrthoFinder [51] following the manual at https://github.com/davidemms/OrthoFinder.

Transcriptome analyses

Transcriptome data of six barley tissues: Caryopsis: developing grain 5DPA; inflorescence: 1–1.5 cm; lemma: 42 DPA; palea: 42 DPA; shoot: 10 cm stage; root: 28 DPA, was retrieved from EORNA transcriptome database [115] (https://ics.hutton.ac.uk/eorna/index.html). For each sample, three biological replicates were included. The mean transcripts per million reads (TPM) value for each gene was used for heatmap generation based on log (TPM+0.01), 10).

Gene duplication pattern and synteny analyses

MCScanX [116] was used for gene duplication and synteny analyses. Genome comparison was performed using the standalone NCBI-BLAST-2.2.29 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) with E-value threshold of 1e−5. The top 5 blastp hits were retained. Gene duplication type was determined using duplicate_gene_classifier program in MCScanX.

Gene functional enrichment and network analyses

Pathway analyses were performed using g:Profiler (https://biit.cs.ut.ee/gprofiler) with barley as reference. To facilitate input for g:GO St, candidate barley genes (Morex, annotation version V2) were mapped to version V1 gene ID using blastp program (E-value 1e−10). Enrichment P values were estimated by hypergeometric distribution tests and adjusted by multiple testing correction using the g:SCS algorithm, with a threshold at 0.05. For gene network analysis, g:Profiler output files were downloaded and used as input for Cytoscape V3.7.

Data plotting and statistical analysis

Geographical map was generated using R packages tidyverse, maps, and mapplots. Two-way t tests and Pearson association were performed using Scipy package in Python 3.7.

Availability of data and materials

The datasets generated and/or analyzed during the current study are available at figshare repository: https://figshare.com/s/16286485ba63e682a400 [39,40,41,42]. The original input and output data files were deposited in GitHub public repository at https://github.com/yongjiam/HPT2_study [39,40,41,42].

Abbreviations

EC :: Evolution Canyon
GO :: Gene Ontology
OG :: Orthologous Group
GRR :: Gene retention rate
R _bw :: GRR_barley / GRR_wheat
HOG :: Hierarchical orthologous group
HGGT :: Homogentisate geranylgeranyl transferase
HPT :: Homogentisate phytyltransferase
LRTs :: Likelihood ratio tests
ML :: Maximum likelihood
NJ :: Neighbor joining
Ka :: Non-synonymous substitutions
NFS :: North-facing slope
PAV :: Presence/absence variation
RBGDs :: Recent burst of gene duplications
SNP:: Single-nucleotide polymorphism
SSDs :: Small-scale duplications
SFS :: South-facing slope
Ks :: Synonymous substitutions
TPM :: Transcripts per million reads
WGDs :: Whole genome duplications

References

Kaessmann H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 2010;20(10):1313–26.
Article CAS Google Scholar
Ding Y, Zhou Q, Wang W. Origins of new genes and evolution of their novel functions. Annu Rev Ecol Evol Syst. 2012;43:345–63.
Article Google Scholar
Long MY, VanKuren NW, Chen SD, Vibranovski MD. New gene evolution: little did we know. Annu Rev Genet. 2013;47:307–33.
Article CAS Google Scholar
Zhang JZ. Evolution by gene duplication: an update. Trends Ecol Evol. 2003;18(6):292–8.
Article Google Scholar
Kuzmin E, Taylor JS, Boone C. Retention of duplicated genes in evolution. Trends Genet. 2022;38(1):59–72.
Article CAS Google Scholar
Magadum S, Banerjee U, Murugan P, Gangapur D, Ravikesavan R. Gene duplication as a major force in evolution. J Genet. 2013;92(1):155–61.
Article Google Scholar
Panchy N, Lehti-Shiu M, Shiu SH. Evolution of gene duplication in plants. Plant Physiol. 2016;171(4):2294–316.
Article CAS Google Scholar
De Bodt S, Maere S, Van de Peer Y. Genome duplication and the origin of angiosperms. Trends Ecol Evol. 2005;20(11):591–7.
Article Google Scholar
Kondrashov FA. Gene duplication as a mechanism of genomic adaptation to a changing environment. P Roy Soc B-Biol Sci. 2012;279(1749):5048–57.
Google Scholar
Li WH, Gu ZL, Wang HD, Nekrutenko A. Evolutionary analyses of the human genome. Nature. 2001;409(6822):847–9.
Article CAS Google Scholar
Rubin GM, Yandell MD, Wortman JR, Miklos GLG, Nelson CR, Hariharan IK, et al. Comparative genomics of the eukaryotes. Science. 2000;287(5461):2204–15.
Article CAS Google Scholar
Kaul S, Koo HL, Jenkins J, Rizzo M, Rooney T, Tallon LJ, et al. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408(6814):796–815.
Article CAS Google Scholar
Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, et al. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 2008;319(5859):64–9.
Article CAS Google Scholar
Flagel LE, Wendel JF. Gene duplication and evolutionary novelty in plants. New Phytol. 2009;183(3):557–64.
Article Google Scholar
Albalat R, Canestro C. Evolution by gene loss. Nat Rev Genet. 2016;17(7):379–91.
Article CAS Google Scholar
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999;151(4):1531–45.
Article CAS Google Scholar
Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290(5494):1151–5.
Article CAS Google Scholar
Wagner A. The fate of duplicated genes: loss or new function? Bioessays. 1998;20(10):785–8.
Article CAS Google Scholar
Walsh JB. How often do duplicated genes evolve new functions. Genetics. 1995;139(1):421–8.
Article CAS Google Scholar
Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al. Large-scale copy number polymorphism in the human genome. Science. 2004;305(5683):525–8.
Article CAS Google Scholar
Itsara A, Wu H, Smith JD, Nickerson DA, Romieu I, London SJ, et al. De novo rates and selection of large copy number variation. Genome Res. 2010;20(11):1469–81.
Article CAS Google Scholar
Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302(5649):1401–4.
Article CAS Google Scholar
Blanc G, Wolfe KH. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004;16(7):1679–91.
Article CAS Google Scholar
Seoighe C, Gehring C. Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet. 2004;20(10):461–4.
Article CAS Google Scholar
Paterson AH, Chapman BA, Kissinger JC, Bowers JE, Feltus FA, Estill JC. Many gene and domain families have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza, saccharomyces and Tetraodon. Trends Genet. 2006;22(11):597–602.
Article CAS Google Scholar
Freeling M. The evolutionary position of subfunctionalization, downgraded. Genome Dyn. 2008;4:25–40.
Article CAS Google Scholar
Rizzon C, Ponger L, Gaut BS. Striking similarities in the genomic distribution of tandemly arrayed genes in Arabidopsis and rice. PLoS Comput Biol. 2006;2(9):989–1000.
Article CAS Google Scholar
Freeling M. Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu Rev Plant Biol. 2009;60:433–53.
Article CAS Google Scholar
Cusack SA, Wang PP, Lotreck SG, Moore BM, Meng FR, Conner JK, et al. Predictive models of genetic redundancy in Arabidopsis thaliana. BMC Ecol Evol. 2021;38(8):3397–414.
CAS Google Scholar
Rodin SN, Riggs AD. Epigenetic silencing may aid evolution by gene duplication. J Mol Evol. 2003;56(6):718–29.
Article CAS Google Scholar
Thomas BC, Pedersen B, Freeling M. Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res. 2006;16(7):934–46.
Article CAS Google Scholar
Liang ZK, Schnable JC. Functional divergence between subgenomes and gene pairs after whole genome duplications. Mol Plant. 2018;11(3):388–97.
Article CAS Google Scholar
Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010;11(2):97–108.
Article CAS Google Scholar
Kuzmin E. Exploring whole-genome duplicate gene retentionwith complex genetic interaction analysis. Science. 2020;368(6498):eaaz5667.
Article CAS Google Scholar
Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, et al. Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A. 2005;102(15):5454–9.
Article CAS Google Scholar
Papp B, Pal C, Hurst LD. Dosage sensitivity and the evolution of gene families in yeast. Nature. 2003;424(6945):194–7.
Article CAS Google Scholar
Qiao X, Li QH, Yin H, Qi KJ, Li LT, Wang RZ, et al. Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. Genome Biol. 2019;20(1):38.
Article Google Scholar
Jiao YN, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, et al. Ancestral polyploidy in seed plants and angiosperms. Nature. 2011;473(7345):97–100.
Article CAS Google Scholar
Jayakodi M, Padmarasu S, Haberer G, Bonthala VS, Gundlach H, Monat C, et al. The barley pan-genome reveals the hidden legacy of mutation breeding. Nature. 2020;588:284–9.
Article CAS Google Scholar
Avni R, Nave M, Barad O, Baruch K, Twardziok SO, Gundlach H, et al. Wild emmer genome architecture and diversity elucidate wheat evolution and domestication. Science. 2017;357(6346):93–6.
Article CAS Google Scholar
Cheng H, Liu J, Wen J, Nie XJ, Xu LH, Chen NB, et al. Frequent intra- and inter-species introgression shapes the landscape of genetic variation in bread wheat. Genome Biol. 2019;20:136.
Article Google Scholar
Walkowiak S, Gao LL, Monat C, Haberer G, Kassa MT, Brinton J, et al. Multiple wheat genomes reveal global variation in modern breeding. Nature. 2020;588(7837):277–83.
Article CAS Google Scholar
Zeng Z, Han N, Liu C, Buerte B, Zhou C, Chen J, et al. Functional dissection of HGGT and HPT in barley vitamin E biosynthesis via CRISPR/Cas9-enabled genome editing. Ann Bot. 2020;126(5):929–42.
Article CAS Google Scholar
Cahoon EB, Hall SE, Ripp KG, Ganzke TS, Hitz WD, Coughlan SJ. Metabolic redesign of vitamin E biosynthesis in plants for tocotrienol production and increased antioxidant content. Nat Biotechnol. 2003;21(9):1082–7.
Article CAS Google Scholar
Wang D, Wang Y, Long W, Niu M, Zhao Z, Teng X, et al. SGD1, a key enzyme in tocopherol biosynthesis, is essential for plant development and cold tolerance in rice. Plant Sci. 2017;260:90–100.
Article CAS Google Scholar
Nevo E. "evolution canyon", a potential microscale monitor of global warming across life. Proc Natl Acad Sci U S A. 2012;109(8):2960–5.
Article CAS Google Scholar
Yin HY, Ben-Abu Y, Wang HW, Li AF, Nevo E, Kong LR. Natural selection causes adaptive genetic resistance in wild emmer wheat against powdery mildew at "evolution canyon" microsite. Mt Carmel, Israel PLoS One. 2015;10(4):e0122344.
Article Google Scholar
Hubner S, Rashkovetsky E, Kim YB, Oh JH, Michalak K, Weiner D, et al. Genome differentiation of Drosophila melanogaster from a microclimate contrast in evolution canyon. Israel Proc Natl Acad Sci USA. 2013;110(52):21059–64.
Article CAS Google Scholar
Nevo E. Evolution in action: adaptation and incipient sympatric speciation with gene flow across life at "evolution canyon". Israel Isr J Ecol Evol. 2014;60(2-4):85–98.
Article Google Scholar
Maeda H, Song W, Sage TL, DellaPenna D. Tocopherols play a crucial role in low-temperature adaptation and phloem loading in Arabidopsis. Plant Cell. 2006;18(10):2710–32.
Article CAS Google Scholar
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20(1):238.
Article Google Scholar
Aury JM, Jaillon O, Duret L, Noel B, Jubin C, Porcel BM, et al. Global trends of whole-genome duplications revealed by the ciliate paramecium tetraurelia. Nature. 2006;444(7116):171–8.
Article CAS Google Scholar
Conant GC, Birchler JA, Pires JC. Dosage, duplication, and diploidization: clarifying the interplay of multiple models for duplicate gene evolution over time. Curr Opin Plant Biol. 2014;19:91–8.
Article CAS Google Scholar
Munoz-Amatriain M, Eichten SR, Wicker T, Richmond TA, Mascher M, Steuernagel B, et al. Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome. Genome Biol. 2013;14(6):R58.
Article Google Scholar
Hughes T, Ekman D, Ardawatia H, Elofsson A, Liberles DA. Evaluating dosage compensation as a cause of duplicate gene retention in paramecium tetraurelia. Genome Biol. 2007;8(5):213.
Article Google Scholar
Schnable JC, Pedersen BS, Subramaniam S, Freeling M. Dose-sensitivity, conserved non-coding sequences, and duplicate gene retention through multiple tetraploidies in the grasses. Front Plant Sci. 2011;2:2.
Article CAS Google Scholar
Chen YM, Song WJ, Xie XM, Wang ZH, Guan PF, Peng HR, et al. A collinearity-incorporating homology inference strategy for connecting emerging assemblies in the Triticeae tribe as a pilot practice in the plant pangenomic era. Mol Plant. 2020;13(12):1694–708.
Article CAS Google Scholar
Wang XL, Yan XQ, Hu YH, Qin LY, Wang DW, Jia JZ, et al. A recent burst of gene duplications in Triticeae. Plant Commun. 2022;3(2):100268.
Article CAS Google Scholar
Scannell DR, Wolfe KH. A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast. Genome Res. 2008;18(1):137–47.
Article CAS Google Scholar
Carretero-Paulet L, Fares MA. Evolutionary dynamics and functional specialization of plant paralogs formed by whole and small-scale genome duplications. Mol Biol Evol. 2012;29(11):3541–51.
Article CAS Google Scholar
Hunt BG, Ometto L, Wurm Y, Shoemaker D, Yi SV, Keller L, et al. Relaxed selection is a precursor to the evolution of phenotypic plasticity. Proc Natl Acad Sci U S A. 2011;108(38):15936–41.
Article CAS Google Scholar
Cui RF, Medeiros T, Willemsen D, Iasi LNM, Collier GE, Graef M, et al. Relaxed selection limits lifespan by increasing mutation load. Cell. 2019;178(2):385–99.
Article CAS Google Scholar
Wicke S, Muller KF, dePamphilis CW, Quandt D, Bellot S, Schneeweiss GM. Mechanistic model of evolutionary rate variation en route to a nonphotosynthetic lifestyle in plants. Proc Natl Acad Sci U S A. 2016;113(32):9045–50.
Article CAS Google Scholar
Kinjo Y. Enhanced mutation rate, relaxed selection, and the “domino effect” are associated with gene loss in Blattabacterium, a cockroach endosymbiont. Mol Biol Evol. 2021;38(9):3820–31.
Article CAS Google Scholar
Jiang XY, Assis R. Rapid functional divergence after small-scale gene duplication in grasses. BMC Evol Biol. 2019;19(1):97.
Article Google Scholar
Wang YP, Wang XY, Paterson AH. Genome and gene duplications and gene expression divergence: a view from plants. Ann N Y Acad Sci. 2012;1256:1–14.
Article Google Scholar
Mathers TC, Chen YZ, Kaithakottil G, Legeai F, Mugford ST, Baa-Puyoulet P, et al. Rapid transcriptional plasticity of duplicated gene clusters enables a clonally reproducing aphid to colonise diverse plant species. Genome Biol. 2017;18(1):27.
Article Google Scholar
Jia Y, Li B, Zhang YJ, Zhang XQ, Xu YH, Li CD. Evolutionary dynamic analyses on monocot flavonoid 3 '-hydroxylase gene family reveal evidence of plant-environment interaction. BMC Plant Biol. 2019;19(1):347.
Article Google Scholar
Jia Y, Selva C, Zhang YJ, Li B, McFawn LA, Broughton S, et al. Uncovering the evolutionary origin of blue anthocyanins in cereal grains. Plant J. 2020;101(5):1057–74.
Article CAS Google Scholar
Buffalo V, Coop G. Estimating the genome-wide contribution of selection to temporal allele frequency change. Proc Natl Acad Sci U S A. 2020;117(34):20672–80.
Article CAS Google Scholar
De la Torre AR, Wilhite B, Neale DB. Environmental genome-wide association reveals climate adaptation is shaped by subtle to moderate allele frequency shifts in loblolly pine. Genome Biol Evol. 2019;11(10):2976–89.
Article Google Scholar
Ergon A, Skot L, Saether VE, Rognli OA. Allele frequency changes provide evidence for selection and identification of candidate loci for survival in red clover (Trifolium pratense L.). front. Plant Sci. 2019;10:718.
Google Scholar
Zhang L, Zhu XX, Zhao YY, Guo J, Zhang TK, Huang WC, et al. Phylotranscriptomics resolves the phylogeny of pooideae and uncovers factors for their adaptive evolution. Mol Biol Evol. 2022;39(2):msac026.
Article CAS Google Scholar
Weisweiler M, de Montaigu A, Ries D, Pfeifer M, Stich B. Transcriptomic and presence/absence variation in the barley genome assessed from multi-tissue mRNA sequencing and their power to predict phenotypic traits. BMC Genomics. 2019;20(1):787.
Article Google Scholar
Hu HF, Scheben A, Verpaalen B, Tirnaz S, Bayer PE, Hodel RGJ, et al. Amborella gene presence/absence variation is associated with abiotic stress responses that may contribute to environmental adaptation. New Phytol. 2022;233(4):1548–55.
Article Google Scholar
Yang SH, Gu TT, Pan CY, Feng ZM, Ding J, Hang YY, et al. Genetic variation of NBS-LRR class resistance genes in rice lines. Theor Appl Genet. 2008;116(2):165–77.
Article CAS Google Scholar
Hanada K, Zou C, Lehti-Shiu MD, Shinozaki K, Shiu SH. Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiol. 2008;148(2):993–1003.
Article CAS Google Scholar
Hardigan MA, Crisovan E, Hamilton JP, Kim J, Laimbeer P, Leisner CP, et al. Genome reduction uncovers a large dispensable genome and adaptive role for copy number variation in asexually propagated solanum tuberosum. Plant Cell. 2016;28(2):388–405.
Article CAS Google Scholar
Hodkinson TR. Evolution and taxonomy of the grasses (Poaceae): a model family for the study of species-rich groups. Ann Plant Rev. 2018;1(1):255–93.
Article Google Scholar
Mittler R, Blumwald E. Genetic engineering for modern agriculture: challenges and perspectives. Annu Rev Plant Biol. 2010;61:443–62.
Article CAS Google Scholar
Varshney RK, Bansal KC, Aggarwal PK, Datta SK, Craufurd PQ. Agricultural biotechnology for crop improvement in a variable climate: hope or hype? Trends Plant Sci. 2011;16(7):363–71.
Article CAS Google Scholar
Mene-Saffrane L. Vitamin E biosynthesis and its regulation in plants. Antioxidants. 2018;7(1):2.
Article Google Scholar
Sattler SE, Gilliland LU, Magallanes-Lundback M, Pollard M, DellaPenna D. Vitamin E is essential for seed longevity, and for preventing lipid peroxidation during germination. Plant Cell. 2004;16(6):1419–32.
Article CAS Google Scholar
Čamagajevac IŠ, Pfeiffer TŽ, Maronić DŠ. Abiotic stress response in plants: the relevance of tocopherols. In: Dharmendra KG, José MP, Francisco JC, editors. Antioxidants and antioxidant enzymes in higher plants: Springer; 2018. p. 233–51.
Chapter Google Scholar
Niki E, Abe K. Vitamin E: structure, properties and functions. In: Niki E, Abe K, editors. Vitamin E: chemistry and nutritional benefits: Royal Society of Chemistry; 2019. p. 1–11.
Chapter Google Scholar
Galli F, Azzi A, Birringer M, Cook-Mills JM, Eggersdorfer M, Frank J, et al. Vitamin E: emerging aspects and new directions. Free Radical Bio Med. 2017;102:16–36.
Article CAS Google Scholar
Mene-Saffrane L, Pellaud S. Current strategies for vitamin E biofortification of crops. Curr Opin Biotechnol. 2017;44:189–97.
Article CAS Google Scholar
Chaudhary N, Khurana P. Cloning, functional characterisation and transgenic manipulation of vitamin E biosynthesis genes of wheat. Funct Plant Biol. 2013;40(11):1129–36.
Article CAS Google Scholar
Chen J, Liu C, Shi B, Chai Y, Han N, Zhu M, et al. Overexpression of HvHGGT enhances tocotrienol levels and antioxidant activity in barley. J Agric Food Chem. 2017;65(25):5181–7.
Article CAS Google Scholar
Che P, Zhao ZY, Glassman K, Dolde D, Hu TX, Jones TJ, et al. Elevated vitamin E content improves all-trans beta-carotene accumulation and stability in biofortified sorghum. Proc Natl Acad Sci U S A. 2016;113(39):11040–5.
Article CAS Google Scholar
Ouyang SQ, He SJ, Liu P, Zhang WK, Zhang JS, Chen SY. The role of tocopherol cyclase in salt stress tolerance of rice (Oryza sativa). Sci China Life Sci. 2011;54(2):181–8.
Article CAS Google Scholar
Farre G, Sudhakar D, Naqvi S, Sandmann G, Christou P, Capell T, et al. Transgenic rice grains expressing a heterologous ρ-hydroxyphenylpyruvate dioxygenase shift tocopherol synthesis from the γ to the α isoform without increasing absolute tocopherol levels. Transgenic Res. 2012;21(5):1093–7.
Article CAS Google Scholar
Chen D, Chen H, Zhang L, Shi X, Chen X. Tocopherol-deficient rice plants display increased sensitivity to photooxidative stress. Planta. 2014;239(6):1351–62.
Article CAS Google Scholar
Wang XQ, Yoon MY, He Q, Kim TS, Tong W, Choi BW, et al. Natural variations in OsγTMT contribute to diversity of the α-tocopherol content in rice. Mol Gen Genomics. 2015;290(6):2121–35.
Article CAS Google Scholar
Van Eenennaam AL, Lincoln K, Durrett TP, Valentin HE, Shewmaker CK, Thorne GM, et al. Engineering vitamin E content: from Arabidopsis mutant to soy oil. Plant Cell. 2003;15(12):3007–19.
Article Google Scholar
Tavva VS, Kim YH, Kagan IA, Dinkins RD, Kim KH, Collins GB. Increased α-tocopherol content in soybean seed overexpressing the Perilla frutescens γ-tocopherol methyltransferase gene. Plant Cell Rep. 2007;26(1):61–70.
Article CAS Google Scholar
Kim YH, Lee YY, Kim YH, Choi MS, Jeong KH, Lee SK, et al. Antioxidant activity and inhibition of lipid peroxidation in germinating seeds of transgenic soybean expressing OsHGGT. J Agric Food Chem. 2011;59(2):584–91.
Article CAS Google Scholar
Chen DF, Zhang M, Wang YQ, Chen XW. Expression of γ-tocopherol methyltransferase gene from Brassica napus increased alpha-tocopherol content in soybean seed. Biol Plant. 2012;56(1):131–4.
Article CAS Google Scholar
Arun M, Subramanyam K, Theboral J, Sivanandhan G, Rajesh M, Dev GK, et al. Transfer and targeted overexpression of γ-tocopherol methyltransferase (γ-TMT) gene using seed-specific promoter improves tocopherol composition in indian soybean cultivars. Appl Biochem Biotechnol. 2014;172(4):1763–76.
Article CAS Google Scholar
Konda AR, Nazarenus TJ, Nguyen H, Yang JS, Gelli M, Swenson S, et al. Metabolic engineering of soybean seeds for enhanced vitamin E tocochromanol content and effects on oil antioxidant properties in polyunsaturated fatty acid-rich germplasm. Metab Eng. 2020;57:63–73.
Article CAS Google Scholar
Zipfel C. Pattern-recognition receptors in plant innate immunity. Curr Opin Immunol. 2008;20(1):10–6.
Article CAS Google Scholar
Halter T, Imkampe J, Mazzotta S, Wierzba M, Postel S, Bucherl C, et al. The leucine-rich repeat receptor kinase BIR2 is a negative regulator of BAK1 in plant immunity. Curr Biol. 2014;24(2):134–43.
Article CAS Google Scholar
Dufayard JF, Bettembourg M, Fischer I, Droc G, Guiderdoni E, Perin C, et al. New insights on leucine-rich repeats receptor-like kinase orthologous relationships in angiosperms. Front Plant Sci. 2017;8:381.
Google Scholar
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
Article CAS Google Scholar
Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–4.
Article CAS Google Scholar
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74.
Article CAS Google Scholar
Guo Y. (2018-09-25): Variation data for a global diversity panel of barley with a focus on Tibetan barley (Qingke). https://doi.org/10.5447/ipk/2018/15.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
Article CAS Google Scholar
Li H, Durbin R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. 2009;25(14):1754–60.
Article CAS Google Scholar
Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011;21(6):974–84.
Article CAS Google Scholar
Vizueta J, Sanchez-Gracia A, Rozas J. Bitacora: a comprehensive tool for the identification and annotation of gene families in genome assemblies. Mol Ecol Resour. 2020;20(5):1445–52.
Article CAS Google Scholar
Yang ZH. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91.
Article CAS Google Scholar
Zhang Z, Xiao JF, Wu JY, Zhang HY, Liu GM, Wang XM, et al. ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochem Biophys Res Commun. 2012;419(4):779–81.
Article CAS Google Scholar
Harwood WA. A protocol for high-throughput agrobacterium-mediated barley transformation. Methods Mol Biol. 2014;1099:251–60.
Article CAS Google Scholar
Milne L, Bayer M, Rapazote-Flores P, Mayer CD, Waugh R, Simpson CG. EORNA, a barley gene and transcript abundance database. Sci Data. 2021;8(1):90.
Article CAS Google Scholar
Wang YP, Tang HB, DeBarry JD, Tan X, Li JP, Wang XY, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40(7):e49.
Article CAS Google Scholar

Download references

Acknowledgements

We thank Xinhang Jiang (College of Life Sciences, Zhejiang University, China) for his technical assistance with the HPLC analyses of vitamin E and Dr. Karl Gruber for critical revision of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Zhejiang Province (LQ23C130003), National Science Foundation of China (31971932), China Agriculture Research System of MOF and MARA (CARS-05-05A), the Chinese Academy of Engineering Academy-Locality Cooperation Project (2019-ZJ-JS-02), Key Research Foundation of Science and Technology Department of Zhejiang Province of China (2021C02064-3), Hangzhou Scientific and Technological Program (20211231Y088), and Australia Grain Research and Development Corporation (9176507).

Author information

Yong Jia and Mingrui Xu contributed equally to the study.

Authors and Affiliations

Western Crop Genetic Alliance, College of Science, Health, Engineering and Education, Murdoch University, 90 South Street, Murdoch, WA, 6150, Australia
Yong Jia, Haifei Hu, Brett Chapman & Chengdao Li
Western Australian State Agricultural Biotechnology Centre, Murdoch University, 90 South Street, Murdoch, WA, 6150, Australia
Yong Jia, Haifei Hu, Brett Chapman, Calum Watt & Chengdao Li
College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, 311121, China
Mingrui Xu & Zhanghui Zeng
Intergrain Pty Ltd, Bibra Lake, WA, 6163, Australia
Calum Watt
Institute of Genetic and Regenerative Biology, Key Laboratory for Cell and Gene Engineering of Zhejiang Province, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
B. Buerte, Ning Han, Muyuan Zhu, Hongwu Bian & Zhanghui Zeng
Department of Primary Industries and Regional Development, 3-Baron-Hay Court, South Perth, WA, 6151, Australia
Chengdao Li
Zhejiang Provincial Key Laboratory for Genetic Improvement and Quality Control of Medicinal Plants, Hangzhou, 311121, China
Zhanghui Zeng

Authors

Yong Jia
View author publications
You can also search for this author in PubMed Google Scholar
Mingrui Xu
View author publications
You can also search for this author in PubMed Google Scholar
Haifei Hu
View author publications
You can also search for this author in PubMed Google Scholar
Brett Chapman
View author publications
You can also search for this author in PubMed Google Scholar
Calum Watt
View author publications
You can also search for this author in PubMed Google Scholar
B. Buerte
View author publications
You can also search for this author in PubMed Google Scholar
Ning Han
View author publications
You can also search for this author in PubMed Google Scholar
Muyuan Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Hongwu Bian
View author publications
You can also search for this author in PubMed Google Scholar
Chengdao Li
View author publications
You can also search for this author in PubMed Google Scholar
Zhanghui Zeng
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

ZZ, HB, CL, and MZ supervised the study. YJ and ZZ conceived the study and wrote the manuscript. ZZ, BB, NH, and MX performed gene cloning, transformation, and analytical experiments. YJ performed phylogenetic, natural selection, duplication type, OrthoFinder, gene retention, and enrichment analyses. HH, BC, and CW performed read mapping, orthology inference, and geographic mapping. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Hongwu Bian, Chengdao Li or Zhanghui Zeng.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Fig. S1.

Simplified model of the vitamin E biosynthesis pathway in barley. Table S1. 62 HPT homologous genes in 22 Poales species. Table S2. Amino acid sequence similarity comparison of HvHPT1 and HvHPT2 with other monocot HPTs. Table S3. Genotyping results of HPT2 in 113 bread wheat accessions. Table S4. The content of tocochromanol isomers in T1 leaves and T2 grains of transgenic lines. Table S5. Conserved elements in the promoter of HvHPT1 and HvHPT2. Table S6. Primers for sequence amplification and genotyping.

Additional file 2.

Sequence alignment and phylogenetic tree files for Fig. 1A, B, and Fig. 2F.

Additional file 3

Synteny alignment for HPT1 and HPT2 across 7 Poales species.

Additional file 4.

SNP data for HPT2 in 300 barley pangenome lines and 109 Tibetan barley lines.

Additional file 5.

PAML input and output files used for natural selection analysis in Table 1.

Additional file 6.

Phylogeny-based orthologous gene inference across barley, wild emmer, and bread wheat and calculated gene retention rate for each orthologous gene group (hierarchical orthologous gene group at node N1 in Fig. 4A).

Additional file 7.

Orthologous gene matrix (presence/absence) in barley, wild emmer, and bread wheat pangenome lines.

Additional file 8.

List of the identified 326 HPT2-pattern-like genes (using barley as reference) and their corresponding gene retention rates in barley, wild emmer, and bread wheat.

Additional file 9.

Gene duplication type data in Fig. 4C (326 HPT2-pattern-like genes) and Fig. 6B (three categories of genes).

Additional file 10.

Lists of identified genes and gene annotations used for functional enrichment analyses.

Additional file 11.

Ka, Ks, and Ka/Ks calculation results for HPT2-pattern-like genes and their parental genes in barley, wild emmer, and bread wheat.

Additional file 12.

Sequence alignment, phylogenetic trees, and natural selection analyses data for Fig. 5E-O.

Additional file 13.

Gene annotation validation results of selected OGs genes in 15 wheat genome assemblies.

Additional file 14.

Original protoplast imaging files of barley.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Jia, Y., Xu, M., Hu, H. et al. Comparative gene retention analysis in barley, wild emmer, and bread wheat pangenome lines reveals factors affecting gene retention following gene duplication. BMC Biol 21, 25 (2023). https://doi.org/10.1186/s12915-022-01503-z

Download citation

Received: 27 March 2022
Accepted: 16 December 2022
Published: 06 February 2023
DOI: https://doi.org/10.1186/s12915-022-01503-z

Comparative gene retention analysis in barley, wild emmer, and bread wheat pangenome lines reveals factors affecting gene retention following gene duplication

Abstract

Background

Results

Conclusions

Background

Results

Identification of a divergent lineage of homogentisate phytyltransferase for vitamin E biosynthesis in Triticeae and Aveninae

HPT2 resulted from dispersed gene duplication and is partially retained in the pangenomes of barley, wild emmer, and bread wheat

HPT2 underwent an interplay of purifying, relaxed, and positive selection

HvHPT2 shares functional redundancy with HvHPT1 in vitamin E biosynthesis

Comparative gene retention analyses revealed a gene dosage constraint effect

Genome-wide identification of HPT2-pattern-like genes and functional analyses

HPT2-pattern-like genes displays relaxed selection and are under varying degrees of positive selection

Comparative gene retention analyses in barley and bread wheat reveals a species-specific selection pattern

Discussion

Population-based gene retention analysis captures gene evolution process via duplication and reveals the effect of gene dosage constraint on gene retention

Environmental selection plays a clear role on gene retention following duplication

Biased gene retention pattern in terms of gene function

Species-specific selection affecting gene retention

HPT2-pattern-like genes provide new opportunity for crop improvement

Conclusions

Methods

Identification of homologous HPT genes in Poales genome

Phylogenetic tree construction

Gene presence and absence analyses

Natural selection tests

RT-PCR analyses

Vector construction, transformation, GUS staining, and subcellular localization

Vitamin E measurement

Phylogeny-based orthologous group inference

Transcriptome analyses

Gene duplication pattern and synteny analyses

Gene functional enrichment and network analyses

Data plotting and statistical analysis

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Biology

Contact us