A high frequency of ancient mitochondrial gene transfer to the nucleus was found in both Conifer II and Gnetales
Based on analyses of both DNA and cDNA sequences generated from high-throughput sequencing, we investigated the variation of mitochondrial gene contents and the fate of missing mitochondrial genes by sampling representative species from all families of gymnosperms. Although a few fast-evolving mitochondrial genes might be difficult to identify from the draft genomes, our results should be reliable when compared with the published mitogenomes of gymnosperms and the study of Guo et al. [24]. Similar to previous studies, all 41 mitochondrial protein-coding genes were found in cycads (Cycas and Zamia), Ginkgo and Pinaceae (Abies, Cedrus, Pinus and Picea), whereas many were not found in the mitogenomes of Conifer II or Gnetales [24,25,26, 30,31,32]. Notably, we found that gene transfer was common but that gene loss was rare in Conifer II, whereas both gene transfer and loss commonly occurred in Gnetales (Fig. 1a). For example, in the ancestor of Conifer II, seven mitochondrial genes (rpl2, rps1, rps2, rps7, rps10, rps11, and rps14) were transferred to the nucleus, but only rpl10 was lost in the descendants. In contrast, six genes were lost and another six genes were transferred to the nuclear genome in the ancestor of Gnetales, followed by the additional loss of six genes (ccmB, matR, mttB, rpl10, rps3, and rps4) and intracellular transfer of one gene (rps12) in Ephedra (Fig. 1a).
Interestingly, five of the genes transferred to the nuclear genome (rps1, rps2, rps10, rps11, and rps14) are shared between Conifer II and gnetophytes. This phenomenon, together with the fact that the phylogenetic tree inferred from these five genes suggests a sister relationship between Conifer II and gnetophytes (Additional file 8: Figure S3), seems to support the Gnecup hypothesis. However, the nucleotide substitution rates of transferred genes in Conifer II and gnetophytes are much higher than those of their homologs in cycads, ginkgo, and Pinaceae, which may have led to long-branch attraction in phylogenetic reconstruction [40, 41]. The evolutionary rates of the putative transferred genes were still lower than that of the LEAFY gene, possibly because these genes have remained under functional constraints imposed by their mitochondrial role after transfer to the nucleus. In addition, the Gnepine hypothesis is strongly supported by the phylogenetic tree inferred from the mitochondrial protein-coding genes. Although the One Thousand Plant Transcriptomes Initiative [42] reported that the placement of Gnetales conflicted among the ASTRAL, supermatrix, and plastome-based trees, and that both the Gnecup and Gnepine hypotheses were supported by gene-tree quartet frequencies, Ran et al. [28] reconstructed a robust phylogeny of seed plants based on 1308 nuclear genes that supports the Gnepine hypothesis. Therefore, it is very likely that the five genes were transferred to the nuclear genome independently in the ancestor of Conifer II and in the ancestor of Gnetales. This inference is also supported by the ancestral state reconstruction of gene transfer/loss events (Additional file 10: Figure S5). It is interesting that the presequence of sdh3 encodes a mitochondrial chaperonin heat shock protein in Cupressaceae and Cephalotaxaceae, supporting the occurrence of an ancient gene transfer event (Additional file 5: Table S4).
However, the presequences of most putative transferred genes are diverse, which could have resulted from separate gene activations or from extensive recombination events in different lineages after a single ancient gene activation following transfer to the nucleus [15].
Our analysis of mitochondrial gene content variation indicates that the basal groups in both gymnosperms and angiosperms encode almost the complete set of mitochondrial protein-coding genes, similar to the common ancestor of seed plants [25, 30, 33, 43], and that genes encoding small and large subunit ribosomal proteins and succinate dehydrogenase are more prone to transfer or loss (Fig. 1a) [2,3,4, 8, 9, 20, 21]. However, during the evolution of angiosperms, gene transfer/loss events generally occurred within a single genus or even a single species, except for a few genes [2, 3, 8, 9, 12, 20, 21], whereas in gymnosperms, except for Ephedra, most of the mitochondrial gene transfer/loss events occurred in the common ancestor of Conifer II or in the common ancestor of Gnetales (Fig. 1a).
The two-step transfer mechanism may be the method of mitochondrial gene transfer in Conifer II and Gnetales
During plant evolution, mitochondrial gene transfer to the nuclear genome is very common, but the mechanism of intracellular gene transfer is still controversial [13, 44,45,46]. Three main mechanisms have been proposed for intracellular gene transfer in plants: direct DNA-mediated transfer, direct RNA-mediated transfer, and a two-step mechanism (retroprocessing and subsequent DNA-mediated gene transfer) [13, 45]. If a gene were transferred directly as DNA from the organelle to the nuclear genome, the nuclear copy would be expected to retain introns with the same phases and positions, and RNA editing sites similar to those of its mitochondrial homologs. Theoretically, the existence of RNA editing sites and group II introns in mitochondrial genes would impede the expression of transferred genes. In this study, the transferred rpl2 gene lost the mitochondrial intron in all taxa of Conifer II (Fig. 2). In addition, in the transferred rps1, rps2, rps10, rps11, rps14 and rpl2 genes, most RNA editing sites found in cycads, Ginkgo, and Pinaceae carry a T instead of the genomic C in Conifer II and Gnetales (Additional file 7: Figure S2). Therefore, these genes could have been transferred via a direct RNA-mediated mechanism or the two-step transfer mechanism. However, previous studies have shown that direct transfer of organelle DNA to the nuclear genome is notably frequent, while direct transfer of organelle RNA to the nuclear genome is quite rare [47, 48]. Moreover, Ran et al. [49] found that the rps3 gene underwent a “retroprocessing” event in Conifer II, resulting in the loss of introns and RNA editing sites, and thus may represent an initial stage of gene transfer. Considering the above information, coupled with the fact that the mitochondrial introns and RNA editing sites were lost in Conifer II and Gnetales, we deduce that retroprocessing followed by DNA-mediated gene transfer may be responsible for mitochondrial gene transfer in Conifer II and Gnetales.
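The editing-site argument above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name, the input format (two pre-aligned sequences of equal length plus a list of 0-based alignment columns for known C-to-U editing sites) and the toy sequences are all assumptions made for illustration. The function reports the fraction of editing sites at which the nuclear copy already carries T, which is the pattern expected when a transcript-derived (edited) copy was the source of the transfer.

```python
import math


def edited_site_fraction(mito_ref, nuclear_copy, edit_sites):
    """Fraction of known editing sites at which the nuclear copy carries T.

    mito_ref and nuclear_copy are aligned sequences of equal length
    (hypothetical input format); edit_sites holds 0-based alignment columns
    where the mitochondrial reference has a genomic C that is edited to U
    in the transcript.
    """
    if len(mito_ref) != len(nuclear_copy):
        raise ValueError("sequences must be aligned to the same length")

    scored = 0
    converted = 0
    for pos in edit_sites:
        ref_base = mito_ref[pos].upper()
        nuc_base = nuclear_copy[pos].upper()
        # Skip columns that are not a genomic C in the reference,
        # and columns where the nuclear copy has an alignment gap.
        if ref_base != "C" or nuc_base == "-":
            continue
        scored += 1
        if nuc_base == "T":
            converted += 1

    # No scorable site: return NaN rather than a misleading 0.
    return converted / scored if scored else math.nan


# Toy example (made-up sequences, not real gene data).
mito = "ATGCCATCA"
nuclear = "ATGTCATTA"
print(edited_site_fraction(mito, nuclear, [3, 7]))
```

A fraction close to 1, combined with the absence of the mitochondrial intron, is compatible with either an RNA-mediated or a two-step transfer; by itself it does not distinguish between these two routes.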
In addition to the counterparts encoded by the nuclear genome, the homologs of chloroplast genes or chloroplast-derived genes encoded by the nuclear genome can also function in mitochondria [2, 21, 22, 46]. In Conifer II and Gnetales, certain genes were not found in either the mitochondria or the nuclear genome. For example, in Conifer II, the rpl10 gene only exists in the mitochondrial genome of Sciadopitys, and in Gnetales, six genes (rpl2, rpl5, rpl16, rps7, rps13, and rps19) were lost in the common ancestor of Gnetales, and another six genes (ccmB, matR, mttB, rpl10, rps3, and rps4) were lost from the mitogenome of Ephedra. Considering that most of the above genes participate in important physiological processes such as protein synthesis and energy metabolism [25, 50], these genes might have been functionally replaced by chloroplast genes or cytosol-derived genes encoded in the nuclear genome [21, 22]. However, recent studies have found that a larger number of mitochondrial genes (e.g., nad1, nad2, nad3, nad4, nad4L, and nad5) were lost in select angiosperms, such as Viscum (Viscaceae) [5, 17, 19], and hence, we cannot rule out the possibility that certain mitochondrial genes of Gnetales have been lost directly.
It is intriguing that the matR gene was not found in the mitochondrial genome of Ephedra (Fig. 1a). The mitochondrial matR gene has a conserved domain with maturase activity and a degenerated reverse transcriptase domain, and is involved in the splicing of mitochondrial group II introns [51]. Generally, the matR gene of seed plants is located in the fourth intron of nad1 (nad1i728) [11, 25, 30]. Nevertheless, the matR gene was lost in certain angiosperms, such as Malpighiales (Croizatia brevipetiolata and Lachnostylis bilocularis) and Viscaceae (Viscum and Phoradendron) [5, 17, 19, 52]. Currently, it is unclear why these plants do not need the matR gene and what effect may result from its loss. Grewe et al. [11] found that in Pelargonium, matR was transferred to the nuclear genome and split into two genes, one carrying the reverse transcriptase domain and the other the maturase domain. In addition, the nuclear genome can encode four maturases that are transported to the mitochondria, such as nMAT1, which participates in the trans-splicing of nad1i394 in Arabidopsis [51]. Furthermore, among the few hundred currently available mitogenome sequences, no loss of mttB has been reported, and the loss of ccmB has only been reported in Viscum scurruloideum [4, 5]. Therefore, in Ephedra, the matR, mttB, and ccmB genes could also have been transferred to the nuclear genome, and we may have failed to find their nuclear homologs because of great sequence divergence, although it is possible that these genes have been completely lost.
Several factors may be related to mitochondrial gene content variations in land plants
In gymnosperms, almost all gene transfer/loss events are notably ancient, and it is difficult to know why a large number of mitochondrial genes were transferred or lost hundreds of millions of years ago while the gene content of the mitochondrial genome has remained stable since then. Therefore, gene content variation in gymnosperms is more likely to be related to the question of why mitochondrial genes are retained in the mitochondrial genome. Based on the newly generated data from a complete sampling of gymnosperm families, in combination with plant mitochondrial genomes in public databases, we conducted a comparative analysis to find the factors that might influence mitochondrial gene content variation in land plants and obtained the following findings. First, the easily transferred genes are generally small and hydrophilic with low GC content, supporting the hypothesis that relatively small, low GC content, and soluble proteins such as ribosomal proteins can be easily transported from the nucleus to mitochondria (Additional file 11: Figure S6a,b,c) [1, 2]. Second, in land plants, mitochondrial genomes with higher GC content tend to contain fewer genes (Additional file 11: Figure S6d). Third, more mitochondrial genes tend to be transferred or lost when the nucleotide substitution rates of extant mitochondrial genes are high (Additional file 11: Figure S6e,f). The synonymous and nonsynonymous substitution rates of mitochondrial genes are higher in Conifer II and Gnetales than in cycads, Ginkgo and Pinaceae (Fig. 3). In conclusion, four factors (gene length, GC content, hydrophobicity, and nucleotide substitution rate) may be related to mitochondrial gene content variation in land plants.
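Three of these factors (gene length, GC content, and hydrophobicity) can be computed directly from a coding sequence and its translation. The following Python sketch shows one way to do this. It is an illustration under stated assumptions, not the procedure used here: the text does not specify how hydrophobicity was measured, so the GRAVY score (mean Kyte-Doolittle hydropathy) is assumed, and the function names and toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gc_content(cds):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in cds.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    return sum(b in "GC" for b in bases) / len(bases)


def gravy(protein):
    """Mean Kyte-Doolittle hydropathy; negative values indicate hydrophilic proteins."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def summarize_gene(name, cds, protein):
    """Collect the three sequence-derived factors for one gene."""
    return {
        "gene": name,
        "length_bp": len(cds),
        "gc": gc_content(cds),
        "gravy": gravy(protein),
    }


# Toy input (made-up sequences, not real gene data).
print(summarize_gene("toy_gene", "ATGAAACGTTAA", "MKR"))
```

Under the hypothesis discussed above, genes that are easily transferred would be expected to show shorter lengths, lower GC fractions, and lower (more negative) GRAVY scores than genes that are retained in the mitochondrial genome.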