- Research
- Open access
Evolutionary dynamics of mitochondrial genomes and intracellular transfers among diploid and allopolyploid cotton species
BMC Biology volume 23, Article number: 9 (2025)
Abstract
Background
Plant mitochondrial genomes (mitogenomes) exhibit extensive structural variation yet extremely low nucleotide mutation rates, phenomena that remain only partially understood. The genus Gossypium, a globally important source of cotton, offers a wealth of long-read sequencing resources to explore mitogenome and plastome variation and dynamics accompanying the evolutionary divergence of its approximately 50 diploid and allopolyploid species.
Results
Here, we assembled 19 mitogenomes from Gossypium species, representing all genome groups (diploids A through G and K, and the allopolyploids AD), using a uniformly applied strategy. A graph-based mitogenome assembly method revealed more alternative structural conformations than previously recognized, some of which confirmed the mitogenome structures reported in earlier studies of cotton. Using long-read data, we quantified alternative conformations mediated by recombination between repeats and identified phylogenetically informative structural variants. Comparisons of nucleotide substitution rates between coding and non-coding regions revealed low mutation rates across the entire mitogenome. Genome-wide mapping of nuclear organellar DNA transfers (NUOTs) in Gossypium revealed a nonrandom distribution of transfers in the nuclear genome. The fate of NUOT events in cotton varied: mitochondrion-to-nucleus transfers (NUMTs) were predominantly retained as short fragments in the nuclear genome, whereas more plastid sequence was integrated into the nucleus. Phylogenetic relationships inferred from different data sets highlighted distinct evolutionary histories among these cellular compartments, providing ancillary evidence relevant to the evolutionary history of Gossypium.
Conclusions
A comprehensive analysis of organellar genome variation demonstrates complex structural variation and low mutation rates across the entire mitogenome and reveals the history of organellar genome transfer among the three genomes throughout the cotton genus. The findings enhance our general understanding of mitogenome evolution, comparative organellar and nuclear evolutionary rates, and the history of inter-compartment genomic integration.
Background
Plant mitochondrial genomes (mitogenomes) are notable for their extensive structural variation, in contrast to the typically highly conserved plant plastomes and bilaterian mitogenomes [1, 2]. Fluorescence microscopy has shown that most land plant mitogenomes display more complicated and dynamic molecular forms than those found in many other eukaryotic lineages, such as vertebrates or fungi [3, 4]. For instance, Lactuca sativa (lettuce) and related wild species possess diverse mitogenome structural arrangements, including circular, linear, and branched forms, whereas the mitogenome of the parasitic, non-photosynthetic Rhopalocnemis phalloides is organized into 21 minicircular chromosomes [5, 6]. Such structural variability is increasingly reported in mitogenome assemblies; for example, Selaginella nipponica has a complex structural network with 27 contigs, Panax notoginseng exhibits interconversion between a “master circle” and seven “subgenomic circles,” and Sorghum mitogenomes display three structural network types comprising nine or six contigs each [7,8,9]. Through recombination mediated by repeat sequences, plant mitogenomes can form various interchangeable isomers (subgenomes and sublimons, generated by recombination across short repeats), leading to dynamic shifts in the relative copy numbers of different isomers, a condition known as (sub-)stoichiometric shifting [10, 11]. Collectively, these and other studies have demonstrated that the older “master circle” concept of mitogenome structure is often an oversimplification that fails to capture the dynamic structural information inherent in the mitogenomes of many plant species [12]. Beyond insights derived from analyses of single species, phylogenetic analyses of mitogenome structural variation during species diversification have also provided new perspectives.
Recent advances in long-read sequencing technologies (e.g., PacBio HiFi, CLR, and ONT nanopore), combined with assembly tools specific to plant mitogenomes (e.g., GSAT, PMAT), have facilitated the exploration of mitogenome structural variation across both distantly and closely related lineages [13, 14].
Despite its highly variable structure, plant mitochondrial DNA (mtDNA) evolves at the nucleotide level much more slowly than bilaterian mitogenomes, and also more slowly than most plant nuclear and plastid genomes [15]. In general, the rate of synonymous substitutions in plant mitogenomes is 50–100 times lower than in bilaterians, several times lower than in plastomes, and approximately 16 times lower than in an average plant nuclear genome [16]. However, synonymous substitution rates in plant mitogenomes vary widely across seed plants [17], and some exceptional angiosperm lineages, for example Plantago and Geraniaceae, show rate accelerations of up to 5000-fold [18]. Structural changes, such as the gain or loss of chromosomes (in multi-chromosome mitogenomes), active rearrangements, recombination, insertions, deletions, and replication amplification mediated by repetitive sequences, can all contribute to poor mitogenome collinearity, even among closely related lineages [19, 20]. Horizontal transfer similarly complicates homology inference, further hindering efforts to correlate structural variation with rates of nucleotide mutation. Such structural dynamism makes it challenging to calculate point mutation rates in mtDNA accurately [21].
With respect to genic vs. non-genic rates, comparative analyses have shown similar mutation rates in intergenic and genic regions of Fragaria, Oryza sativa, and Arabidopsis thaliana mtDNA [22, 23]. Notwithstanding these first-principles expectations and empirical observations, structural complexity and evolutionary rates can be correlated in plastomes, suggesting that structural variation could also influence mutation dynamics in mitochondrial genomes [24]. This highlights the need for further investigation into the causes of evolutionary rate heterogeneity and its relationship with structural variation, especially in homologous intergenic regions, to better understand mitogenome evolution within related individuals and lineages. In addition to native, intra-organellar structural alterations, structural complexity in plant mitogenomes is also driven by intracellular gene transfers (IGTs), including plastid-to-nucleus transfers (NUPTs), mitochondrion-to-nucleus transfers (NUMTs), and plastid-to-mitochondrion transfers, among others [25]. In particular, transfer from organelles to the nucleus (NUOTs) is a persistent and widespread process in plants that continuously shapes the evolution, migration, and genetic diversity of nuclear genomes [25, 26]. Traditionally, the identification of NUOTs has relied on a single unified reference organellar genome, an approach limited by the difficulty of accurately assembling complex mitogenomes [27]. To address these questions about the evolutionary features that shape plant mitogenomes, a phylogenetic approach can be particularly informative, because it is essential for inferring the pace and pattern of evolutionary change in both structure and sequence.
The cotton (Gossypium) plant is a vital source of natural textile fiber and oilseeds [28]. The genus is divided into nine genomic groups: eight diploid groups (termed A through G, and K) and one allopolyploid genome group (AD) that includes seven species (AD1–AD7), all tracing to a single hybridization and genome doubling event between A-genome and D-genome diploid ancestors. Four of these species, G. hirsutum (AD1), G. barbadense (AD2), G. arboreum (A1), and G. herbaceum (A2), are the sources of independently domesticated cultivars [29,30,31]. Cotton is an ideal system for exploring mitogenome evolution and intracellular gene transfer (IGT) because of its well-documented evolutionary history and extensive genomic resources [32, 33]. Previous work on Gossypium mitogenomes has demonstrated extensive interspecific diversity, including duplication and loss of repetitive sequences, frequent rearrangements, and extensive IGTs [34, 35]. These studies relied primarily on short-read data and on what we now understand to be a potentially misleading assembly method that assumes a “master circle,” and therefore did not capture the dynamism of mitogenomic structural evolution. More fully assembled nuclear genomes and long-read sequencing data are now available for Gossypium, which holds promise for exploring the evolutionary dynamics of organellar genomes and their histories of IGT in both polyploid and diploid Gossypium species.
In this study, we assembled 19 Gossypium mitogenomes covering all recognized genomic groups (A–G, K, and AD) using a combination of short-read and long-read data (PacBio, ONT). We performed a comparative analysis with a focus on the following questions: (1) Can a graph-based mitogenome assembly address the limitations of the traditional single-master-circle model in Gossypium and provide insights into the dynamics of mitochondrial structural variation? (2) What information on point mutation rates may be derived from comparative analyses of mitogenome and plastome noncoding regions, to develop a more comprehensive understanding of mutational processes across both genomes? (3) What have been the evolutionary dynamics of NUPTs and NUMTs during diploid and allopolyploid cotton evolution? Collectively, our goal was to enrich our understanding of organellar genome evolution during the global radiation of the genus Gossypium.
Results
Mitogenome assembly and resolution of alternative structural conformations
We assembled 19 gap-free Gossypium mitogenomes representing all genomic groups (diploids A through G and K, and the allopolyploids AD) by integrating short-read and long-read data (Fig. 1A). These graph-based mitogenomes were assembled into complex network structures comprising from six (in G. herbaceum and G. arboreum) to 15 contigs (in G. rotundifolium), which capture alternative structural connections (Fig. 1A, Additional file 2: Table S1, S2). The total size of the 19 cotton mitogenomes, counting each contig only once, ranged from 551,195 to 624,973 bp. The average read coverage depth of the 19 samples ranged from 121.0× to 952.3×. All assembled mitogenomes shared 57 unique genes, including 36 protein-coding genes (PCGs), three rRNAs, and 18 tRNAs (Additional file 1: Fig. S1A). In contrast, the plastomes of the 19 samples were assembled into a typical circular structure ranging from 159,051 to 160,597 bp (Additional file 2: Table S3). A total of 112 unique genes were shared across all 19 plastomes, including 78 PCGs, 30 tRNAs, and four rRNAs (Additional file 1: Fig. S1B). As expected from previously reported results for Gossypium, the ptDNA was conserved in both genome size and gene content [36]. Although mitogenome size is highly variable across land plants, ranging up to the giant (∼11 Mb) mtDNAs of Silene conica, with many dicotyledon mitogenomes exceeding 500 kb, mitogenome gene content is relatively conserved, with a repertoire of 42 functionally diverse protein genes across land plant mitogenomes [2, 4, 37].
Mitogenome structural variation in 19 Gossypium samples. A Pan-graphs of the 19 Gossypium mitogenomes; homologous sequences are filled with the same color. B Mitogenome pan-graph of AD6, showing nine contigs (A–I) at 1× relative depth and three repeats (R1, R2, and R3) at 2× relative depth. The four connections of each repeat, verified and quantified using PacBio reads, are shown on the right, with the number of supporting reads displayed above each connection. C Eight potential conformations mediated by the three large repeats (R1, R2, and R3) in allotetraploid cottons. Repeat junctions are boxed according to the reference mtDNA; contig D was used as the initial sequence. D Structures 1, 2, and 8 match the previously published cotton mtDNA structures (N_000011242.1 (AD3), N_000011243.1 (AD4), N_000011240.1 (AD1), N_000011238.1 (A1), N_000011239.1 (A2), and N_000011244.1 (G1)) [38]
We found that the allotetraploid cottons analyzed in this study possessed homologs of all 12 contigs (A–I as well as R1, R2, and R3) and shared the three large repeats R1, R2, and R3. When these contigs were mapped onto the allotetraploid cotton mitogenomes, we observed high structural collinearity among the homologous contigs. The mitogenome of the G. ekmanianum (AD6) samples had a network structure that included nine unique contigs (A–I) and three repeat contigs (R1, R2, and R3) (Fig. 1B). Similarly, in the four diploid A-genome cottons (A1a-1, A1a-2, A1cv, and A2), homologous contigs were highly collinear and consistently arranged. In contrast, the collinearity of unique homologous contigs was substantially disrupted in other diploid cottons, particularly within the Australian clade (C + G + K), presumably owing to recombination mediated by various repeat sequences. Each branch possessed a unique arrangement of homologous contigs, resulting in branch-specific structural arrangements.
The graph-based assembly of the cotton mitogenomes resolved more alternative conformations than previous work, suggesting that large repeats influence mitogenome structural variation. Specifically, eight potential alternative configurations mediated by three repeats (exemplified by R1, 10,612 bp; R2, 10,242 bp; and R3, 11,540 bp in the AD6 mitogenome) were observed in allotetraploid cottons. These alternative configurations were further confirmed by long-read data: PacBio and ONT reads were long enough to span R1, R2, and R3 and reach the flanking contigs on both sides. We extracted reads longer than 10 kb that spanned the repeat junctions and extended more than 500 bp into the flanking contigs, which allowed us to quantify the relative abundance of each combination. The four arrangements with R1 were supported by 162 (H-R1-E), 149 (C-R1-F), 161 (H-R1-F), and 150 (C-R1-E) long reads. The four arrangements with R2 were supported by 39 (I-R2-B), 61 (D-R2-H), 40 (I-R2-H), and 66 (D-R2-B) long reads. The four arrangements with R3 were supported by 32 (G-R3-I), 51 (B-R3-I), 33 (G-R3-E), and 52 (B-R3-E) long reads (Fig. 1B).
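The read-filtering and counting step described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes long-read alignments to the assembly graph have already been reduced to one record per read crossing a repeat, and the `JunctionHit` record and both function names are hypothetical. The 10-kb read-length and 500-bp anchor thresholds come from the text.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class JunctionHit(NamedTuple):
    """Hypothetical summary of one long read crossing a repeat."""
    read_length: int    # total read length (bp)
    left_contig: str    # unique contig on one side of the repeat
    left_anchor: int    # bp of the read aligned to that contig
    repeat: str         # repeat contig traversed, e.g. "R1"
    right_contig: str   # unique contig on the other side
    right_anchor: int   # bp of the read aligned to that contig


def count_junction_support(hits: Iterable[JunctionHit],
                           min_read_len: int = 10_000,
                           min_anchor: int = 500) -> Counter:
    """Count reads supporting each (left contig, repeat, right contig) path.

    A read counts only if it is longer than `min_read_len` and extends more
    than `min_anchor` bp into the unique contigs on both sides of the repeat.
    """
    support = Counter()
    for h in hits:
        if (h.read_length > min_read_len
                and h.left_anchor > min_anchor
                and h.right_anchor > min_anchor):
            support[(h.left_contig, h.repeat, h.right_contig)] += 1
    return support


def path_fractions(support: Counter, repeat: str) -> dict:
    """Relative abundance of each path through a single repeat."""
    paths = {p: n for p, n in support.items() if p[1] == repeat}
    total = sum(paths.values())
    return {p: n / total for p, n in paths.items()} if total else {}
```

Applied to the R1 counts reported above (162, 149, 161, and 150 reads), `path_fractions` gives each of the four paths roughly a quarter of the supporting reads (for H-R1-E, 162/622 ≈ 0.26), consistent with near-equimolar recombination across R1.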
Inferred recombination across the three repeats (R1, R2, and R3) varied among the allotetraploid cottons. The R1 repeat was the most strongly represented in PacBio reads and supported recombination-mediated events in all eight allotetraploid cottons. In G. hirsutum race punctatum (Ghp) and G. tomentosum (AD3), R2 showed no evidence of recombination, resulting in six conformations mediated solely by R1 and R3. In the G. stephensii (AD7) and G. darwinii (AD5) samples, only R1-mediated recombination was inferred, yielding two conformations. In the AD6 and G. barbadense (AD2) samples, path resolution across all three repeats yielded eight distinct conformations (Structures 1–8 in Fig. 1C). We also compared these conformations with the circular Gossypium mitogenome structures reported in a previous study [38]. Structures 1 and 2 corroborated the previously published mitogenome structures of AD3, AD1, and G. mustelinum (AD4), and Structures 2 and 8 were fully collinear with the previously published mitogenome structures of the diploid species A1, A2, and G1. Interestingly, despite this collinearity, all diploid A-genome cottons and G1 in our study displayed distinct branch-specific structural variants that were absent from every potential conformation of the allotetraploid cottons (Fig. 1D). This suggests that while certain mtDNA structures are conserved across diploid and allotetraploid cotton species, lineage-specific recombination patterns and structural rearrangements also contribute to the diversity of mitogenome architecture within Gossypium.
Contribution of repeats to mitogenome size and variation in conserved structural blocks
Mitogenome repeats were classified by length into class I (≤ 100 bp), class II (100–500 bp), class III (500–5000 bp), and class IV (> 5000 bp). Mitogenome size in Gossypium was only weakly correlated with repeat classes I, II, and III and primarily reflected an increase in large repeats (> 5000 bp). The number of identified repeats ranged from 398 to 407 per mitogenome, with total lengths from 76.25 to 226.50 kb (Additional file 2: Table S4). Among the four classes, class I repeats were the most abundant (246–301 repeats, 70–78% of the total), followed by class II repeats (63–104, approximately 20% of the total). Repeats ≤ 5000 bp accounted for only 4–5% of mitogenome size, except in E1, where they accounted for 8%. In contrast, only 5–10 class IV repeats were identified per mitogenome, with total lengths ranging from 53.07 to 202.69 kb and contributing 9–30% of mitogenome size (Fig. 2A). The number of short repeats (≤ 5000 bp) was conserved among Gossypium mitogenomes, and the correlation between the total number of repeats in classes I–III and mitogenome size was not significant (p = 0.58). Despite their small number, class IV repeats were the major contributors to mitogenome size: mitogenome size was significantly correlated with both the total length of class IV repeats and the total length of all repeats (p = 8.29 × 10⁻¹⁰ and 1.09 × 10⁻⁸, respectively) (Fig. 2A).
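The length classes and the correlation test summarized in Fig. 2A can be sketched as follows. This assumes Pearson's correlation coefficient (the text reports R with a two-tailed Student's t-test but does not name the coefficient) and assigns a repeat lying exactly on a class boundary (500 or 5000 bp) to the lower class, which is also an assumption; the function names are illustrative.

```python
import math
from typing import Sequence


def repeat_class(length_bp: int) -> str:
    """Class I-IV by repeat length; boundary values go to the lower class (assumption)."""
    if length_bp <= 100:
        return "I"
    if length_bp <= 500:
        return "II"
    if length_bp <= 5000:
        return "III"
    return "IV"


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

The two-tailed p-value then follows from Student's t distribution with n − 2 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), n - 2)`; `scipy.stats.pearsonr` returns r and the same p-value in a single call. With 19 mitogenomes, n = 19.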
Repeat content and conserved fragments in Gossypium mitogenomes. A Total length and number of repeats of each class in the Gossypium mitogenomes. Correlations are shown between genome size and (i) the total length of repeats in each class and (ii) the number of repeats in each class. R indicates the correlation coefficient; p-values were determined by a two-tailed Student’s t-test. B Mitochondrial genome map of the core conserved fragments of Gossypium. From outside to inside: the first track shows the identification number of each shared fragment; the second track shows gene locations on each fragment as black dots; the third track shows the identity of each annotated gene (CDS, rRNA, tRNA); the fourth track shows the distribution of SNVs in each fragment as red bars; the fifth track shows the distribution of InDels in each fragment as blue bars; and the sixth track shows homologous fragments within the core sequence. C Frequency of SNVs and InDels in CDSs, introns, and IGSs
In total, we identified 44 core conserved blocks in the 19 mitogenomes, with a combined length of 486,010 bp comprising intergenic regions, 18 introns (29,026 bp in total), and coding regions (32,001 bp). Across the 19 mitogenomes, 36 PCGs, 3 unique rRNA genes, and 23 unique tRNA genes were annotated (Fig. 2B). We also identified 1616 single nucleotide variant (SNV) sites and 434 InDel sites (Additional file 2: Table S5). Of these, 47, 80, and 1489 SNV sites fell in CDSs, introns, and intergenic spacers (IGSs), respectively, as did 1, 12, and 421 InDel sites (Fig. 2C).
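Because the three region types differ greatly in length, raw variant counts are easier to compare when converted to densities. The sketch below uses only the counts and lengths given in the text; it assumes that intergenic spacer length equals the combined length of the 44 blocks minus the intron and coding lengths, which the text implies but does not state.

```python
# Lengths (bp) and variant counts for the 44 core conserved blocks, from the text.
TOTAL_BP = 486_010
REGION_BP = {"CDS": 32_001, "intron": 29_026}
# Assumption: intergenic spacers make up everything that is not intron or coding.
REGION_BP["IGS"] = TOTAL_BP - REGION_BP["CDS"] - REGION_BP["intron"]

SNVS = {"CDS": 47, "intron": 80, "IGS": 1489}
INDELS = {"CDS": 1, "intron": 12, "IGS": 421}


def per_kb(counts: dict, lengths: dict) -> dict:
    """Variant sites per kilobase for each region type."""
    return {region: 1000 * counts[region] / lengths[region] for region in counts}


snv_density = per_kb(SNVS, REGION_BP)
indel_density = per_kb(INDELS, REGION_BP)

for region in ("CDS", "intron", "IGS"):
    print(f"{region}: {snv_density[region]:.2f} SNVs/kb, "
          f"{indel_density[region]:.2f} InDels/kb")
```

Under this assumption the IGS total is 424,983 bp, and SNV density is about 1.47, 2.76, and 3.50 sites per kb in CDSs, introns, and IGSs, respectively.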
Rates of nucleotide substitution in Gossypium mitogenomes and plastomes
It has long been recognized that the rate of synonymous substitutions among plant organellar genomes varies widely [4, 16, 39], and a recent study noted low point mutation rates genome-wide in the Fragaria mitogenome [22]. As expected, our data suggested that the mutation rate of the entire plastome was 3.8–5.2 times higher than that of the mitogenome in the same plant (Additional file 1: Fig. S2A). Approximately 49.8 kb of intergenic spacer (IGS) sequence and 49.6 kb of protein-coding sequence were obtained from 20 plastomes (including H. syriacus) to calculate sequence divergence (d) for noncoding regions and synonymous (dS) and nonsynonymous (dN) sequence divergence for coding regions (Fig. 3A). Root-to-tip branch lengths in the phylogenetic trees inferred from these alignments (d, dS, and dN) were used to compare divergence among genomic regions. Similarly, 154 kb of IGS, 36 shared protein-coding genes, and 18 shared introns were identified in the 19 cotton mitogenomes and H. syriacus, and the same divergence measures were calculated, with introns treated separately (d-IGS, d-intron, dS, and dN) (Fig. 3B). In general, nucleotide sequence divergence was similar in the d-IGS and dS trees. In the plastome, root-to-tip branch lengths ranged from 0.0144 to 0.0178 in the d-IGS tree and from 0.0161 to 0.0205 in the dS tree. In the mitogenome, they ranged from 0.0036 to 0.0040 in the d-IGS tree and from 0.0033 to 0.0045 in the dS tree (Additional file 2: Table S6). Using the plastome trees as a reference to estimate the relative substitution rate of the mitogenome, branch lengths in the plastome dS and d-IGS trees were approximately 3.0–4.5 times greater than those in the corresponding mitogenome trees (Additional file 1: Fig. S2B, S2C).
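Root-to-tip distances, and the plastome-to-mitogenome ratio derived from them, can be computed from any rooted tree. In the sketch below each tree is stored as a hypothetical child-to-parent map with branch lengths; the tips and branch lengths are invented toy values for illustration, not the study's estimates, and parsing trees from Newick files is left out.

```python
from typing import Dict, Iterable, Optional, Tuple

# node -> (parent, length of the branch leading to node); the root has parent None.
Tree = Dict[str, Tuple[Optional[str], float]]


def root_to_tip(tree: Tree, tip: str) -> float:
    """Sum of branch lengths on the path from `tip` up to the root."""
    total, node = 0.0, tip
    while tree[node][0] is not None:
        parent, length = tree[node]
        total += length
        node = parent
    return total


def rate_ratios(pt_tree: Tree, mt_tree: Tree, tips: Iterable[str]) -> Dict[str, float]:
    """Per-tip ratio of plastome to mitogenome root-to-tip distance."""
    return {t: root_to_tip(pt_tree, t) / root_to_tip(mt_tree, t) for t in tips}


# Toy trees with hypothetical branch lengths (same topology, different rates).
pt = {"root": (None, 0.0), "n1": ("root", 0.010),
      "tipA": ("n1", 0.006), "tipB": ("n1", 0.007), "tipC": ("root", 0.018)}
mt = {"root": (None, 0.0), "n1": ("root", 0.002),
      "tipA": ("n1", 0.0016), "tipB": ("n1", 0.0018), "tipC": ("root", 0.004)}

print(rate_ratios(pt, mt, ["tipA", "tipB", "tipC"]))
```

Computing the ratio tip by tip, rather than from tree-wide averages, keeps lineage-specific rate differences (such as those noted for D5 and E1) visible.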
For the allotetraploid cottons, which diverged from diploid Gossypium around 1–1.6 million years ago (Ma), mitogenome divergence was more narrowly constrained, with root-to-tip branch lengths of 0.0035 to 0.0037 in the dS tree, the same range as in the diploid A clade, the maternal progenitor lineage of the allopolyploids. Among the diploid mitogenome lineages, D5 had root-to-tip branch lengths of 0.0045 in the dS tree and 0.0040 in the d-IGS tree, and E1 had 0.0044 and 0.0040, respectively, indicating a higher substitution rate than in other lineages.
Nucleotide substitution rates among coding and non-coding regions in cotton organellar genomes. Branch lengths were determined from substitution rates at synonymous (dS) and non-synonymous (dN) sites in coding sequences and from sequence divergence (d) of intronic and intergenic sequences in non-coding regions. The outgroup species H. syriacus is not shown. A Plastome trees based on d-IGS, dN, and dS. B Mitogenome trees based on d-IGS, d-intron, dN, and dS
Patterns of integration and distribution of nuclear organellar DNAs in Gossypium
The data used for organellar genome assembly were derived from the same samples used for the corresponding nuclear genome assemblies. Sample A1a-1 was excluded because nuclear genome data were unavailable for it. To identify NUOTs (nuclear organellar DNAs) reliably, only insertions located on the 13 (diploid) or 26 (allotetraploid) nuclear chromosome assemblies were considered, excluding those on unplaced (orphan) contigs. We categorized NUOTs into two classes: young NUOTs, with ≥ 90% similarity to the organellar source, and ancient NUOTs, with < 90% similarity [27]. This classification roughly reflects insertion time, because inserted sequences diverge progressively from their corresponding plastome and mitogenome segments. In AD3, an unusually high number of young and ancient NUMTs (nuclear mitochondrial DNAs) were identified on every chromosome of the A subgenome (Additional file 2: Table S7); we suspect that this unusually high occurrence of NUOTs may be an artifact of errors in the nuclear genome assembly. The density of NUOTs in the AD4 A subgenome reached 2.7%, 100 times higher than in other Gossypium genomes/subgenomes. To identify NUMTs, the core conserved mtDNA blocks were aligned to the corresponding nuclear genome/subgenome. We identified 763 to 2725 NUMTs per genome/subgenome, with total lengths ranging from 265.33 to 1167.42 kb. The number of NUPTs (nuclear plastid DNAs) ranged from 2776 to 7822 per genome/subgenome, with total lengths ranging from 1422.69 to 4351.66 kb (Additional file 2: Table S8). The C1 genome sample had the fewest NUOTs, with 763 NUMTs and 2776 NUPTs.
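The chromosome filter and the 90% age threshold can be expressed as a small classification step. This sketch assumes NUOT hits have already been produced by aligning organellar sequences to the nuclear assembly (for example with BLAST) and parsed into simple records; the `OrganellarHit` layout, the chromosome names, and the function name are hypothetical. It does not merge overlapping hits, which a real pipeline would need to do before summing lengths.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class OrganellarHit(NamedTuple):
    """Hypothetical record for one organellar segment found in the nucleus."""
    seq_id: str       # nuclear sequence (chromosome or unplaced contig) name
    start: int        # 0-based start on the nuclear sequence
    end: int          # end coordinate, exclusive
    identity: float   # percent identity to the organellar source segment
    source: str       # "mt" (NUMT) or "pt" (NUPT)


def classify_nuots(hits: Iterable[OrganellarHit], chromosomes: Set[str],
                   young_cutoff: float = 90.0) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Summarize hits on chromosome-level sequences by source and age class.

    Returns {(source, "young" | "ancient"): (number of hits, total bp)}.
    Overlapping hits are not merged, so total bp can double-count.
    """
    summary: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for h in hits:
        if h.seq_id not in chromosomes:  # drop hits on unplaced/orphan contigs
            continue
        age = "young" if h.identity >= young_cutoff else "ancient"
        count, total_bp = summary.get((h.source, age), (0, 0))
        summary[(h.source, age)] = (count + 1, total_bp + (h.end - h.start))
    return summary
```

For a diploid sample, `chromosomes` would hold the 13 chromosome names; for an allotetraploid, all 26.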
In allotetraploid cotton, NUMTs and NUPTs were more numerous in the two-fold larger A genomes/subgenomes than in the smaller D genomes/subgenomes, as might be expected from the putatively larger neutral insertion space. However, the proportion of organellar DNA in nuclear DNA was higher in the D genome/subgenome (0.27–0.32%) than in the A genome/subgenome (0.17–0.20%), representing the highest proportion of NUOTs among the 18 samples (Additional file 1: Fig. S3, Additional file 2: Table S8). The age distribution of these transfers suggests that organellar DNA has been transferred to the nuclear genome continuously, with frequent recent insertions. Young NUOTs were much more abundant than ancient ones, which might reflect a declining ability to detect NUOTs as they decay (Fig. 4A). Compared with NUPTs, NUMTs tended to be preserved as shorter fragments (Fig. 4B). In the young NUMT group, type II (200–500 bp) fragments were the most common, followed by type I (100–200 bp), whereas the ancient NUMT group was skewed toward shorter fragments. Consistent with this, ancient NUMTs were primarily retained in the nuclear genome as short fragments, while large organellar DNA insertions (> 10 kb) were all recent events (Additional file 1: Fig. S4, Additional file 2: Table S9), reflecting the mutational decay of putatively neutral or only slightly deleterious insertions.
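The length types and genome proportions used above can be sketched as two helpers. Only types I (100–200 bp) and II (200–500 bp) and the > 10 kb category are named in this section, so other lengths fall into a catch-all bin here; treating each range as half-open (lower bound included) is an assumption, and the function names and example sizes are hypothetical.

```python
from typing import Iterable


def length_type(length_bp: int) -> str:
    """Bin one NUOT by length; ranges are half-open [low, high) (assumption)."""
    if 100 <= length_bp < 200:
        return "I"
    if 200 <= length_bp < 500:
        return "II"
    if length_bp > 10_000:
        return "large (>10 kb)"
    return "other"  # bins not defined in this section


def organellar_percent(nuot_lengths: Iterable[int], genome_size_bp: int) -> float:
    """Percentage of a (sub)genome occupied by NUOTs (overlaps assumed already merged)."""
    return 100 * sum(nuot_lengths) / genome_size_bp
```

For example, a hypothetical 500-Mb subgenome carrying 1.5 Mb of NUOTs gives `organellar_percent([1_000_000, 500_000], 500_000_000)`, or 0.3%, which is how a smaller subgenome can hold fewer NUOTs yet a higher organellar proportion.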
Genomic landscape of NUPTs and NUMTs in Gossypium. A Phylogenetic tree of 18 cotton samples based on mitogenomes, with whole-genome NUMT/NUPT metrics (total number and total length). B Frequency of different transfer fragment lengths for young and ancient NUMTs and NUPTs in the 18 cotton samples. C Circos plot of the distribution of young and ancient NUMTs and NUPTs on the A- and D-subgenome nuclear chromosomes of G. hirsutum. The outer circle of gray bands indicates chromosome numbers; the second track shows gene density (gene length per 100,000 bp); the light orange bands show the distribution of young and ancient NUMTs, and the light green bands show the distribution of young and ancient NUPTs. D Source-region preferences of organellar sequences transferred to the nucleus in Gossypium. NUOT density across the organellar genomes was calculated in non-overlapping 1-kb windows. Regions marked with green pentagrams were derived from the plastome
We also examined the location and distribution of NUOTs in the corresponding nuclear genomes. Both ancient NUMTs and NUPTs showed nonrandom insertion locations in the nuclear genome (Fig. 4C). Young NUPTs were scattered across nearly all regions of the nuclear genome, whereas ancient NUPTs were located predominantly in gene-rich regions and were sparse or absent near centromeres. The distribution of NUMTs across nuclear chromosomes varied among individuals; NUMTs were denser in gene-rich regions and relatively sparse elsewhere. NUMTs and NUPTs share some low-frequency insertion regions, suggesting a relationship to structural features of the nuclear genome [40], such as recombinational break frequencies. The distribution patterns of NUOTs in the nuclear genome were generally consistent across the 18 samples studied (Additional file 1: Fig. S5).
The imbalance of NUMTs and NUPTs in Gossypium
Large numbers of NUPTs and NUMTs have long been documented in plants, including rice, grapevine, maize, sorghum, and Arabidopsis [41, 42]. Their relative abundance varies among species: NUMTs and NUPTs may occur at similar levels, or one may be more frequent than the other. Our findings suggested that NUMTs were predominantly retained as shorter fragments in the nuclear genome (see above) and that more plastid sequence was integrated. To assess variation in NUOT depth, the 44 core conserved mitogenome blocks and the plastomes were divided into consecutive, non-overlapping 1-kb windows. We also compared the insertion features of different regions for young vs. ancient NUOTs (Additional file 1: Fig. S6). Not all mtDNA in the 44 core conserved blocks was represented among the nuclear NUMTs, and depth thus generally ranged from 0× to 20×, except in high-depth regions (marked with green pentagrams in Fig. 4D). These marked regions also contained NUPT-derived sequence, so their NUMT depth could not always be estimated accurately. The NUPTs in each genome or subgenome covered all regions of the ptDNA, with depths ranging from 10× to 75× (Fig. 4D). Perhaps unsurprisingly, given the nearly identical target mutational space, and in contrast to the situation for NUOTs discussed above, the distribution of NUPT depth was similar between the A and D subgenomes of the allotetraploids for both young and ancient NUPTs. In diploid cotton, the distribution of young NUPTs along the plastome was species-specific, whereas the distributions of ancient NUPTs, which occur at higher frequency, were similar across diploid species. The A and D subgenomes might have retained the NUPTs of their respective diploid ancestors, resulting in similar transfer depths across the plastome.
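Depth along an organellar genome in non-overlapping 1-kb windows can be computed by mapping each NUOT back to its source coordinates and averaging per-base coverage. The sketch below uses a difference array; it assumes 0-based, end-exclusive source intervals, does not handle segments that wrap around the origin of a circular genome, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def window_depth(intervals: Iterable[Tuple[int, int]], genome_length: int,
                 window: int = 1000) -> List[float]:
    """Mean per-base depth of transferred segments in non-overlapping windows.

    `intervals` are (start, end) source coordinates, 0-based and end-exclusive.
    The last window may be shorter than `window`.
    """
    delta = [0] * (genome_length + 1)
    for start, end in intervals:  # difference array: +1 at start, -1 at end
        delta[start] += 1
        delta[end] -= 1
    depths: List[float] = []
    running = window_sum = 0
    for pos in range(genome_length):
        running += delta[pos]          # coverage at this base
        window_sum += running
        if (pos + 1) % window == 0 or pos == genome_length - 1:
            depths.append(window_sum / (pos % window + 1))
            window_sum = 0
    return depths
```

A difference array keeps the cost linear in genome length plus the number of NUOTs, which matters when thousands of NUPTs are mapped back onto a ~160-kb plastome.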
We counted NUPGs (plastid genes located in NUPTs) in the 18 samples and compared their sequences to infer pseudogenization (see below). The 20 most frequently transferred plastome protein-coding genes included NADH dehydrogenase (ndh) and ATP synthase (atp) genes from the SSC region, as well as RNA polymerase (rpo) genes from the LSC region. Among these, ycf1 and rpoC2 were represented by the largest numbers of NUPGs. NUPG frequency may be related to gene length and therefore may not accurately reflect functional selection on NUPGs (Additional file 1: Fig. S7, Additional file 2: Table S10).
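One way to separate gene length from other influences on NUPG frequency is to express counts per kilobase of source gene. The sketch below is an illustration with invented gene names and counts, not the study's data; it only shows how a length correction can change a ranking.

```python
from typing import Dict, List


def per_kb_rate(counts: Dict[str, int], gene_lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """NUPG count per kilobase of the source plastid gene."""
    return {g: 1000 * counts[g] / gene_lengths_bp[g] for g in counts}


def ranked(values: Dict[str, float]) -> List[str]:
    """Gene names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical genes: geneX is long and often transferred, geneY is short.
counts = {"geneX": 40, "geneY": 12}
lengths = {"geneX": 5_000, "geneY": 1_000}
print(ranked(counts), ranked(per_kb_rate(counts, lengths)))
```

This prints `['geneX', 'geneY'] ['geneY', 'geneX']`: the long gene leads on raw counts (40 vs. 12) but not per kilobase (8 vs. 12).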
The evolutionary fate of NUMGs and the 3′ rpl2 gene insertion in Gossypium
Although many segments of organellar DNA were found in the nuclear genome, NUPGs (plastid genes located in NUPTs) and NUMGs (mitogenome genes located in NUMTs) are rarely expressed and generally nonfunctional [35]. Most identifiable NUMGs were fragmented in the nuclear chromosomes. In allotetraploid cotton (AADD), unfragmented NUMGs tended to contain numerous SNVs and InDels relative to their mitogenome counterparts, resulting in many amino acid changes and likely pseudogenization. Nevertheless, NUMGs of ribosomal protein genes, such as rpl10, rpl5, and rpl16, that were identical to the original organellar genes were concentrated in the A-clade samples. In the Australian clade (C + G + K), many NUMGs were not detected in the nuclear genome (Fig. 5A). An intact, expressed copy of the mitochondrial protein gene 3′ rpl2 has been reported in the A2 nuclear genome [43]. We identified an 870-bp 3′ rpl2 sequence on nuclear chromosome 13 in all 18 samples, whereas only a 184-bp 3′ rpl2 sequence remained in the Gossypium mitogenome (Fig. 5B). A phylogenetic tree of the 3′ rpl2 sequences showed that, in allotetraploid cotton, one copy of the gene derived from the D-genome diploid ancestor (D5) and the other from the A-genome ancestor (Fig. 5C). We also found 3′ rpl2 in the nuclear genomes of T. cacao and H. syriacus, dating this insertion to at least 50 million years ago [44]. Alignment of 5,000-bp segments upstream and downstream of the 3′ rpl2 gene showed that it was inserted at the same position in the A and D subgenomes, supporting its insertion prior to the evolution of the genus. The main branches of the maximum-likelihood (ML) tree of the 3′ rpl2 gene were consistent with those of the nuclear and organellar trees (Fig. 6).
This reflects the presumed ancient transfer of this gene fragment [45]. NUMGs may thus provide ancillary evidence relevant to the history of cotton and may help in understanding the functional genomics of genomic interaction [46, 47].
Characteristics of mitogenome protein-coding gene transfer in Gossypium. A Mitogenome CDS gene transfers among 18 cotton species. Colored boxes indicate: (1) no detectable mitogenome homolog in the nucleus (gray); (2) mitogenome gene fragmented in the nuclear genome (pink); (3) full-length mitogenome gene with SNVs and/or InDels (blue); (4) identical (100% identity) mitogenome gene present in the nuclear genome (green). B Structure and approximate sizes of the mitochondrial rpl2 gene in the mitogenome, based on previous studies [43], and transfer characteristics of the rpl2 gene in Gossypium nuclear genomes. The 1005-bp 5′ rpl2 gene in the Gossypium mitogenome is marked in yellow, and the 870-bp 3′ rpl2 gene in the Gossypium nuclear genome is marked in blue. The 5000-bp regions upstream and downstream of the 3′ rpl2 gene are marked in gray for A-subgenome homologous blocks and in brown for D-subgenome homologous blocks. C Phylogenetic relationships of the transferred 3′ rpl2 gene
Mitogenome data and the phylogeny of the genus
Deducing the relationships among diverse species can reveal differences in the evolutionary histories of DNA from different cellular compartments. Mitogenomes are maternally inherited in Gossypium [48]. The allopolyploids contain plastomes and mitogenomes similar to those of the A genome rather than the D genome; these two genomes were inferred to have been the maternal and paternal progenitors of the allopolyploids, respectively [30, 31, 49]. Nuclear data have shown that the paternal donor to the allopolyploids, i.e., the D genome, represents one of the two basally diverging branches in the genus, and that the B-genome taxa are sister to the group that includes the A and F genomes, which share a most recent common ancestor [45] (Fig. 6A). It has long been recognized, however, that this phylogenetic resolution differs from that inferred from the plastomes, in which the Australian clade (C + G + K) is resolved as one of the two branches descending from the earliest divergence [50, 51], with the placement of the B genome varying with the data set [52, 53] (shown as sister to the Australian cottons in Fig. 6B). The mitogenome phylogeny differs topologically in several respects from both the nuclear tree and the plastome tree (Fig. 6C–F). In all mitogenome data sets, the Australian clade was resolved as early diverging, but in the CDS-only tree the Australian clade also included samples from the E and D genomes, albeit with low support (Fig. 6C). In all mitogenome trees, the F-genome sample was resolved between the AD and A-genome clades, a pattern not found in either the plastome or the nuclear trees. In general, these results support and reflect a long-standing uncertainty in the exact placement of the B and E genomes relative to the other groups (Fig. 6).
That is, organellar trees routinely resolve the Australian clade as sister to other groups, whereas nuclear trees lead to an inference that the New World D-genome species are sister to the remainder of the genus, with the B and E groups having uncertain placements connected by short branches to these other basal divergences. We infer that this phylogenetic uncertainty reflects a reality of relatively closely spaced temporal divergences early in Gossypium diversification, generating short interior branches that are difficult to resolve due to both homoplasy and possibly lineage sorting. In our data, a 1496-bp deletion occurred in both the B genome sample and the A + F + AD clade but not in the remaining diploid samples (Additional file 1: Fig. S8). This mutation thus serves to phylogenetically link the B to the A + F + AD clade, lending support to the existence of a geographically distinct African clade that is separate from the more northeasterly and poorly understood E genome clade.
Phylogenetic relationships of Gossypium. A Phylogenetic relationships of Gossypium summarized from published nuclear-based research [30, 33, 49, 54, 55]. B A maximum likelihood (ML) tree of Gossypium based on complete plastomes. C An ML tree based on concatenated mitochondrial CDS sequences. D An ML tree based on concatenated mitochondrial CDS + intron sequences. E An ML tree based on concatenated mitochondrial intergenic, CDS, and intron sequences, with H. syriacus as outgroup. F An ML tree based on the 44 core mitogenome sequences of Gossypium. Branches with bootstrap support values below 75% are not shown in any of the phylogenetic trees
Discussion
Structural evolution and stability in Gossypium mitogenomes
About 40 years ago, Palmer and Herbon described the conundrum that plant mitochondrial genomes often undergo dramatic structural rearrangements while their genes have exceptionally slow rates of nucleotide substitution [15]. Here, we used short-read and long-read data to generate high-resolution assemblies for 19 Gossypium samples representing the full breadth of diversification among diploid and allopolyploid cotton species (Fig. 1A). The graph-based assembly approach recovered the structures obtained with traditional methods of structural determination and, in addition, resolved more potential genome conformations. Eight potential alternative configurations mediated by three repeats were detected in allotetraploid cotton and verified with long-read data, confirming the influence of large repeats on genome rearrangement (Fig. 1C). The four arrangements with R1 had long-read depths of 162 (H-R1-E), 149 (C-R1-F), 161 (H-R1-F), and 150 (C-R1-E). The four arrangements with R2 had long-read depths of 39 (I-R2-B), 61 (D-R2-H), 40 (I-R2-H), and 66 (D-R2-B). The four arrangements with R3 had long-read depths of 32 (G-R3-I), 51 (B-R3-I), 33 (G-R3-E), and 52 (B-R3-E). Complex structural diversity has been observed in numerous plant mitogenomes. In previous studies of Gossypium mitogenomes, traditional assembly methods typically concatenated the alternative connections into a simple circular structure [38], which does not accurately reflect the diverse structural characteristics of the mitogenome. Repetitive sequences in mtDNA long made it difficult to resolve alternative conformations accurately, a challenge largely surmounted by long-read sequencing. The A1, A2, and G1 mitogenome structures obtained with traditional assembly methods were completely collinear with those of allotetraploid cotton (Fig. 1D).
Simply using a reference genome for assembly may capture the sequence content of a mitogenome, but it overlooks genomic isomers and oversimplifies species-specific mitogenome structural variation [38]. The four diploid A-genome cottons (A1a-1, A1a-2, A1cv, A2) and G1 in our study contain large repeats that can mediate rearrangements, and they also exhibit lineage-specific structural characteristics not found in the allotetraploid clade.
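The long-read depths reported above for each repeat-mediated arrangement can be turned into relative frequencies. The sketch below does this for the R1 values from the text and, as an added illustration that is not necessarily part of the study's analysis, computes a Pearson chi-square statistic against the assumption that all four arrangements are equally abundant. The function names are hypothetical.

```python
# Illustrative sketch: relative frequencies of repeat-mediated arrangements
# from long-read depths. The chi-square check against equal frequencies is an
# added example (assumption), not necessarily an analysis from the study.

def arrangement_frequencies(depths: dict[str, int]) -> dict[str, float]:
    """Return each arrangement's share of the total long-read depth."""
    total = sum(depths.values())
    return {name: depth / total for name, depth in depths.items()}


def chi_square_uniform(depths: dict[str, int]) -> float:
    """Pearson chi-square statistic against equal expected depth per arrangement."""
    observed = list(depths.values())
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Long-read depths for the four arrangements across repeat R1 (values from the text).
r1_depths = {"H-R1-E": 162, "C-R1-F": 149, "H-R1-F": 161, "C-R1-E": 150}

# Upper 5% critical value of the chi-square distribution with 3 degrees of freedom.
CHI2_CRIT_DF3 = 7.815

freqs = arrangement_frequencies(r1_depths)
stat = chi_square_uniform(r1_depths)
for name, share in freqs.items():
    print(f"{name}: {share:.3f}")
print(f"chi-square = {stat:.3f}, unequal at alpha 0.05: {stat > CHI2_CRIT_DF3}")
```

For the R1 depths the statistic is about 0.93, well below the critical value, so under this simple test the four R1 arrangements would be read as roughly equally supported by long reads. Applying the same functions to R2 or R3 only requires a different depth dictionary.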
In plant mitogenomes, highly active replication and amplification of repetitive sequences, together with recombination, can generate various interchangeable isomers [56], accompanied by frequent rearrangements during recombination events. These processes are important sources of insertion-deletion (InDel) variation and of new open reading frames (ORFs). Whether these variations have any functional significance, either in nature or under domestication, remains an open question. Cotton, with its long history of domestication, is an intriguing case for studying structural evolution and for asking whether any of this variation has been driven by domestication selection. In this study, we examined all four cultivated cotton species, namely the American allopolyploids AD1 and AD2 and the African/Asian diploids A1 and A2. We found no significant structural differences between the wild and domesticated forms, suggesting that cotton domestication did not indirectly target mitogenome structure. Similarly, no large structural changes have arisen in the mitogenome of domesticated lettuce (Lactuca sativa) compared with wild lettuce (L. serriola) [5]. In contrast to these examples of stasis, differences in mitogenome structural characteristics have been reported between cultivated and wild lineages of Sorghum [9]. Likewise, selective sweeps under domestication have been inferred to be causally connected to an indel in Pyrus (pear) mitogenomes, although that study did not address the impact of domestication on genome structural variation because of the limitations of traditional assembly methods [57]. The effects of domestication on mitogenome structural evolution still require further research.
Mutation rates in the Gossypium mitogenomes
The data generated here permit a precise comparison of nucleotide substitution rates in mitochondrial vs. plastid genomes. For the mitochondrial genomes, dIGS branch lengths ranged from 0.0036 to 0.0040 and those of the dS tree from 0.0033 to 0.0045, whereas the corresponding plastome branch lengths were approximately 3.8–5.2 times higher (Additional file 1: Fig. S2A, Additional file 2: Table S6). It has long been recognized that most land plants exhibit much higher synonymous substitution rates in plastomes than in mitogenomes [2], and these data underscore the well-known observation of exceptionally slow rates of mitochondrial gene evolution [16]. This low mutation rate remains incompletely understood, but it may be related to mitogenome repair mechanisms. Organellar DNA fidelity is maintained by a highly specialized and nearly error-free repair system [58]. Specifically, the homologous recombination (HR) pathway, known for its precise repair, is the primary repair route for plant mtDNA, whereas the non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ) pathways may lead to complex genomic rearrangements, insertions, and duplications [59,60,61]. Compared with the conserved coding regions, non-coding regions generally exhibit more variability in both sequence and structure. Christensen [62, 63] proposed two models to explain this phenomenon: one based on differences in mutation input and the other on differences in selective pressure between these two types of regions. According to the mutation input model, distinct repair mechanisms operate on gene sequences and intergenic sequences; however, subsequent analyses of substitution rates in non-coding regions appear to have ruled out this possibility. Thus, the validity of this model and of the proposed alternative remains largely untested [62]. The selection pressure model posits that coding and non-coding regions are subject to different levels of (mostly) stabilizing selection.
Under this model, frequent rearrangements, duplications, and amplification events, as well as inaccurate repair of non-coding regions, result from relaxed selection in non-coding regions, allowing neutral mutations to accumulate, whereas mutations in coding regions are eliminated by stabilizing selection [63, 64]. This hypothesis predicts that the nucleotide substitution rate in intergenic regions should approximate the neutral substitution rate of protein-coding sequences, as measured at synonymous sites. Subsequent studies of mutation rates in Fragaria [22] and in Arabidopsis thaliana mutation accumulation lines [65] support this hypothesis. Similarly, the IGS substitution rates (dIGS 0.0036–0.0040) and synonymous substitution rates (dS 0.0033–0.0045) estimated here for Gossypium are of similar magnitude, consistent with the selection pressure model. Our resolution of core homologous blocks alleviates the prior constraint of lacking homologous non-coding data for evolutionary comparisons, which are necessary for mutation rate estimation.
Utility of mitochondrial structural mutations for phylogenetic inference in Gossypium
Because of its economic importance, Gossypium has been the subject of numerous phylogenetic analyses of relationships within and among genome groups [30, 66]. These have led to a reasonably well-resolved evolutionary understanding, although several branching orders among the basally diverged lineages are unstable across datasets and between nuclear and organellar data. These instabilities are discussed above (see the “Results” section) and are evident here even among mitogenome trees from different genomic partitions (Fig. 6). Several factors may underlie the observed incongruences, including ancient lineage sorting or hybridization events, and possibly artifacts caused by the relatively low level of mtDNA divergence among cotton species combined with the stochasticity of small numbers of potentially phylogenetically informative mutations. Cytoplasmic capture accompanying ancient hybridization is a well-known phenomenon in plants in general and is documented in Gossypium [67,68,69]. For these and other reasons, phylogenetic signals from different genomes may depict different evolutionary relationships. Notwithstanding the many evolutionary phenomena that might contribute to phylogenetic incongruence among genomes of different cellular compartments, special consideration may be given to “signal” phylogenetic characters, that is, those that on first principles are unlikely to reflect convergent or parallel evolution and that arguably represent a single evolutionary event indicating shared ancestry. A case in point from the present study is a 1496-bp deletion that was restricted to, and thus shared by, the B genome and the [A + F + AD-genome] clade (Additional file 1: Fig. S8).
This mutation thus serves as compelling evidence linking these genomes together as monophyletic, a biogeographically satisfying observation in that it implicates an ancestrally monophyletic African clade (A + B + F) that is distinct from the E genome clade that is restricted to the Horn of Africa and the Arabian Peninsula.
Intracellular transfer of DNA between the plastome, mitogenome, and nuclear genome
Intracellular transfer of genomic fragments is common in plants, particularly from the plastome and mitochondrial genome into the nucleus, and from the plastome and nuclear genome into the mitogenome. Here we resolved the history of these transfers in Gossypium. A total of 763–2725 high-confidence NUMTs (nuclear mitochondrial DNAs; 0.014–0.054% of the nuclear genome) and 2776–7822 high-confidence NUPTs (nuclear plastid DNAs; 0.075–0.25% of the nuclear genome) were identified (Additional file 2: Table S8). We observed an imbalance between NUMTs and NUPTs and a nonrandom distribution of NUOTs (nuclear organellar DNAs) in the Gossypium nuclear genome. The number of NUOTs identified in allotetraploid cotton is approximately twice that in diploid cotton, indicating that NUOTs from the ancestral subgenomes have been largely retained following allopolyploidy (Fig. 4A). Within the allopolyploid clade, the nuclear genomes of AD2, AD4, and AD5 display a greater or comparable number of young NUMTs in their two subgenomes relative to the genomes of their A- or D-genome diploid progenitors, suggesting that organellar sequence transfer has continued since polyploid formation. In contrast, the clade comprising the closely related AD1, Ghp, AD6, and AD7 exhibits fewer NUMTs than its diploid relatives, implying decay of older insertions with a more limited spectrum of new insertions since this clade diverged from the remaining allopolyploids (Additional file 1: Fig. S2A, Additional file 2: Table S8). In general, the distribution of NUMTs within and among the allotetraploid cotton species suggests species-specific dynamics of nuclear genome clearance and insertion, at least for older insertions. This pattern, however, is not observed for young NUPTs.
The frequency of NUMT acquisition and retention, as well as their rate of removal, may be influenced by additional limiting factors [70]. We observed a nonrandom distribution of ancient NUPTs, mainly at the ends of the long and short arms of gene-dense chromosomes in the Gossypium nuclear genome (Fig. 4B). Ancient NUPTs/NUMTs may become fragmented by TE insertions and be shifted away from the centromeres by TE-mediated recombination. Large NUPTs/NUMTs appear to have been preferentially inserted in the centromeric regions of the chromosomes, where they became fragmented and reshuffled [41, 42]. These integration features of NUOTs have also been found in wheat [71]. This integration pattern of NUOTs may be shared by other angiosperms, and data from more species will be essential for testing these hypotheses.
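The NUMT and NUPT totals above are also expressed as a percentage of the nuclear genome. A minimal sketch of one way to compute such a percentage is shown below, assuming hits are given as BED-style half-open (chrom, start, end) intervals and that overlapping hits should be counted only once; the function names and the example coordinates are hypothetical, and the study's own bookkeeping may differ.

```python
# Sketch: percent of a nuclear genome covered by organellar-derived hits,
# merging overlapping intervals so shared bases are not double-counted.
# Assumes BED-style half-open [start, end) coordinates (assumption).
from collections import defaultdict


def merged_length(intervals):
    """Total length of the union of half-open [start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def genome_fraction(hits, genome_size):
    """hits: iterable of (chrom, start, end); returns percent of genome covered."""
    by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        by_chrom[chrom].append((start, end))
    covered = sum(merged_length(iv) for iv in by_chrom.values())
    return 100.0 * covered / genome_size


# Hypothetical hits on a hypothetical 1 Mb genome.
example_hits = [("Chr01", 100, 400), ("Chr01", 300, 600), ("Chr02", 0, 250)]
print(f"{genome_fraction(example_hits, 1_000_000):.4f}% of the genome")
```

Merging per chromosome matters because a nuclear region can match several organellar query segments, and summing raw hit lengths would inflate the percentage.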
Conclusions
Here, we assembled 19 Gossypium mitogenomes representing all of the genome groups (diploids A through G, K, and the allopolyploids AD). This comprehensive dataset allowed an in-depth comparison of mitogenome structural variation, evolutionary rates, and the history of organellar DNA transfer among the nuclear, plastid, and mitochondrial genomes. We showed how graph-based mitogenome assembly overcomes the shortcomings of the traditional “master circle” model and reveals alternative structural conformations not previously recognized. Eight potential alternative configurations mediated by three large repeats (R1, R2, and R3) were discovered in the allotetraploid cottons. These configurations were quantified using long-read data, with some confirming the mitogenome structure reported in earlier studies of cotton. We also showed conserved mtDNA structural collinearity between the diploid A-genome taxa and the allotetraploid clade. In the other diploid clades, each branch possessed a unique arrangement of homologous contigs, resulting in lineage-specific recombination patterns and structural rearrangements that contribute to the diversity of mitogenome architecture within Gossypium. By comparing nucleotide substitution rates in coding versus non-coding regions, we demonstrated an extremely low point mutation rate across the entire mitogenome. Genome-wide surveys revealed continuous organellar DNA transfer into the nuclear genome, with a nonrandom distribution and frequent recent transfers. NUMTs were predominantly retained as shorter fragments in the nuclear genome, and more plastid sequences were integrated into the nucleus. Finally, we inferred phylogenetic relationships from different data sets to highlight distinct evolutionary histories among these cellular compartments, noting in particular a structural event that supports a monophyletic African clade.
Methods
Data source
A total of 19 distinct cotton species were used in this study. Raw short- and long-read sequencing data from the same individual were obtained from NCBI (https://www.ncbi.nlm.nih.gov/sra/) (Additional file 2: Table S11). The eight allopolyploid cottons included two domesticated species, G. hirsutum (AD1) and G. barbadense (AD2), and six wild taxa: G. tomentosum (AD3), G. mustelinum (AD4), G. darwinii (AD5), G. ekmanianum (AD6), G. stephensii (AD7), and G. hirsutum race punctatum (Ghp). The 11 diploid cottons included the cultivated G. arboreum “Shixiya1” (A2) and G. herbaceum “Zhongcao-1” (referred to as “A1cv”), as well as the wild G. herbaceum var. africanum Mutema (referred to as “A1a-1”), G. herbaceum subsp. africanum (referred to as “A1a-2”), G. raimondii (D5), G. anomalum (B1), G. sturtianum (C1), G. thurberi (D1), G. stocksii (E1), G. longicalyx (F1), G. bickii (G1), and G. rotundifolium (K2). In addition, Theobroma cacao and Hibiscus syriacus (Malvaceae) were used as outgroups. Nuclear genome assemblies for each respective individual (except A1a-1, for which none was available) were also obtained from NCBI for detecting NUOTs. All sample information is provided in Additional file 2: Table S12.
Organellar genome assembly and annotation
First, a draft mitogenome was assembled from short reads using a reference-free strategy. About 5 Gb of clean Illumina paired-end reads were used for de novo assembly in GetOrganelle v1.7.5 [72] with k-mer values of 21, 45, 65, 85, and 105. Five independent runs were performed at each k-mer value to obtain the assembled scaffolds. To detect putative mitogenome motifs, the scaffolds were visualized in Bandage v0.8.1 [73]. Conserved mitogenome protein-coding genes (PCGs) from Gossypium were used as query sequences to search the draft scaffolds, and scaffolds containing high-confidence hits were selected as candidate mitochondrial scaffolds. Candidate scaffolds were then used as query sequences to search the long reads (Pacific Biosciences and Oxford Nanopore) using Minimap2 v2.24 [74]. Candidate mitogenome long reads longer than 5,000 bp that harbored contiguous alignments of at least 500 bp with at least 80% identity to the reference were extracted using a Perl script. For some samples, the first round of assembly contained contamination from other genomic sequences, so the raw long reads were filtered over several rounds using longer contiguous alignment lengths and higher identity thresholds. The candidate mitogenome long reads were then assembled de novo with Flye v2.8.3 [75], followed by two or three rounds of polishing with short reads using Pilon v1.23 [76]. The graph-based mitogenome was visualized in Bandage v0.8.1. For plastome assembly, about 5 Gb of reads were randomly extracted from the total clean paired-end reads using SeqKit v2.1.0 [77], and de novo assemblies were generated in GetOrganelle v1.7.5 with the same five k-mer settings as above. Bandage v0.8.1 was used to visualize the plastome and to obtain a circular molecule.
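The long-read selection rule above (reads over 5,000 bp with a contiguous alignment of at least 500 bp at 80% identity or more) can be sketched as a filter on minimap2's PAF output. This is a hypothetical Python stand-in for the authors' Perl script, assuming reads were mapped as the query and candidate scaffolds as the target, and computing identity as PAF column 10 (residue matches) divided by column 11 (alignment block length).

```python
# Sketch: keep candidate mitogenome long reads from minimap2 PAF records.
# Hypothetical re-implementation; assumes reads are the PAF query and the
# candidate scaffolds are the target.

MIN_READ_LEN = 5000   # read length must exceed 5,000 bp
MIN_BLOCK_LEN = 500   # contiguous alignment block of at least 500 bp
MIN_IDENTITY = 0.80   # residue matches / alignment block length


def candidate_reads(paf_lines):
    """Return read names (in first-seen order) passing all three filters."""
    kept, seen = [], set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        qname, qlen = fields[0], int(fields[1])
        n_match, block_len = int(fields[9]), int(fields[10])
        if qlen <= MIN_READ_LEN or block_len < MIN_BLOCK_LEN:
            continue
        if n_match / block_len < MIN_IDENTITY:
            continue
        if qname not in seen:
            seen.add(qname)
            kept.append(qname)
    return kept


# Hypothetical PAF records:
# qname qlen qstart qend strand tname tlen tstart tend nmatch blocklen mapq
example = [
    "\t".join(["read1", "12000", "100", "2100", "+", "scaf1", "50000", "0", "2000", "1900", "2000", "60"]),
    "\t".join(["read2", "4000", "0", "3000", "+", "scaf1", "50000", "0", "3000", "2900", "3000", "60"]),
    "\t".join(["read3", "9000", "0", "800", "-", "scaf1", "50000", "0", "800", "500", "800", "60"]),
]
print(candidate_reads(example))  # ['read1']
```

In practice the list would be replaced by an open PAF file handle; the stricter thresholds mentioned for contaminated samples amount to raising `MIN_BLOCK_LEN` and `MIN_IDENTITY` in later rounds.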
To annotate the cotton mitogenomes, the ribosomal RNA (rRNA) genes and all PCGs of the basal angiosperms Magnolia biondii (MN206019) and Nymphaea colorata (NC_037468), and of the previously published G. hirsutum mitogenome (NC_028254), were used as references and aligned to the assemblies using BLAST v2.7.1 [78]. Geneious Prime v2022.2.2 was used to manually adjust the annotations of rRNA genes and PCGs, especially trans-spliced (TS) coding genes. tRNAscan-SE v2.0 (http://lowelab.ucsc.edu/tRNAscan-SE/) was used to predict transfer RNA (tRNA) genes. Plastomes were annotated using the online tool GeSeq [79] (https://chlorobox.mpimp-golm.mpg.de/geseq.html) with default settings. All annotations were further manually verified and corrected. Genome maps were generated using OGDRAW v1.3.1 [80] (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html).
Repeat analysis
BLAST v2.7.1 with a word size of 7 and an e-value cutoff of 1e−6 was used to detect repetitive sequences longer than 30 bp. The detected repeats were divided into four length classes: short (length ≤ 100 bp), medium (100 < length ≤ 500 bp), long (500 < length ≤ 5000 bp), and longest (length > 5000 bp). The total number and length of repetitive sequences were summarized; small repeats nested within large repeats were excluded when calculating total repeat length. The R package caper [81] was used to perform linear regression between mitogenome size and repetitive sequence content across the 19 species.
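The length classes and the nesting rule above can be sketched as follows. This is a minimal illustration, assuming each repeat copy is represented as a half-open [start, end) interval on the mitogenome; partially overlapping copies are kept as separate copies, which is an assumption the text does not address. The function names and example coordinates are hypothetical.

```python
# Sketch: classify repeat copies into the four length classes and sum their
# lengths after discarding copies nested inside a longer copy.
from collections import Counter


def length_class(length: int) -> str:
    """Assign a repeat to one of the four length classes used in the text."""
    if length <= 100:
        return "short"
    if length <= 500:
        return "medium"
    if length <= 5000:
        return "long"
    return "longest"


def drop_nested(copies):
    """Drop repeat copies whose [start, end) lies inside an earlier, enclosing copy."""
    kept, max_end = [], -1
    # Sort by start, then by decreasing end, so an enclosing copy precedes what it contains.
    for start, end in sorted(copies, key=lambda c: (c[0], -c[1])):
        if end <= max_end:
            continue  # fully contained in a copy already kept
        kept.append((start, end))
        max_end = end
    return kept


def summarize(copies):
    """Count copies per class and total their length, after removing nested ones."""
    kept = drop_nested(copies)
    counts = Counter(length_class(e - s) for s, e in kept)
    total_length = sum(e - s for s, e in kept)
    return counts, total_length


# Hypothetical repeat copies on one mitogenome.
copies = [(0, 120), (10, 60), (1000, 6500), (2000, 2400), (8000, 8040)]
counts, total = summarize(copies)
print(dict(counts), total)
```

Here the 50-bp and 400-bp copies sit inside longer copies and are dropped, leaving one medium, one longest, and one short copy with a combined length of 5,660 bp.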
Analysis of core homologous mtDNA
To obtain mitogenome-wide core conserved blocks, mitogenomes were aligned pairwise using BLAST v2.7.1 with a word size of 7 and an e-value cutoff of 1e−6. Syntenic blocks with a minimum length of 1 kb shared among the 19 cotton mitogenomes were extracted. Any two adjacent syntenic blocks separated by an interval of less than 50 bp were merged into one block. In addition, all syntenic blocks containing PCGs were retained even if they were shorter than 1 kb. All identified core conserved blocks were concatenated for annotation. We extracted single nucleotide polymorphisms (SNPs) from the core conserved sequences using SNP-sites [82] and then calculated the number and frequency of SNPs located in PCGs, introns, and intergenic sequences (IGS). DnaSP v6 [83] was used to extract InDels.
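The block-merging and retention rules above can be sketched as a single pass over sorted blocks. The order of operations (merge first, then apply the 1-kb and PCG rules) is an assumption, as are the `Block` class and function names; coordinates are treated as half-open intervals on one reference mitogenome.

```python
# Sketch: merge syntenic blocks separated by < 50 bp, then keep merged blocks
# that are >= 1 kb or contain a PCG. Merge-then-filter order is an assumption.
from dataclasses import dataclass


@dataclass
class Block:
    start: int            # 0-based, inclusive
    end: int              # exclusive
    has_pcg: bool = False

    @property
    def length(self) -> int:
        return self.end - self.start


def merge_and_filter(blocks, max_gap=50, min_len=1000):
    merged = []
    for b in sorted(blocks, key=lambda blk: blk.start):
        if merged and b.start - merged[-1].end < max_gap:
            # Gap below threshold (or overlap): extend the previous block.
            last = merged[-1]
            last.end = max(last.end, b.end)
            last.has_pcg = last.has_pcg or b.has_pcg
        else:
            # Copy so the caller's input blocks are never mutated.
            merged.append(Block(b.start, b.end, b.has_pcg))
    return [b for b in merged if b.length >= min_len or b.has_pcg]


# Hypothetical blocks on one reference mitogenome.
blocks = [Block(0, 700), Block(730, 1200), Block(5000, 5400, True), Block(9000, 9500)]
for blk in merge_and_filter(blocks):
    print(blk.start, blk.end, blk.has_pcg)
```

The first two blocks are 30 bp apart and merge into a 1,200-bp block; the short PCG-containing block is retained, and the isolated 500-bp block without a PCG is dropped.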
Phylogenetic analysis
The plastome phylogenetic tree was inferred from an alignment of the 19 plastomes; poorly aligned regions were trimmed with GBLOCKS v.0.91b. The four mitogenome phylogenetic trees were inferred from aligned nucleotides within the genome-wide core conserved blocks, including PCGs, introns, and IGSs. All aligned core conserved blocks, PCGs, and introns shared among the 19 cottons were concatenated separately, and poorly aligned regions of the core conserved blocks were trimmed with GBLOCKS v.0.91b. Maximum likelihood (ML) trees were constructed with IQ-TREE v.1.6.12 [84], with clade support assessed by 1000 non-parametric bootstrap replicates. The General Time Reversible model with rate variation among sites (+ G) and invariable sites (+ I) was selected as the optimal model by the built-in model selection module of IQ-TREE v.1.6.12. ML trees were visualized with FigTree v.1.4.4 (https://tree.bio.ed.ac.uk/software/figtree/).
Estimation and comparisons of nucleotide substitution rates
The cotton phylogeny inferred from the mitogenome alignment of core conserved blocks was used as the constraint topology for estimating mitogenome substitution rates. Similarly, plastome substitution rates were estimated on the plastome phylogenetic tree. We also estimated mitogenome substitution rates on the plastome phylogenetic tree to compare substitution rates between the two genomes. To obtain IGS rates, core conserved blocks with a minimum length of 1 kb shared among the 19 cotton species and the outgroup Hibiscus syriacus were extracted using BEDtools v2.30.0 and concatenated. All IGSs within the plastome were extracted after removing one inverted repeat (IR) copy to calculate sequence divergence (d). Synonymous (dS) and nonsynonymous (dN) substitution rates of coding sequences (CDS) were calculated using codeml with the branch model in PAML v4.9j [85]. Branch lengths scaled to sequence divergence (d) in intronic (dIntron) and IGS (dIGS) regions were calculated using the baseml program with the GTR + G + I nucleotide substitution model. Cumulative branch lengths from the root to each terminal tip were summarized using castor v1.7.11 in R v4.2 [86].
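The root-to-tip summary above, and the plastome-to-mitogenome comparison it supports, can be illustrated with a small recursive function. This Python sketch is only an analogue of the castor-based step: the tree is a hypothetical nested (name, branch_length, children) tuple rather than a parsed Newick file, and all branch lengths and taxon names are invented for illustration.

```python
# Sketch: cumulative root-to-tip branch lengths and per-tip rate ratios
# between two trees with the same tips. Trees and branch lengths are invented.

def root_to_tip(node, acc=0.0, out=None):
    """Return {tip_name: cumulative branch length from the root}.

    node is a (name, branch_length, children) tuple; tips have no children.
    """
    if out is None:
        out = {}
    name, length, children = node
    total = acc + length
    if not children:
        out[name] = total
    for child in children:
        root_to_tip(child, total, out)
    return out


def rate_ratio(numerator, denominator):
    """Per-tip ratio of two root-to-tip distance tables sharing the same tips."""
    return {tip: numerator[tip] / denominator[tip] for tip in denominator}


# Hypothetical trees with identical topology (branch lengths are invented).
mt_tree = ("root", 0.0, [
    ("anc", 0.0020, [("taxon_A", 0.0018, []), ("taxon_B", 0.0016, [])]),
    ("taxon_C", 0.0038, []),
])
pt_tree = ("root", 0.0, [
    ("anc", 0.0090, [("taxon_A", 0.0080, []), ("taxon_B", 0.0070, [])]),
    ("taxon_C", 0.0170, []),
])

mt = root_to_tip(mt_tree)
pt = root_to_tip(pt_tree)
print({tip: round(r, 2) for tip, r in rate_ratio(pt, mt).items()})
```

A per-tip ratio of this kind is one way to express the fold difference between plastid and mitochondrial rates reported in the Discussion; the invented values here give ratios of roughly 4.4 to 4.5.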
Identification of NUMTs and NUPTs
NUPTs and NUMTs were identified using BLAST v2.7.1 with a word size of 7 and an e-value cutoff of 1e−6. For NUMTs, the 44 core conserved blocks were aligned to the corresponding nuclear genomes/subgenomes to obtain conserved mitochondrial NUMTs. For NUPTs, plastomes were aligned to the corresponding nuclear genomes/subgenomes. To obtain high-confidence NUMTs/NUPTs, hits were further filtered using two criteria: (i) a contiguous alignment with an organellar sequence longer than 100 bp with at least 80% identity; and (ii) location on the 13 (diploid) or 26 (allotetraploid) assembled nuclear chromosomes, excluding unplaced contigs. Hits meeting both criteria were judged to be organellar insertions. NUMTs/NUPTs were divided into two groups based on their estimated insertion time, as reflected by sequence similarity: young NUMTs/NUPTs (similarity ≥ 90%) and ancient NUMTs/NUPTs (80% ≤ similarity < 90%). Based on fragment length, NUMTs/NUPTs were divided into seven classes: type I (100 < length ≤ 200 bp), type II (200 < length ≤ 500 bp), type III (500 < length ≤ 1000 bp), type IV (1000 < length ≤ 2000 bp), type V (2000 < length ≤ 5000 bp), type VI (5000 < length ≤ 10,000 bp), and type VII (length > 10,000 bp).
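The NUMT/NUPT filtering, age grouping, and length typing rules above can be sketched as follows. Hits are assumed to be simplified (chrom, length, identity) tuples with identity as a fraction; the chromosome names (`Chr01` to `Chr13` for a diploid) and all example hits are hypothetical.

```python
# Sketch of the high-confidence filter, insertion-age groups and length types.
# Chromosome names and example hits are hypothetical.

LENGTH_TYPES = [(200, "I"), (500, "II"), (1000, "III"), (2000, "IV"),
                (5000, "V"), (10000, "VI")]


def insertion_age(identity):
    """Young: >= 90% identity; ancient: 80% <= identity < 90%."""
    if identity >= 0.90:
        return "young"
    if identity >= 0.80:
        return "ancient"
    return None  # below the 80% identity threshold


def length_type(length):
    """Types I-VI by upper bound (inclusive); anything longer is type VII."""
    for upper, label in LENGTH_TYPES:
        if length <= upper:
            return label
    return "VII"


def classify_hits(hits, chromosomes):
    """Return (chrom, length, age, type) for hits passing both filters."""
    out = []
    for chrom, length, identity in hits:
        if length <= 100 or identity < 0.80:
            continue  # criterion (i): > 100 bp and >= 80% identity
        if chrom not in chromosomes:
            continue  # criterion (ii): unplaced contigs are excluded
        out.append((chrom, length, insertion_age(identity), length_type(length)))
    return out


diploid_chroms = {f"Chr{i:02d}" for i in range(1, 14)}  # 13 chromosomes
hits = [("Chr01", 150, 0.95), ("Chr05", 2500, 0.85), ("scaffold_77", 800, 0.99),
        ("Chr13", 90, 0.97), ("Chr02", 12000, 0.82)]
for row in classify_hits(hits, diploid_chroms):
    print(row)
```

For an allotetraploid, the chromosome set would simply list the 26 A- and D-subgenome chromosomes instead.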
Analyses of NUMTs and NUPTs sequence evolution
To calculate the frequency of organellar DNA insertion into the nuclear genome, all NUMTs/NUPTs were aligned to the corresponding concatenation of the 44 conserved mtDNA sequence blocks and to the ptDNA using BWA-SW v0.7.17 [87]. A non-overlapping 1-kb sliding window was used to scan NUMTs using BEDtools v2.30.0. In addition, to investigate the post-transfer fate of NUMGs (mitochondrial genes located in NUMTs), they were divided into four classes: (I) no detectable homologous mitogenome sequence in the nuclear genome; (II) incomplete homologous fragments of a mitogenome coding gene in the nuclear genome; (III) full-length nuclear homologs of a mitogenome coding gene carrying SNP and/or InDel variations; and (IV) fragments identical to the original organellar gene (100% identity). The 1-kb flanking sequences of mitogenome-derived genes were also extracted using BEDtools v2.30.0. For NUPGs (plastome genes located in NUPTs), we calculated the transfer frequency of each coding gene across the 18 samples. An expressed 3′ rpl2 gene has previously been reported in the cotton nuclear genome [43]; we downloaded the A2 3′ rpl2 sequence (BE054892) from NCBI.
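The windowed scan, the four NUMG classes, and the flanking-sequence step above can be sketched in plain Python. This is a hypothetical analogue of the BEDtools-based steps, not the study's code: hits are assumed to be half-open intervals on the concatenated organellar sequence, and NUMG status is assumed to be summarized by the fraction of the gene covered by its best nuclear hit together with that hit's identity.

```python
# Sketch: NUMT hit counts per non-overlapping 1-kb window, the four NUMG
# classes, and flanking-sequence extraction. Inputs are hypothetical.

def window_counts(hits, seq_len, window=1000):
    """Count hits overlapping each non-overlapping window of the organellar sequence."""
    n = (seq_len + window - 1) // window
    counts = [0] * n
    for start, end in hits:  # half-open [start, end) organellar coordinates
        if end <= start:
            continue
        last = min((end - 1) // window, n - 1)
        for w in range(start // window, last + 1):
            counts[w] += 1
    return counts


def numg_class(aligned_fraction, identity):
    """aligned_fraction: share of the organellar gene covered by its best nuclear hit."""
    if aligned_fraction == 0:
        return "I"    # no detectable nuclear homolog
    if aligned_fraction < 1:
        return "II"   # fragmented homolog
    if identity < 1:
        return "III"  # full length, with SNPs and/or InDels
    return "IV"       # identical to the organellar gene


def flanks(seq, start, end, size=1000):
    """Upstream and downstream flanks of [start, end), clipped at sequence ends."""
    return seq[max(0, start - size):start], seq[end:end + size]


print(window_counts([(0, 1500), (900, 1100), (3100, 3500)], 3500))
print([numg_class(f, i) for f, i in [(0, 0), (0.6, 0.97), (1.0, 0.98), (1.0, 1.0)]])
print(flanks("AAAACCCCGGGG", 4, 8, size=2))
```

With `size=5000`, the same `flanks` helper would correspond to the 5,000-bp regions compared around the 3′ rpl2 insertion in the Results.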
Data availability
The datasets supporting the conclusions of this article are included within the article and its additional files.
Abbreviations
- IGTs: Intracellular gene transfers
- mtDNA: Mitochondrial deoxyribonucleic acid
- NUMTs: Mitochondrion-to-nucleus transfers
- NUPTs: Plastid-to-nucleus transfers
- NUOTs: Nuclear organellar DNA transfers
- NUMGs: Mitochondrial genes located in NUMTs
- NUPGs: Plastid genes located in NUPTs
- ptDNA: Plastid deoxyribonucleic acid
References
Lynch M, Koskella B, Schaack S. Mutation pressure and the evolution of organelle genomic architecture. Science. 2006;311(5768):1727–30.
Smith DR, Keeling PJ. Mitochondrial and plastid genome architecture: reoccurring themes, but significant differences at the extremes. Proc Natl Acad Sci. 2015;112(33):10177–84.
Cheng N, Lo YS, Ansari MI, Ho KC, Jeng ST, Lin NS, Dai H. Correlation between mtDNA complexity and mtDNA replication mode in developing cotyledon mitochondria during mung bean seed germination. New Phytol. 2017;213(2):751–63.
Wang J, Kan S, Liao X, Zhou J, Tembrock LR, Daniell H, et al. Plant organellar genomes: much done, much more to do. Trends Plant Sci. 2024;9:754.
Kozik A, Rowan BA, Lavelle D, Berke L, Schranz ME, Michelmore RW, Christensen AC. The alternative reality of plant mitochondrial DNA: One ring does not rule them all. PLoS Genet. 2019;15(8):e1008373.
Yu R, Sun C, Zhong Y, Liu Y, Sanchez Puerta MV, Mower JP, Zhou R. The minicircular and extremely heteroplasmic mitogenome of the holoparasitic plant Rhopalocnemis phalloides. Curr Biol. 2022;32(2):470–479.e5.
Kang JS, Zhang HR, Wang YR, Liang SQ, Mao ZY, Zhang XC, Xiang QP. Distinctive evolutionary pattern of organelle genomes linked to the nuclear genome in Selaginellaceae. Plant J. 2020;104(6):1657–72.
Yang H, Ni Y, Zhang X, Li J, Chen H, Liu C. The mitochondrial genomes of Panax notoginseng reveal recombination mediated by repeats associated with DNA replication. Int J Biol Macromol. 2023;252:126359.
Zhang S, Wang J, He W, Kan S, Liao X, Jordan DR, et al. Variation in mitogenome structural conformation in wild and cultivated lineages of sorghum corresponds with domestication history and plastome evolution. BMC Plant Biol. 2023;23(1):91.
Wu ZQ, Liao XZ, Zhang XN, Tembrock LR, Broz A. Genomic architectural variation of plant mitochondria-A review of multichromosomal structuring. J Syst Evol. 2022;60(1):160–8.
Khachaturyan M, Reusch TB, Dagan T. Worldwide population genomics reveal long-term stability of the mitochondrial genome architecture in a keystone marine plant. Genome Biol Evol. 2023;15(9):evad167.
Sloan DB. One ring to rule them all? Genome sequencing provides new insights into the ‘master circle’ model of plant mitochondrial DNA structure. New Phytol. 2013;200(4):978–85.
He W, Xiang K, Chen C, Wang J, Wu Z. Master graph: an essential integrated assembly model for the plant mitogenome based on a graph-based framework. Brief Bioinform. 2023;24(1):bbac522.
Bi C, Shen F, Han F, Qu Y, Hou J, Xu K, et al. PMAT: An efficient plant mitogenome assembly toolkit using low-coverage HiFi sequencing data. Hortic Res. 2024;11(3):uhae023.
Palmer JD, Herbon LA. Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence. J Mol Evol. 1988;28(1–2):87–97.
Drouin G, Daoud H, Xia J. Relative rates of synonymous substitutions in the mitochondrial, chloroplast and nuclear genomes of seed plants. Mol Phylogenet Evol. 2008;49(3):827–31.
Mower JP, Touzet P, Gummow JS, Delph LF, Palmer JD. Extensive variation in synonymous substitution rates in mitochondrial genes of seed plants. BMC Evol Biol. 2007;7:1–14.
Zwonitzer KD, Tressel LG, Wu Z, Kan S, Broz AK, Mower JP, et al. Genome copy number predicts extreme evolutionary rate variation in plant mitochondrial DNA. Proc Natl Acad Sci. 2024;121(10):e2317240121.
Wu Z, Cuthbert JM, Taylor DR, Sloan DB. The massive mitochondrial genome of the angiosperm Silene noctiflora is evolving by gain or loss of entire chromosomes. Proc Natl Acad Sci. 2015;112(33):10185–91.
Wu Z, Sloan DB. Recombination and intraspecific polymorphism for the presence and absence of entire chromosomes in mitochondrial genomes. Heredity. 2018;122(5):647–59.
Christensen AC. Plant mitochondria are a riddle wrapped in a mystery inside an enigma. J Mol Evol. 2021;89(3):151–6.
Fan W, Liu F, Jia Q, Du H, Chen W, Ruan J, et al. Fragaria mitogenomes evolve rapidly in structure but slowly in sequence and incur frequent multinucleotide mutations mediated by microinversions. New Phytol. 2022;236(2):745–59.
Kan S, Liao X, Wu Z. The roles of mutation and selection acting on mitochondrial genomes inferred from intraspecific variation in seed plants. Genes. 2022;13(6):1036.
Wang J, Kan S, Kong J, Nie L, Fan W, Ren Y, et al. Accumulation of large lineage-specific repeats coincides with sequence acceleration and structural rearrangement in Plantago plastomes. Genome Biol Evol. 2024;16(8):evae177.
Kleine T, Maier UG, Leister D. DNA transfer from organelles to the nucleus: the idiosyncratic genetics of endosymbiosis. Annu Rev Plant Biol. 2009;60(1):115–38.
Rice DW, Alverson AJ, Richardson AO, Young GJ, Sanchez-Puerta MV, Munzinger J, et al. Horizontal transfer of entire genomes via mitochondrial fusion in the angiosperm Amborella. Science. 2013;342(6165):1468–73.
Zhang Z, Zhao J, Li J, Yao J, Wang B, Ma Y, et al. Evolutionary trajectory of organelle-derived nuclear DNAs in the Triticum/Aegilops complex species. Plant Physiol. 2024;194(2):918–35.
Chen ZJ, Scheffler BE, Dennis E, Triplett BA, Zhang T, Guo W, et al. Toward sequencing cotton (Gossypium) genomes. Plant Physiol. 2007;145(4):1303–10.
Wendel JF, Cronn RC. Polyploidy and the evolutionary history of cotton. Adv Agron. 2003;78(8):139–86.
Hu G, Grover CE, Jareczek J, Yuan D, Dong Y, Miller E, et al. Evolution and diversity of the cotton genome. In: Cotton precision breeding. Springer; 2021. p. 25–78.
Viot CR, Wendel JF. Evolution of the cotton genus, Gossypium, and its domestication in the Americas. Crit Rev Plant Sci. 2023;42(1):1–33.
Chen ZJ, Sreedasyam A, Ando A, Song Q, De Santiago LM, Hulse-Kemp AM, et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat Genet. 2020;52(5):525–33.
Peng R, Xu Y, Tian S, Unver T, Liu Z, Zhou Z, et al. Evolutionary divergence of duplicated genomes in newly described allotetraploid cottons. Proc Natl Acad Sci. 2022;119(39):e2208496119.
Chen Z, Nie H, Wang Y, Pei H, Li S, Zhang L, Hua J. Rapid evolutionary divergence of diploid and allotetraploid Gossypium mitochondrial genomes. BMC Genomics. 2017;18:1–15.
Zhao N, Grover CE, Chen Z, Wendel JF, Hua J. Intergenomic gene transfer in diploid and allopolyploid Gossypium. BMC Plant Biol. 2019;19:1–18.
Chen Z, Grover CE, Li P, Wang Y, Nie H, Zhao Y, et al. Molecular evolution of the plastid genome during diversification of the cotton genus. Mol Phylogenet Evol. 2017;112:268–76.
Mower JP. Variation in protein gene and intron content among land plant mitogenomes. Mitochondrion. 2020;53:203–13.
Feng Y, Wang Y, Lu H, Li J, Akhter D, Liu F, et al. Assembly and phylogenomic analysis of cotton mitochondrial genomes provide insights into the history of cotton evolution. Crop J. 2023;11(6):1782–92.
Wolfe KH, Li WH, Sharp PM. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci. 1987;84(24):9054–8.
Wang D, Timmis JN. Cytoplasmic organelle DNA preferentially inserts into open chromatin. Genome Biol Evol. 2013;5(6):1060–4.
Michalovová M, Vyskot B, Kejnovsky E. Analysis of plastid and mitochondrial DNA insertions in the nucleus (NUPTs and NUMTs) of six plant species: Size, relative age and chromosomal localization. Heredity. 2013;111(4):314–20.
Matsuo M, Ito Y, Yamauchi R, Obokata J. The rice nuclear genome continuously integrates, shuffles, and eliminates the chloroplast genome to cause chloroplast-nuclear DNA flux. Plant Cell. 2005;17(3):665–75.
Adams KL, Ong HC, Palmer JD. Mitochondrial gene transfer in pieces: fission of the ribosomal protein gene rpl2 and partial or complete gene transfer to the nucleus. Mol Biol Evol. 2001;18(12):2289–97.
Kumar S, Suleski M, Craig JM, Kasprowicz AE, Sanderford M, Li M, et al. TimeTree 5: An Expanded Resource for Species Divergence Times. Mol Biol Evol. 2022;39(8):msac174.
Wang M, Li J, Qi Z, Long Y, Pei L, Huang X, et al. Genomic innovation and regulatory rewiring during evolution of the cotton genus Gossypium. Nat Genet. 2022;54(12):1959–71.
Grewe F, Zhu A, Mower JP. Loss of a Trans-Splicing nad1 Intron from Geraniaceae and Transfer of the Maturase Gene matR to the Nucleus in Pelargonium. Genome Biol Evol. 2016;8:3193–201.
Li H, Akella S, Engstler C, Omini JJ, Rodriguez M, Obata T, et al. Recurrent evolutionary switches of mitochondrial cytochrome c maturation systems in Archaeplastida. Nat Commun. 2024;15(1):1548.
Small R, Wendel J. Brief communication. The mitochondrial genome of allotetraploid cotton (Gossypium L.). J Hered. 1999;90(1):251–3.
Wendel JF. New World tetraploid cottons contain Old World cytoplasm. Proc Natl Acad Sci. 1989;86(11):4132–6.
Wendel JF, Albert VA. Phylogenetics of the cotton genus (Gossypium): character-state weighted parsimony analysis of chloroplast-DNA restriction site data and its systematic and biogeographic implications. Syst Bot. 1992;17(1):115–43.
Yan XL, Kan SL, Wang MX, Li YY, Tembrock LR, He WC, et al. Genetic diversity and evolution of the plastome in allotetraploid cotton (Gossypium spp.). J Syst Evol. 2024;62(6):1118–36.
Cronn R, Wendel JF. Cryptic trysts, genomic mergers, and plant speciation. New Phytol. 2004;161(1):133–42.
Cronn RC, Small RL, Haselkorn T, Wendel JF. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. Am J Bot. 2002;89(4):707–25.
Grover CE, Gallagher JP, Jareczek JJ, Page JT, Udall JA, Gore MA, Wendel JF. Re-evaluating the phylogeny of allopolyploid Gossypium L. Mol Phylogenet Evol. 2015;92:45–52.
Wendel J, Cronn R. Polyploidy and the evolutionary history of cotton. 2003.
Wang H, Wu Z, Li T, Zhao J. Highly active repeat-mediated recombination in the mitogenome of the aquatic grass Hygroryza aristata. BMC Plant Biol. 2024;24(1):644.
Sun M, Zhang M, Chen X, Liu Y, Liu B, Li J, et al. Rearrangement and domestication as drivers of Rosaceae mitogenome plasticity. BMC Biol. 2022;20(1):181.
Gandini CL, Garcia LE, Abbona CC, Ceriotti LF, Kushnir S, Geelen D, Sanchez-Puerta MV. Break-induced replication is the primary recombination pathway in plant somatic hybrid mitochondria: a model for mitochondrial horizontal gene transfer. J Exp Bot. 2023;74(12):3503–17.
Gualberto JM, Newton KJ. Plant mitochondrial genomes: dynamics and mechanisms of mutation. Annu Rev Plant Biol. 2017;68(1):225–52.
Wang J, Zou Y, Mower JP, Reeve W, Wu Z. Rethinking the mutation hypotheses of plant organellar DNA. Genomics Communications. 2024;1(1):e003.
Zhu W, Qian J, Hou Y, Tembrock LR, Nie L, Hsu YF, et al. The evolutionarily diverged single-stranded DNA-binding proteins SSB1/SSB2 differentially affect the replication, recombination and mutation of organellar genomes in Arabidopsis thaliana. Plant Diversity. 2024. https://doi.org/10.1016/j.pld.2024.11.001.
Christensen AC. Plant Mitochondrial Genome Evolution Can Be Explained by DNA Repair Mechanisms. Genome Biol Evol. 2013;5(6):1079–86.
Christensen AC. Genes and Junk in Plant Mitochondria—Repair Mechanisms and Selection. Genome Biol Evol. 2014;6(6):1448–53.
Kimura M. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature. 1977;267(5608):275–6.
Wu Z, Waneka G, Sloan DB. The tempo and mode of angiosperm mitochondrial genome divergence inferred from intraspecific variation in Arabidopsis thaliana. G3 (Bethesda). 2020;10(3):1077–86.
Wendel JF, Grover CE. Taxonomy and evolution of the cotton genus. Gossypium Cotton. 2015;57:25–44.
Rieseberg LH, Soltis DE. Phylogenetic consequences of cytoplasmic gene flow in plants. Evol Trends Plants. 1991;5(1):65–84.
Acosta MC, Premoli AC. Evidence of chloroplast capture in South American Nothofagus (subgenus Nothofagus, Nothofagaceae). Mol Phylogenet Evol. 2010;54(1):235–42.
Postel Z, Sloan DB, Gallina S, Godé C, Schmitt E, Mangenot S, et al. The decoupled evolution of the organellar genomes of Silene nutans leads to distinct roles in the speciation process. New Phytol. 2023;239(2):766–77.
Richly E, Leister D. NUMTs in sequenced eukaryotic genomes. Mol Biol Evol. 2004;21(6):1081–4.
Chen Y, Guo Y, Xie X, Wang Z, Miao L, Yang Z, et al. Pangenome-based trajectories of intracellular gene transfers in Poaceae unveil high cumulation in Triticeae. Plant Physiol. 2023;193(1):578–94.
Jin JJ, Yu WB, Yang JB, Song Y, DePamphilis CW, Yi TS, Li DZ. GetOrganelle: A fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020;21:1–31.
Wick RR, Schultz MB, Zobel J, Holt KE. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics. 2015;31(20):3350–2.
Li H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100.
Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019;37(5):540–6.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE. 2014;9(11):e112963.
Shen W, Le S, Li Y, Hu F. SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE. 2016;11(10):e0163962.
Ye J, McGinnis S, Madden TL. BLAST: improvements for better sequence analysis. Nucleic Acids Res. 2006;34(suppl_):W6–9.
Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, Greiner S. GeSeq – versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45(W1):W6–11.
Greiner S, Lehwark P, Bock R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: Expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 2019;47(W1):W59–64.
Orme CDL. Caper: Comparative analyses of phylogenetics and evolution in R. Methods Ecol Evol. 2013;3:145.
Page AJ, Taylor B, Delaney AJ, Soares J, Seemann T, Keane JA, Harris SR. SNP-sites: Rapid efficient extraction of SNPs from multi-FASTA alignments. Microb Genom. 2016;2(4):e000056.
Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, Ramos-Onsins SE, Sánchez-Gracia A. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol Biol Evol. 2017;34(12):3299–302.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.
Yang Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91.
Louca S, Doebeli M. Efficient comparative phylogenetics on large trees. Bioinformatics. 2018;34(6):1053–5.
Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589–95.
Acknowledgements
We sincerely thank the members of Wu’s Lab for their prompt and patient assistance, and all collaborators for valuable discussion and edits during manuscript preparation. We are also particularly grateful for the services of the High-Performance Computing Cluster in the Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences.
Funding
This work was funded by the National Natural Science Foundation of China (grant 32170238), the Guangdong Pearl River Talent Program (grant 2021QN02N792), the Shenzhen Fundamental Research Program (grant JCYJ20220818103212025), and the Chinese Academy of Agricultural Sciences Elite Youth Program (110243160001007) to Z.W. This work was also supported by the Innovation Program of the Chinese Academy of Agricultural Sciences and the Shenzhen Key Laboratory of Southern Subtropical Plant Diversity.
Author information
Contributions
ZW, SK, JFW, and XM conceived the project and designed the research; JK, JW, and LN assembled the genome sequences, analyzed the data, and interpreted the results; JK, JW, and LN wrote the manuscript; ZW, SK, XM, JFW, CZ, and LRT contributed to discussion and edited the manuscript. All authors approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Figure S1. Mitogenome gene content and plastome structure of Gossypium. Figure S2. Ratio of synonymous substitutions in ptDNA to mtDNA in CDS, and ratio of nucleotide substitutions in ptDNA to mtDNA in IGS. Figure S3. Genomic landscape of NUPTs and NUMTs in Gossypium. Figure S4. Length and distribution of young and ancient NUOTs. Figure S5. Distribution of young and ancient NUMTs and NUPTs on the A- and D-subgenome nuclear chromosomes of AD1, AD2, A2, and D5. Figure S6. Nuclear genome location preference of organellar genome transfer sequences in Gossypium. Figure S7. Correlation between gene size and number of NUPGs, and the distribution of NUPG numbers. Figure S8. Mitogenome structural variation of Gossypium with deletion sequence information.
Additional file 2: Table S1. Characterization of mitogenome assemblies in Gossypium. Table S2. Features of the mitogenomes. Table S3. Features of the plastomes. Table S4. Mitogenome repeat information for Gossypium. Table S5. Nucleotide variation of the mitogenome across 44 core fragments. Table S6. Estimation of absolute substitution rates in the Gossypium organellar genomes. Table S7. Nuclear mitochondrial DNA on each chromosome of Gossypium tomentosum. Table S8. Estimation of absolute substitution rates in the Gossypium organellar genomes. Table S9. Number of young and ancient NUMTs/NUPTs in seven fragment length classes among Gossypium species. Table S10. Number of NUPGs among 18 samples. Table S11. Description of the downloaded cotton variety data. Table S12. Accession numbers of sequence data used in this study.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kong, J., Wang, J., Nie, L. et al. Evolutionary dynamics of mitochondrial genomes and intracellular transfers among diploid and allopolyploid cotton species. BMC Biol 23, 9 (2025). https://doi.org/10.1186/s12915-025-02115-z
DOI: https://doi.org/10.1186/s12915-025-02115-z