
Rearrangement and domestication as drivers of Rosaceae mitogenome plasticity



The mitochondrion is an important cellular component in plants that produces vital energy for the cell. However, the evolution and structure of mitochondrial genomes (mitogenomes) remain unclear in the Rosaceae family. In this study, we assembled 34 Rosaceae mitogenomes and characterized genome variation, rearrangement rate, and selection signal variation within these mitogenomes.


Comparative analysis of six genera from the Amygdaloideae and five from the Rosoideae subfamilies of Rosaceae revealed that three protein-coding genes were absent from the mitogenomes of five Rosoideae genera. Positive correlations between genome size and repeat content were identified in 38 Rosaceae mitogenomes. Twenty repeats with high recombination frequency (> 50%) provided evidence for predominant substoichiometric conformation of the mitogenomes. Variations in rearrangement rates were identified between eleven genera, and within the Pyrus, Malus, Prunus, and Fragaria genera. Based on population data, phylogenetic inferences from Pyrus mitogenomes supported two distinct maternal lineages of Asian cultivated pears. A Pyrus-specific deletion (DEL-D) in selective sweeps was identified based on the assembled genomes and population data. After the DEL-D sequence fragments originally arose, they may have experienced a subsequent doubling event via homologous recombination and sequence transfer in the Amygdaloideae; afterwards, this variant sequence may have significantly expanded to cultivated groups, thereby improving adaptation during the domestication process.


This study characterizes the variations in gene content, genome size, rearrangement rate, and the impact of domestication in Rosaceae mitogenomes and provides insights into their structural variation patterns and phylogenetic relationships.


As the cell’s energy factory, the mitochondrion is an organelle essential in angiosperm development, growth, programmed cell death, and male sterility [1]. Each mitochondrion has its own genome, which is usually uniparentally inherited [2]. Compared with plastid genomes, angiosperm mitochondrial genomes (mitogenomes) vary widely in size and gene content [3, 4]. Currently known angiosperm mitogenome sizes range from 66 kb to 11.3 Mb, and the number of protein-coding genes ranges from 19 to 41 (excluding duplicated genes and open reading frames (ORFs)) [4,5,6]. Most genome size and structure variations occur in non-coding sequences, and these variations are primarily caused by foreign sequence importation, which increases the occurrence of repetitive sequences and recombination events [7,8,9].

Numerous inverted and direct repeats play a pivotal role in plant mitogenome size and structural evolution by participating in genome rearrangement, repeat-mediated recombination, insertion, and deletion events [8, 10, 11]. Repeat-mediated homologous recombination in mitogenomes has been investigated in seed plants such as Picea abies [12] and Nymphaea colorata [13], and positive correlations between repeat length and recombination rate were detected in Viscum scurruloideum [4]. Minor to moderate recombination activity was detected among short (< 100 bp) and medium-length repeats (100–1000 bp), while larger repeats (> 1000 bp) experienced more frequent recombination activity and isomerization in the genome [14, 15]. Recently, third-generation long-read sequencing technologies have been used to overcome the complexity of short read-based genome assembly, and these technologies have proven sensitive at detecting the repeat-mediated recombination activity of large repeats [12, 13].

Mitogenome rearrangement is primarily caused by frequent repeat-mediated recombination [11], as supported by the presence of rearrangement breakpoints close to repeats [8]. In plants, mitogenome rearrangements can influence ATP availability, plant growth, cytoplasmic male sterility (CMS), and overall fitness [16, 17]. Despite their low substitution rates, angiosperm mitogenomes differ markedly in rearrangement rate. Within the genus Monsonia, the mitogenome of M. ciliata has a tenfold higher rearrangement rate than its sister species; overall, an over 600-fold variance in mitogenome rearrangement rates has been observed among seed plants [8].

Rosaceae has ca. 3000 species in 90 genera and includes herbs, shrubs, and trees adapted to a wide variety of environments [18]. Research on Rosaceae mitogenomes has remained limited despite recent progress in nuclear and chloroplast genomic sequencing of Rosaceae species [19,20,21]. Only 14 Rosaceae mitogenomes were available in the National Center for Biotechnology Information (NCBI) database (last access date: 20 January 2022) (Additional file 1). As a result, the evolution and divergence of Rosaceae mitogenomes remain unclear. Third-generation long-read sequencing technologies and assembly tools such as GetOrganelle [22], SOAPdenovo [23], and Canu [24] have made it possible to assemble complete mitogenomes. In addition, the nuclear genomes of many Rosaceae species, such as those in the Malus, Pyrus, and Sorbus genera, carry a mixed biparental background; mitogenomes, which are inherited from the maternal parent, therefore offer a chance to recover additional population and domestication information [25, 26].

Here, 38 complete Rosaceae mitogenomes were assembled and annotated (four of which were previously released). Variations in genes and repeat sequences were identified, and recombination and rearrangement events were investigated to explore the expansion and evolution patterns of Rosaceae mitogenomes. Subsequently, short-read sequencing data from 139 pear and 116 apple accessions were used to explore the genetic variations, phylogenetic relationships, and domestication processes in the Pyrus and Malus genera. We found that domestication and selection contributed to the variations in the mitogenomes of members of the Rosaceae family and resulted in the spread of structurally varied gene sequences.


Profile of the mitogenomes of Rosaceae species

In this study, each of the 34 Rosaceae mitogenomes was de novo assembled into a single gapless contig with an average coverage depth of 323.52–6550.87× (Additional file 2). Together with the four previously released genomes, the 38 Rosaceae mitogenomes ranged in size from 277.76 kb (Rosa chinensis: Rochi) to 535.73 kb (Prunus mume: Pmum) (Table 1). Genome sizes within the Amygdaloideae varied by up to 150.75 kb (Sorbus aucuparia: Sauc (384.98 kb) vs Pmum (535.73 kb)), and this variation increased up to 194.38 kb in Rosoideae species (Table 1). Twenty-four genes appeared in all 38 mitogenomes. Compared with six genera in Amygdaloideae, three protein-coding genes (rpl5, rpl16, and sdh3) were completely lost from the mitogenomes of five genera of the Rosoideae (Fig. 1). Within Rosoideae, rps14 was lost in Rosa, Geum, Potentilla, and Fragaria; rps12 was lost in Geum, Potentilla, and Fragaria; rpl10 was lost in Fragaria; and rps1 was lost in Rosa and Potentilla. Among the six Amygdaloideae genera, rps7 was lost in Sorbus, Photinia, Malus, Eriobotrya, and Pyrus. Two varieties of Pyrus bretschneideri (“Yali”: Pbre-Y and “Dangshansuli”: Pbre-D) contained copies of the atp9 and ccmB genes. Malus sylvestris (Msyl) and Malus domestica (“Gala”: Mdom-G; “Yantai fuji 8”: Mdom-Y) contained copies of the 26rrn and rps12 genes. The GC content was relatively stable, averaging about 45% in the 38 Rosaceae mitogenomes, except for the Rubus chingii (Ruchi) mitogenome (43.31%), which had a higher percentage of chloroplast sequence imports relative to other species (Table 1).
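The gene losses listed in this paragraph can be collected into a small presence/absence lookup. The following is a minimal sketch in Python: the dictionary layout and the helper name `losses_by_genus` are illustrative choices and not part of the study's annotation pipeline. Only losses stated in the text are encoded, and the five Rosoideae genera are assumed to be Rosa, Geum, Potentilla, Fragaria, and Rubus, following the genera sampled in this study.

```python
from collections import defaultdict

# Genera sampled in each subfamily (assumed from the genera named in the text).
ROSOIDEAE = ["Rosa", "Geum", "Potentilla", "Fragaria", "Rubus"]
AMYGDALOIDEAE = ["Prunus", "Sorbus", "Photinia", "Malus", "Eriobotrya", "Pyrus"]

# Gene -> genera in which the text reports the gene as lost.
REPORTED_LOSSES = {
    "rpl5": ROSOIDEAE,
    "rpl16": ROSOIDEAE,
    "sdh3": ROSOIDEAE,
    "rps14": ["Rosa", "Geum", "Potentilla", "Fragaria"],
    "rps12": ["Geum", "Potentilla", "Fragaria"],
    "rpl10": ["Fragaria"],
    "rps1": ["Rosa", "Potentilla"],
    "rps7": ["Sorbus", "Photinia", "Malus", "Eriobotrya", "Pyrus"],
}


def losses_by_genus(losses=REPORTED_LOSSES):
    """Invert the gene -> genera mapping into genus -> set of lost genes."""
    by_genus = defaultdict(set)
    for gene, genera in losses.items():
        for genus in genera:
            by_genus[genus].add(gene)
    return dict(by_genus)


if __name__ == "__main__":
    table = losses_by_genus()
    for genus in ROSOIDEAE + AMYGDALOIDEAE:
        lost = sorted(table.get(genus, set()))
        print(f"{genus:<11} {', '.join(lost) if lost else '(none reported)'}")
```

Keeping the mapping gene-centric mirrors how the losses are described in the text, while the inverted view makes per-genus comparisons (for example, Fragaria versus Rubus) easy to read off.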

Table 1 Summary of 38 Rosaceae mitogenomes
Fig. 1

Gene content of 38 mitogenomes. Different colors represent the number of genes present in each mitogenome. Fiin, Fragaria iinumae; Fpen, Fragaria pentaphylla; Fvir, Fragaria viridis; Fman, Fragaria mandschurica; Fnil, Fragaria nilgerrensis; Fves, Fragaria vesca; Fana-C, Fragaria ananassa cv. “Camarosa”; Fana-R, Fragaria ananassa cv. “Royal Royce”; Rochi, Rosa chinensis; Rorug, Rosa rugosa; Pans, Potentilla anserina; Ruchi, Rubus chingii; Mbac, Malus baccata; Mdom-G, Malus domestica cv. “Gala”; Mdom-Y, Malus domestica cv. “Yantai fuji 8”; Msie, Malus sieversii; Msyl, Malus sylvestris; Pyed, Prunus yedoensis; Pkan, Prunus kanzakura; Pmir, Prunus mira; Pmum, Prunus mume; Psib, Prunus sibirica; Pavi-G, Prunus avium cv. “Glory”; Pavi-S, P. avium cv. “Staccato”; Parm, Prunus armeniaca; Psal, Prunus salicina; Pbet, Pyrus betulifolia; Pbre-D, Pyrus bretschneideri cv. “Dangshansuli”; Pbre-Y, Pyrus bretschneideri cv. “Yali”; Pcom, Pyrus communis; Pysb, Pyrus sinkiangensis × bretschneideri; Pyuc, Pyrus ussuriensis × communis; Pser, Photinia serratifolia; Gurb, Geum urbanum; Stor, Sorbus torminalis; Spoh, Sorbus pohuashanensis; Sauc, Sorbus aucuparia; Ejap, Eriobotrya japonica. “*” at the bottom of the heatmap means these protein-coding genes appeared in all 38 mitogenomes. The right color bar represents the genus of each of the 38 mitogenomes

Repeat sequence variation and correlations between genome size and repeat sequences

In the 38 mitogenomes, variation in total repeat number appeared to be driven mainly by short (< 100 bp) repeat sequences (Fig. 2a; Additional file 3). The number of repeat sequences ranged from 112 (Fragaria ananassa cv. “Camarosa”: Fana-C) to 457 (Pmum), and 73.33–90.98% of repeats were shorter than 100 bp (Fig. 2a; Additional file 3). Among species of the Amygdaloideae, the number of short (< 100 bp) repeats in Prunus samples was significantly higher than in samples from the five other genera (Photinia, Malus, Pyrus, Sorbus, and Eriobotrya) (t-test, P-value = 5.64e−8), while the number of repeats longer than 100 bp was not significantly increased (t-test, P-value = 0.11) (Additional file 3). In Rosoideae, the total repeat number of Ruchi (296) was higher than that of samples from the other four genera (total repeat number: 112–150) (Fig. 2a), but the total repeat length in Ruchi was lower than in Geum urbanum (Gurb) (Fig. 2b).

Fig. 2

Repeat content and repeat-mediated homologous recombination among Rosaceae mitogenomes. Number of total repeats (a) and total repeat length (b) of 38 Rosaceae mitogenomes. Red indicates repeat lengths were > 1000 bp; green indicates repeat lengths ranged from 501 to 1000 bp; pink indicates repeat lengths ranged from 100 to 500 bp; blue indicates repeat lengths were < 100 bp. c–f The correlation between genome size and total repeat number (c), total repeat length (d), total repeat number (repeat length ≤ 500 bp) (e), and total repeat length (repeat length ≤ 500 bp) (f) in 38 Rosaceae mitogenomes. R2 indicates the coefficient of determination, and the P-value was determined by F-statistic. g Repeat-mediated recombination in 33 mitogenomes of ten genera. Each point represents a pair of repeats, and the y-axis represents the proportion of mapping to expected recombination products. R indicates the correlation coefficient, and the P-value was determined by a two-tailed Student’s t-test. h Distribution of recombination frequencies of the four repeat types based on repeat length (< 100 bp, 100 bp ≤ repeat length ≤ 500 bp, 501 bp ≤ repeat length ≤ 1000 bp, and repeat length > 1000 bp)

Across all 38 Rosaceae samples, genome size showed a significant positive correlation with repeat number (phylogenetic generalized least squares: PGLS, R2adj = 0.35, P-value = 5.27e−5) (Fig. 2c). In addition, mitogenome size showed significant (P-value < 0.01) correlations with total repeat number and length for repeats ≤ 500 bp (Fig. 2e, f; Additional file 4: Fig. S1a, b, e, f). However, negligible correlations (R2adj = − 0.02, P-value = 0.51) appeared between total repeat length and genome size (Fig. 2d), and repeats longer than 1000 bp also showed low correlation with genome size (R2adj = − 0.02 and − 0.03) (Additional file 4: Fig. S1d, h). In 14 Fabaceae mitogenomes, genome size also showed high correlation with repeat sequences (total repeat number: R2adj = 0.73, P-value < 1.00e−3; total repeat length: R2adj = 0.68, P-value < 1.00e−3) (Additional file 4: Fig. S2a, g). All three repeat categories (length < 100 bp, 100 bp ≤ repeat length ≤ 500 bp, and length ≤ 500 bp) showed significant correlation with variations in genome size (R2adj ranged from 0.67 to 0.90, P-value < 1.00e−3) (Additional file 4: Fig. S2b, c, f, h, i, l).
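The slightly negative R2adj values reported here can look odd at first sight. A minimal sketch, assuming the conventional adjusted coefficient of determination (the text does not state which formula the PGLS software applied), shows how an unadjusted R2 near zero becomes negative after adjustment with 38 samples and one predictor. The function names are illustrative.

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int = 1) -> float:
    """Conventional adjusted R^2, which penalizes R^2 for model size."""
    dof_total = n_samples - 1
    dof_resid = n_samples - n_predictors - 1
    if dof_resid <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * dof_total / dof_resid


def raw_r2_from_adjusted(r2_adj: float, n_samples: int, n_predictors: int = 1) -> float:
    """Invert the adjustment to recover the unadjusted R^2."""
    dof_total = n_samples - 1
    dof_resid = n_samples - n_predictors - 1
    if dof_resid <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2_adj) * dof_resid / dof_total


if __name__ == "__main__":
    # An unadjusted R^2 of exactly zero with 38 samples and one predictor.
    print(round(adjusted_r2(0.0, 38), 3))            # -0.028
    # Unadjusted R^2 implied by an adjusted value of -0.02.
    print(round(raw_r2_from_adjusted(-0.02, 38), 4))  # 0.0076
```

Under this assumption, adjusted values of about − 0.02 to − 0.03 correspond to unadjusted R2 values that are essentially zero, which is consistent with describing those correlations as negligible.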

However, in terms of repeat number, repeats shorter than 100 bp or 500 bp showed negligible correlations (R2adj = 0.10, 0.08, and 0.11) with genome size in 88 seed plant mitogenomes (one sample per species) (Additional file 4: Fig. S3a-c), whereas repeats > 500 bp showed high correlations with genome size (Additional file 4: Fig. S3d, e). Total repeat length (except for length > 1000 bp) showed significant correlation (R2adj ranged from 0.45 to 0.51, P-value < 1.00e−3) with genome size (Additional file 4: Fig. S3g-j, l). In addition, several mitogenomes may have an overrepresentation of repeat content (Additional file 5). For example, Hyoscyamus niger (total repeat length: 133.17 kb) had a similar genome size to Prunus salicina (Psal) (501.40 vs. 508.00 kb), but its total repeat length was about 3.97-fold that of Psal (33.56 kb). Additionally, among 50 selected mitogenomes with genome sizes ranging from 271.60 to 525.67 kb, 38.49- and 80.47-fold changes were identified in total repeat length and number of repeat sequences, respectively (Additional file 4: Fig. S4; Additional file 5).

Recombination of repetitive sequences

Based on long sequencing reads of 33 samples in ten genera, repeat length showed a relatively high correlation with recombination frequency (Pearson correlation coefficient: R = 0.60, P-value < 2.2e−16) (Fig. 2g). Higher recombination frequencies were observed for long (> 1000 bp) repeats than for medium (100–500 bp and 501–1000 bp) or short (< 100 bp) repeats (Fig. 2h), and the percentage of long repeats associated with homologous recombination was higher than those of short and medium repeats (Table 2). A total of 341 recombination events were identified, with 1–35 recombination events in each of the 33 mitogenomes (Additional file 6). Among short repeats (< 100 bp), only 2.45% (164/6707) underwent homologous recombination, but this percentage increased to 86.84% (33/38) for long repeats (> 1000 bp) (Table 2). Among the 341 repeats exhibiting recombination activity, 81.82% (27/33) of the long repeats recombined with a frequency greater than 20%, and 83.54% (137/164) of the short repeats had recombination frequencies lower than 1%. Twenty repeats had over 50% recombination frequency (Additional file 6). In Pyrus, a 2040-bp repeat with a 25.31% recombination frequency in Pbre-Y exhibited 77.61–79.37% recombination frequency in “Hongxiangsuli” (Pyrus sinkiangensis × bretschneideri: Pysb), Pyrus betulifolia (Pbet), and “No.1 Zhong’ai” (Pyrus ussuriensis × communis: Pyuc). In Pyrus communis (Pcom), this repeat was shortened to 1841 bp, and the recombination frequency reached 89.69% (Additional file 6).
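The length classes and percentages in this paragraph can be checked with a few lines of Python. This is a minimal sketch: `repeat_class`, `percent`, and `recombination_frequency` are hypothetical helper names, and the read-count definition of recombination frequency (reads supporting the recombined product divided by all reads spanning the repeat) is an assumption based on the Fig. 2g description, not a statement of the exact pipeline used.

```python
def repeat_class(length_bp: int) -> str:
    """Assign a repeat to one of the four length classes used in the text."""
    if length_bp < 100:
        return "short (<100 bp)"
    if length_bp <= 500:
        return "medium (100-500 bp)"
    if length_bp <= 1000:
        return "medium (501-1000 bp)"
    return "long (>1000 bp)"


def percent(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * part / total, 2)


def recombination_frequency(recombinant_reads: int, reference_reads: int) -> float:
    """Assumed definition: share of spanning reads that support the recombined product."""
    total = recombinant_reads + reference_reads
    if total == 0:
        raise ValueError("no reads span this repeat")
    return recombinant_reads / total


if __name__ == "__main__":
    # Counts quoted in the text (Table 2 summary values).
    print(percent(164, 6707))   # 2.45  short repeats that recombined
    print(percent(33, 38))      # 86.84 long repeats that recombined
    print(percent(27, 33))      # 81.82 long repeats with frequency > 20%
    print(percent(137, 164))    # 83.54 short repeats with frequency < 1%
    print(repeat_class(2040))   # long (>1000 bp)
```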

Table 2 Recombination statistics on four types of repeats among 33 mitogenomes

Rearrangement rates of the 38 mitogenomes

Repeat-mediated recombination may further contribute to the rearrangement of mitogenomes [8]. In this study, eleven mitogenomes (Prunus mira: Pmir, Fragaria vesca: Fves, Eriobotrya japonica: Ejap, Pbet, Ruchi, Rosa rugosa: Rorug, Potentilla anserina: Pans, Photinia serratifolia: Pser, Geum urbanum: Gurb, Sorbus pohuashanensis: Spoh, and Malus sieversii: Msie) were chosen to represent eleven genera. About 13.97–22.87% of the mitochondrial sequences were shared among all eleven genera, and more than 29 rearrangements were identified between the Amygdaloideae and Rosoideae subfamilies (Fig. 3a). Twenty-one to 33 rearrangement events were identified among the five Rosoideae genera, and 5–20 rearrangement events were identified among the six Amygdaloideae genera. The rearrangements were then evaluated within each genus to avoid complications resulting from the high sequence divergence between genera [12]. In Malus, 91.76–95.31% of sequences were shared (Fig. 3c), and seven to nine rearrangement events were identified between Malus baccata (Mbac) and the other four apples. One or two rearrangements were detected among the remaining four accessions (Msie, Msyl, Mdom-Y, and Mdom-G). About 88.97–94.40% of sequences were shared among the six pears (Fig. 3d), and four to seven rearrangements were identified between Pcom and the five Asian pears. Among the four Asian cultivated pears studied, six rearrangement events were identified between the Pysb/Pyuc pair and the Pbre-D/Pbre-Y pair. Unexpectedly, no rearrangement events were detected between Pysb and Pyuc, despite their maternal parents coming from two different pear species (Pyrus sinkiangensis and Pyrus ussuriensis) (Fig. 3d). In Prunus, only 50.97–65.11% of sequences were shared, and 0–26 rearrangements were identified (Fig. 3e). In Fragaria, 66.89–77.20% homologous sequences and 0–17 rearrangements were identified (Fig. 3f).
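This section does not describe how rearrangement events were counted, so the following Python sketch should not be read as the study's method. It illustrates one simple and widely used proxy for comparing two genome arrangements: counting breakpoints between two signed orders of shared synteny blocks. The function names are hypothetical, block orders are treated as linear for simplicity (mitogenomes are often circular), and a single inversion produces two breakpoints, so breakpoint counts are not event counts.

```python
def adjacencies(order):
    """Return the signed adjacencies of a linear block order.

    Blocks are non-zero integers; a negative sign marks an inverted block.
    Each adjacency is stored in both reading directions.
    """
    ext = [0] + list(order) + [len(order) + 1]  # fixed caps at both ends
    adj = set()
    for left, right in zip(ext, ext[1:]):
        adj.add((left, right))
        adj.add((-right, -left))
    return adj


def count_breakpoints(order_a, order_b):
    """Count adjacencies in order_a that are not conserved in order_b."""
    if sorted(abs(b) for b in order_a) != sorted(abs(b) for b in order_b):
        raise ValueError("both orders must contain the same set of blocks")
    adj_b = adjacencies(order_b)
    ext_a = [0] + list(order_a) + [len(order_a) + 1]
    return sum((left, right) not in adj_b for left, right in zip(ext_a, ext_a[1:]))


if __name__ == "__main__":
    reference = [1, 2, 3, 4]
    print(count_breakpoints([1, 2, 3, 4], reference))    # 0: identical order
    print(count_breakpoints([1, -3, -2, 4], reference))  # 2: one inversion
    print(count_breakpoints([2, 1, 3, 4], reference))    # 3: blocks 1 and 2 swapped
```

Restricting the comparison to shared blocks is one reason within-genus comparisons are easier to interpret: when only a small fraction of sequence is shared, few blocks remain to compare.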

Fig. 3

Rearrangement event and rate analysis. a Number and rate of pair-wise rearrangement events between eleven genera. Black numbers represent the rearrangement events (pair-wise rearrangement events), and red numbers represent the pair-wise rearrangement rates. b Rearrangement rates in eleven genera. The numbers of rearrangement events per million years are displayed on branches of the phylogeny. c–f The pair-wise analysis of rearrangement events and rate within Malus (c), Pyrus (d), Prunus (e), and Fragaria (f). Upper-right heatmaps represent the pair-wise rearrangement rates. Red numbers represent the rearrangement rates, and black numbers represent the rearrangement events. Bottom-left figures display the synteny analysis between two samples. Red represents direct, and blue represents inverted. Numbers under the sample ID represent the percent of shared sequences

Furthermore, marked variations in rearrangement rate were identified at both the inter-genus (Fig. 3a, b) and intra-genus (Fig. 3c–f, Additional file 4: Fig. S5) levels. An over 10-fold variation in rearrangement rate (0.16–2.80) occurred among the eleven genera, seven of which had rates lower than one (from 0.16 to 0.84) (Fig. 3b). The estimated common ancestor of Ejap, Msie, Pser, Pbet, and Spoh had a rearrangement rate as low as 0.18 after divergence from Prunus, which increased to 0.24 in Ejap, 1.14 in Pser, 2.08 in Msie, 2.45 in Pbet, and 2.80 in Spoh. Within Malus, the highest rearrangement rate (7.69 rearrangement events per million years, abbreviated here as events/Mya) was identified in the divergence between Malus baccata (Mbac) and the other three species (Additional file 4: Fig. S5a), and the pair-wise rearrangement rates (4.32 to 5.56) between Mbac and the other four samples (Fig. 3c) were higher than the others (from 0 to 2.00). In Pyrus, 2.88 rearrangement events/Mya were identified in Pcom (Additional file 4: Fig. S5b), and six rearrangement events that occurred about 0.05 Mya resulted in an extremely high rearrangement rate (120 rearrangement events/Mya), which is reflected in the elevated pair-wise rearrangement rate between Pysb and Pbre-D (Fig. 3d). Variations in the rearrangement rate were also identified in Prunus and Fragaria (Fig. 3e, f; Additional file 4: Fig. S5c, d). Nine rearrangement events that occurred about 1.73 Mya in Prunus avium (“Glory”: Pavi-G and “Staccato”: Pavi-S) resulted in a higher rearrangement rate than in other Prunus species (Additional file 4: Fig. S5c). Two wild Fragaria species (Fragaria mandschurica: Fman and Fves) had a rearrangement rate (10.52) more than threefold higher than that of the other wild Fragaria species (0–3.35), and the rearrangement rate of Fragaria ananassa (“Royal Royce”: Fana-R and Fana-C) was 6.42 (Additional file 4: Fig. S5d).
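The rates in this paragraph follow from dividing a number of rearrangement events by the time over which they accumulated. A minimal Python sketch of that arithmetic is given below; the function names are illustrative, and the only inputs used are values quoted in the text.

```python
def rearrangement_rate(events: int, elapsed_million_years: float) -> float:
    """Rearrangement events per million years."""
    if elapsed_million_years <= 0:
        raise ValueError("elapsed time must be positive")
    return events / elapsed_million_years


def fold_variation(highest: float, lowest: float) -> float:
    """Ratio between the highest and lowest rate in a comparison."""
    if lowest <= 0:
        raise ValueError("lowest rate must be positive")
    return highest / lowest


if __name__ == "__main__":
    # Six events over about 0.05 million years (Pyrus example in the text).
    print(round(rearrangement_rate(6, 0.05), 2))   # 120.0
    # Range of rates among the eleven genera (0.16 to 2.80).
    print(round(fold_variation(2.80, 0.16), 2))    # 17.5
    # Fragaria: 10.52 versus the upper bound of the other wild species (3.35).
    print(round(fold_variation(10.52, 3.35), 2))   # 3.14
```

Because the divergence time sits in the denominator, very recent splits such as the 0.05 Mya example yield very high rates from only a handful of events, which is worth keeping in mind when comparing rates across branches of different ages.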

Mitogenomes reveal pear maternal phylogeny

Because pears are self-incompatible, their nuclear genome carries a biparental genetic background [21]. In contrast to the nuclear genome phylogeny, the mitogenome phylogeny reveals the maternal relationships among pear species. DNA re-sequencing data from 139 pear accessions, comprising 85 Asian (52 cultivated and 33 wild) and 54 European (29 cultivated and 25 wild) pears, were mapped to the “Dangshansuli” mitogenome (Additional file 7) to generate a SNP-based matrix. Our phylogenetic analysis of the associated mitogenomes revealed two groups, Asian and European pears (Fig. 4a). The Asian clade was further subdivided into three subclades: clades 1 and 3 consisted of most of the Asian cultivated pear accessions, while clade 2 contained the wild Asian pear accessions. Cultivars of Pyrus pyrifolia, P. ussuriensis, and P. bretschneideri were mixed in clades 1 and 3. Four P. sinkiangensis cultivars clustered in the European group and one in the Asian group. Consistently, PCA (Fig. 4b) and population structure analysis (Fig. 4c) also showed that Asian cultivated pears were divided into two groups.

Fig. 4

Population structure of 139 domesticated and wild pears. a Maximum likelihood phylogenetic tree estimated from SNPs with maf > 0.01 and max missing rate < 0.1. (1) The labels are colored with orange and blue, representing European and Asian pear accessions, respectively. (2) Each pear group is color coded. b Principal component analysis (PCA) of the 139 pear accessions. c Bayesian model-based clustering of the 139 pear accessions with the number of ancestry kinship (K) ranging from 2 to 6. Each vertical bar represents one pear accession, and the x-axis shows the different pear accessions. Each color represents one putative ancestral background, and the y-axis quantifies the ancestry membership. Aw, Asian wild pears; Ew, European wild pears; Ppy, Pyrus pyrifolia; Pus, Pyrus ussuriensis; Pbr, Pyrus bretschneideri; Psi, Pyrus sinkiangensis; Pco, Pyrus communis
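The SNP filters named in this caption (maf > 0.01 and max missing rate < 0.1) can be illustrated with a short Python sketch. It assumes one haploid allele call per accession at each site, with `None` marking a missing call, and it takes the minor allele frequency as the frequency of the second most common allele among non-missing calls. The function names and this input layout are assumptions for illustration; the software actually used for filtering is not stated here.

```python
from collections import Counter

MIN_MAF = 0.01       # keep sites with minor allele frequency above this value
MAX_MISSING = 0.10   # keep sites with missing rate below this value


def site_stats(calls):
    """Return (missing_rate, maf) for one site; `None` marks a missing call."""
    n_total = len(calls)
    observed = [c for c in calls if c is not None]
    if n_total == 0 or not observed:
        return 1.0, 0.0
    missing_rate = (n_total - len(observed)) / n_total
    counts = Counter(observed).most_common()
    # Monomorphic sites have no minor allele.
    maf = counts[1][1] / len(observed) if len(counts) > 1 else 0.0
    return missing_rate, maf


def keep_site(calls, min_maf=MIN_MAF, max_missing=MAX_MISSING):
    """Apply both thresholds as strict inequalities, as written in the caption."""
    missing_rate, maf = site_stats(calls)
    return missing_rate < max_missing and maf > min_maf


if __name__ == "__main__":
    print(keep_site(["A"] * 90 + ["T"] * 10))               # True
    print(keep_site(["A"] * 100))                           # False: monomorphic
    print(keep_site(["A"] * 85 + ["T"] * 5 + [None] * 10))  # False: missing rate not < 0.1
```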

Identification of selective sweeps and divergent deletion types in mitogenomes

In 139 pear accessions, 1046 SNPs and 118 INDELs were identified (Fig. 5, Table 3), with only 95 SNPs (9.08%, Additional file 8) and two INDELs (1.69%, Additional file 9) being located in genes. To identify the specific regions under selection, selective sweeps were identified based on the diversity of the pear mitogenomes (Fig. 6a, b). For Asian pears, 5.88% (27.00 kb/458.90 kb) of the regions showed selective sweep signatures containing four protein-coding genes and one tRNA (Additional file 10). For European pears, there were selective sweep signatures for 2.18% (10.00 kb/458.90 kb) of sequences, which contained three protein-coding genes. No overlapping selective sweeps were detected between Asian and European pears based on the mitogenomes.
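Selective sweep scans of this kind typically compare nucleotide diversity (π) between wild and cultivated groups, as in the πwild/πcul ratio plotted in Fig. 6b. The Python sketch below shows the standard definition of π as the mean number of pairwise differences per site and the resulting ratio for one window. The toy sequences are made up purely to exercise the functions, the helper names are hypothetical, and window size, step, and cut-off values are not specified in this section, so none are assumed.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site among aligned, equal-length haploid sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and non-empty")
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    return total_diff / (n_pairs * length)


def pi_ratio(wild_seqs, cultivated_seqs):
    """pi_wild / pi_cul for one window; infinite if the cultivated group has no diversity."""
    pi_wild = nucleotide_diversity(wild_seqs)
    pi_cul = nucleotide_diversity(cultivated_seqs)
    if pi_cul == 0:
        return float("inf") if pi_wild > 0 else float("nan")
    return pi_wild / pi_cul


if __name__ == "__main__":
    # Hypothetical 4-bp window, for illustration only.
    wild = ["AAAA", "AATT"]
    cultivated = ["AAAA", "AAAT"]
    print(nucleotide_diversity(wild))   # 0.5
    print(pi_ratio(wild, cultivated))   # 2.0
```

A high ratio in a window indicates that the cultivated group has lost diversity relative to the wild group there, which is the signature a sweep scan looks for. This sketch ignores gaps and missing calls, which a real analysis would need to handle.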

Fig. 5

The distribution of SNPs and INDELs across mitogenomes in different groups. a The whole mitogenome of “Dangshansuli.” Total variants detected among b 139 pear accessions, c Asian wild pear population, d Asian cultivated pear population, e European wild pear population, and f European cultivated pear population

Table 3 Summary of the SNPs and INDELs in 139 pear accessions
Fig. 6

Deletion analysis of the mitogenomes of pear and apple. a Distribution of FST values across the whole Asian and European pear populations. b Distribution of πwild/πcul ratios; cul means cultivated pear population; dotted regions represent the deletions (Pbre-D: from 183,739 to 199,800 bp) in P. betulifolia. c A deletion (DEL-D) was identified in P. betulifolia. The deletion sequence is further divided into three parts (Del1, Del2, and Del3). d, e Appearance times of Del1 (d) and Del3 (e) in the four pear groups. Red represents the samples not containing the deletion, and blue represents the samples containing the deletion. AC, Asian cultivated group; AW, Asian wild group; EC, European cultivated group; EW, European wild group. f Geographical distribution of 139 pear accessions. The main distribution areas are marked by circles. Blue indicates pears with the deletion, and red indicates pears without deletion. Triangles represent wild pears, and circles represent cultivated pears. g A deletion (DEL-M) is identified in M. sieversii which did not appear in M. sylvestris and M. domestica cv. “Gala.” This deletion is homologous with a sequence (185,690–192,355 bp) of “Dangshansuli.” h Appearance times of DEL-M in the four apple groups. Red represents the samples not containing the deletion, and blue represents the samples containing the deletion. Sie, M. sieversii; AW, Asian wild group; Dom, cultivated group; EW, European wild group. i Homologous sequences of Del1, Del2, and Del3 in 30 mitogenomes and nuclear genomes. The gray bar indicates no information was available. ORFs were identified using ORFfinder with a minimum length of 150 bp. j Putative dynamics of evolution and expansion of the deletion sequence during the divergence of Rosaceae species. AWp, Asian wild pear; ACp, Asian cultivated pears; Ep, European pears; AWm, Asian wild apples; ECm, European cultivated apples; EWm, European wild apples. 
P-values were determined by a two-tailed Student’s t-test (**P-value < 0.01)

One continuous region from 185 to 190 kb showed a selective sweep signature in Asian pears, and P. betulifolia had a deletion in this region (DEL-D, Pbre-D: 183.74–199.80 kb) (Fig. 6c). DEL-D was divided into three parts (Del1, Del2, and Del3); Del1 and Del3 were mitochondrial-specific sequences, and Del2 was similar to the chloroplast genome sequence (100% BLASTN identity). Therefore, we only analyzed the frequency of Del1 and Del3 in the four pear groups. Two-thirds (66.67%, 22/33) of Asian wild pears contained Del1, a frequency significantly higher (chi-square test, P-value = 9.39e−15) than that in Asian cultivars (1.92%) (Fig. 6d). However, this divergence did not appear in European pears: 92% of European wild pears and 100% of European cultivated pears did not contain Del1. The same pattern appeared for Del3, for which a significantly different frequency (chi-square test, P-value = 3.11e−16) was observed between Asian wild and cultivated pears (Fig. 6e). As pears spread to the Middle East and Europe, most European wild and cultivated pears did not contain Del1 (Fig. 6f).

A deletion (DEL-M) (Malus domestica cv. “Gala”: 180,287–186,952 bp), corresponding to part of Del1, was also identified in M. sieversii (Fig. 6g). Based on the re-sequencing data of 116 apple accessions, DEL-M showed significantly (chi-square test, P-value < 0.01) different frequencies between wild and cultivated apples (Fig. 6h; Additional file 4: Fig. S6a). Eighty percent of apples in the European wild (EW) group and 71.43% in the M. domestica (Dom) group did not contain DEL-M. Based on the apple distribution (Additional file 4: Fig. S6b), the M. sieversii group was divided into the Sie_X (cultivated in the east of Tianshan) and Sie_K (cultivated in the west of Tianshan) groups. One hundred percent of Sie_X and 97.22% of Asian wild (AW) apples contained DEL-M.

Among the Rosaceae mitogenomes, large sequence fragments of DEL-D first appeared in Amygdaloideae and then expanded into Malus and Pyrus (Fig. 6i; Additional file 11). In contrast to Rosoideae, large fragments (> 1 kb) of Del1 were identified in Amygdaloideae mitogenomes. In Prunus mitogenomes, 1589–4201 bp of sequence mapped to Del1. A total of 8519 bp of sequence in the Pser mitogenome mapped to Del1. In Malus, a total of 6951 bp of sequence in Mdom-G, Mdom-Y, and Msyl mapped to Del1. Spoh also contained 6133 bp of sequence mapping to Del1. More than 4000 bp of Del3 sequence was identified in the Pbre-D, Pbre-Y, Pcom, Mbac, Sauc, and Stor mitogenomes. Only 681 bp of Del3 was identified in Pysb and Pyuc, and 599–602 bp of Prunus mitogenome sequence mapped to Del3. Furthermore, the nuclear sequence mapping results showed that less than 10% of DEL-D sequences were shared with the nuclear sequences in Fragaria, Rubus, and Rosa, but this percentage increased in Prunus (5.69–44.06%), Malus (22.99–55.60%), and Pyrus (21.04–33.15%) (Additional file 11).


Gene loss and genome variation in 38 mitogenomes of Rosaceae

Mitogenomes have variable gene content [27] and genome structure [11]. Thirty-eight mitogenomes from members of the Rosaceae were assembled and annotated to characterize the variations. Consistent with Fabaceae [9] and Poaceae [28], gene loss appeared before Rosaceae speciation, and the loss of rpl2, rps10, rps11, rps19, and rps2 may have occurred in an ancestor of Rosaceae. In addition, rps2 and rps11 were lost in all eudicots [27], indicating that these two losses are more ancient than those of rpl2, rps10, and rps19. Within Rosaceae, rpl16, sdh3, and rpl5 were absent in five genera of Rosoideae, which represent shrub and herb species, and rps12 was lost in three herb genera (Geum, Fragaria, and Potentilla) (Fig. 1). These gene losses might affect the translocation and splicing of mitochondrial genes [29] and further influence plant development, reproduction, and other morphological and physiological traits, such as stunting in maize [30], distorted leaves in Arabidopsis [31], stress responses in Oryza sativa [32], and the parasitic lifestyle of V. scurruloideum [4].

In this study, the genome sizes of the 38 mitogenomes were highly correlated with short (< 100 bp) and medium (100 bp ≤ repeat length ≤ 500 bp) repeat lengths (Fig. 2e, f; Additional file 4: Fig. S1a, b, e, f), indicating that repeat sequences may be related to the divergence of mitogenome sizes in Rosaceae. The DNA repair hypothesis suggests that repeat sequences are formed by non-homologous end joining and break-induced replication (BIR) and further drive genome expansion at evolutionary time scales [33]. However, this pattern was not consistently observed in 88 seed plant mitogenomes, and several mitogenomes had a burst in repeat sequences, which indicates that other mechanisms, such as gains or losses of entire chromosomes [34], abundant rearrangements, or loss of non-coding sequences [35], may drive genome size variation.

Although repeats longer than 1000 bp showed a low correlation with genome size (Additional file 4: Fig. S1d, h) in Rosaceae mitogenomes, they showed higher recombination frequencies than repeats shorter than 500 bp. Large mitochondrial repeats (> 1000 bp) undergo high-frequency reciprocal recombination to subdivide the genome in other plant species [36]. In addition, twenty repeats had recombination frequencies greater than 50%, indicating that a “master circle” was not the main conformation. Abundant sub-genomic conformations have been observed in vivo, as exemplified in Silene [5], Cucumis [37], and Selaginellaceae [38], and no master conformation appeared in Saccharum officinarum [39]. Moreover, the presence of more than one repeat with such recombination frequencies indicates that many conformations may coexist (Additional file 6).

In this study, an over tenfold variation in rearrangement rate occurred between eleven genera of Rosaceae (Fig. 3b), and this variation also occurred within genera (Additional file 4: Fig. S5). In addition, at least 600-fold variation in rearrangement rate was identified in seed plants [8], and some studies found that environmental stress [40,41,42,43] and nuclear gene variation (such as in MSH1 and RECA) [44] might contribute to mitogenome rearrangement. In Malus, Mbac originates from Siberia, Msie is distributed in Central Asia, and Msyl is distributed in Western Europe [45]. Pyrus spread from southwest China to Europe [21]. Fragaria is widespread in Asia, Europe, and North America [46, 47]. Prunus spread from Asia to Europe [48, 49]. These different geographical distributions and environmental changes might be one reason for the variation in rearrangement rate among Rosaceae species.

Domestication may have been involved in the evolution and expansion of mitogenomes

Human selection has modified many crop traits, and cultivated crops have diverged from their wild progenitors [50]. The presence of DEL-D in selective sweep regions supports the view that selection drives mitogenome variation in pears. Formed through a multi-step process, DEL-D eventually became fixed in Asian cultivated pears during domestication (Fig. 6j). Functional mitochondrial gene formation includes multiple steps and can cause phenotypic changes, biological diversity, and further benefits for natural adaptation [51]. DEL-D was formed by multiple recombination events, sequence imports, and new ORF formation (Fig. 6i), and it may become a new resource conferring phenotypic or metabolic changes and contributing to adaptation to environmental stress. Afterwards, selection may quickly drive allele frequency changes that improve the adaptive ability of the population [52]. The frequency of DEL-D differs markedly between Asian cultivated and wild pears and between Asian and European wild pears (Fig. 6d, e). DEL-M also showed significantly different frequencies between Asian and European apples and between M. sieversii and cultivated apples (Fig. 6h). The selective sweeps and changes in deletion frequency might aid adaptation to environmental changes or meet human needs [53, 54].

Mitochondrial variants shed new insights on the maternal relationships between Pyrus species

The topology based on the newly assembled mitogenomes provides insights into the maternal phylogenetic relationships of Pyrus species and presents an alternative framework to that based on nuclear sequences [21]. In contrast to the nuclear-based phylogeny, Asian cultivated pears were divided into clade 1 and clade 3, and the three main cultivated pear species (P. pyrifolia, P. bretschneideri, and P. ussuriensis) were mixed within clades 1 and 3, suggesting that mitogenome divergence produced two main maternal lineages in Asian cultivated pears. Divergence has also occurred in the maternal parents of M. domestica, where selection for fruit size, flavor, or unilateral compatibility in crosses may be responsible [55]. Five P. sinkiangensis cultivars were divided into Asian and European groups, indicating that the maternal parents of P. sinkiangensis came from both Asian and European pears. Most Asian wild pear accessions were divergent from the cultivated species, whereas Pyrus calleryana (Pyw_ca), Pyrus xerophila (Pyw_xe), Pyrus phaeocarpa (Pyw_ph), and Pyrus serrulata (Pyw_se) showed a close relationship with cultivated pears, indicating that maternal introgression might have occurred because of cross-hybridization and adjacent distributions.


In this study, in-depth comparisons showed the evolutionary patterns of 38 mitogenomes in Rosaceae. Apparent gene losses and shrinkage of the mitogenome size occurred in the Amygdaloideae and Rosoideae subfamilies. Repeat content may lead to genome size variations and primarily drive the dynamics of genome structure by homologous recombination and genomic rearrangements. We estimated the absolute rearrangement rate of Rosaceae mitogenomes, and variations in rearrangement rates were also identified in Prunus, Malus, Pyrus, and Fragaria genera. Two divergent maternal lineages were identified in Asian cultivated pears, and free hybridization might explain the mixed maternal lines of cultivated P. pyrifolia, P. ussuriensis, and P. bretschneideri. Pyrus-specific sequence variation (DEL-D) was determined, based on the complete mitogenome and population data, to have originated from Amygdaloideae, and this sequence quickly expanded from Asian wild species to Asian cultivated species and European populations. This comparative genomic study provides new insights into the evolutionary and selection patterns of Rosaceae mitogenomes.

Methods and plant materials

Assembly and annotation of the mitochondrial and chloroplast genomes

Thirty-four of the 38 mitogenomes were assembled using NGS and long-read sequencing data. For “Dangshansuli” (P. bretschneideri Rehd.), Illumina HiSeq 2000 whole-genome data from the published genome BAC libraries were used for mitogenome assembly (library insert sizes of 180 bp, 488 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, and 20 kb) [56]. Fastq files were first filtered using Trimmomatic [57] with default parameters, and data from the 800-bp insert library were used for mitogenome assembly. Reads were assembled using SOAPdenovo2 [23], and the scaffolds were polished using Pilon v1.23 [58]. To confirm that the scaffolds were mitochondrial, the ten longest assembled scaffolds were aligned against the NCBI assembly database using BLASTN with an e-value cutoff of 1e−5 [59], and scaffolds matching mitogenomes were retained. Reads from the 2-kb, 5-kb, 10-kb, and 20-kb insert libraries were then used to join the scaffolds into a single sequence, and the mitogenome was polished with Pilon to fill the gaps. A similar method of organelle genome assembly based on whole-genome data has been applied in plants [60].

The raw reads of 33 Rosaceae species were downloaded from NCBI (Additional file 12). The “Dangshansuli,” Rosa chinensis (CM009589.1), and Prunus avium (MK816392.2) mitogenomes were chosen as reference genomes. Candidate mitochondrial long reads were identified by aligning reads to the references with BLASR [61] and were then assembled into contigs using Canu v1.8 [24]. Overlaps between candidate mitochondrial contigs were identified using BLASTN [59], and the contigs were joined into circular molecules, which were polished with Pilon v1.23 [58]. Long and short reads were remapped to the polished genome sequences to check the completeness of all newly assembled mitogenomes (Additional file 2). The high-quality complete mitogenomes were annotated with GeSeq [62] and Mitofy [63], and the final annotations were checked manually to correct the positions of start and stop codons. A strategy similar to that used for the mitogenomes was applied to chloroplast genome assembly; the chloroplast genome sequences were annotated with GeSeq [62], and the annotation files were checked manually.

Identification of plastid-derived and repeat sequences

To identify plastid-derived sequences, each of the 38 mitogenomes was searched against its corresponding plastid genome using BLASTN with an e-value cutoff of 1e−6 and a word size of 7. Repeats in the 38 mitogenomes were identified using methods similar to those described previously [64]: each mitogenome was searched against itself using BLASTN with an e-value cutoff of 1e−6 and a word size of 7. Phylogenetic generalized least squares (PGLS) analysis, implemented in the R package caper, was used to test for correlations between genome size and repeat content in the 38 Rosaceae mitogenomes. For the analyses of Fabaceae and seed plants (14 and 88 genera, respectively), only one accession per genus was chosen.

Identification of repeat-mediated homologous recombination events

To detect active, repeat-mediated homologous recombination events in the long sequencing reads, we first built mitochondrial read databases for 33 mitogenomes (the other five samples were excluded because long-read data were lacking). We used the 33 mitogenome assemblies as references to retrieve candidate mitochondrial sequences from total-DNA long reads by BLASTN with an e-value cutoff of 1e−100. Candidate mitochondrial reads were then searched against chloroplast genome sequences (Additional file 13), and putative plastid reads, defined as those with alignments covering > 85% of the read length, were removed; the remaining clean reads were self-corrected using Canu v1.8 [24]. Finally, we obtained 33 mitochondrial read databases (Additional file 14) and used methods similar to those described previously [13] to identify repeat-mediated homologous recombination events. Briefly, each repeat pair was extracted together with 200 bp of upstream and downstream sequence to serve as reference sequences, which were used to build two recombinant sequences (for repeat pairs with 100% BLASTN identity) or six recombinant sequences (for repeat pairs with less than 100% BLASTN identity) (Additional file 4: Fig. S7). The mitochondrial reads were then searched against the reference and recombinant sequences using BLAST, and reads with identities above 99% and hit coverage of 200 bp in both flanking regions were selected.
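The construction of reference and recombinant sequences can be illustrated with a minimal Python sketch. It covers only the simplest case, a pair of identical repeats in the same orientation on a linear sequence, for which exchanging the downstream flanks yields the two recombinant sequences. The function name, the toy sequence, and the handling of sequence ends are assumptions made for illustration; inverted repeats, non-identical repeat pairs (six recombinants), and circular wrap-around are not handled.

```python
from typing import Tuple


def reference_and_recombinant(genome: str, pos1: int, pos2: int,
                              repeat_len: int, flank: int = 200
                              ) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Build the two reference and two recombinant sequences for one repeat pair.

    pos1 and pos2 are 0-based start positions of two identical repeat copies
    in the same orientation (an assumption of this sketch).
    """
    def split(pos: int) -> Tuple[str, str, str]:
        # Return (upstream flank, repeat copy, downstream flank).
        if pos - flank < 0 or pos + repeat_len + flank > len(genome):
            raise ValueError("flank runs past the sequence end (no wrap-around here)")
        return (genome[pos - flank:pos],
                genome[pos:pos + repeat_len],
                genome[pos + repeat_len:pos + repeat_len + flank])

    up1, rep1, down1 = split(pos1)
    up2, rep2, down2 = split(pos2)
    if rep1 != rep2:
        raise ValueError("this sketch only handles 100% identical repeat copies")

    references = (up1 + rep1 + down1, up2 + rep2 + down2)
    # Recombination within the repeat swaps the downstream flanks.
    recombinants = (up1 + rep1 + down2, up2 + rep2 + down1)
    return references, recombinants


# Toy example with 4-bp flanks instead of 200 bp.
toy = "AAAA" + "GGCC" + "TTTT" + "CCCC" + "GGCC" + "AGAG"
refs, recs = reference_and_recombinant(toy, 4, 16, repeat_len=4, flank=4)
print(refs)
print(recs)
```

Reads that span the repeat and both flanks of a recombinant sequence would then count as evidence for that recombination product.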

Species tree construction and divergence time estimation

A total of 38 Rosaceae chloroplast genomes (Additional file 13) were used for phylogenetic analysis and divergence time estimation. The coding sequences of 76 chloroplast protein-encoding genes of the 38 Rosaceae samples (Additional file 15) and an outgroup, Vitis vinifera (NC_007957), were aligned. Phylogenetic trees were constructed using IQ-TREE [65]. Divergence times were estimated using MCMCtree in PAML 4.9 [66] with a burn-in of 5,000,000 iterations and a sample frequency of 5000; the MCMC process was performed 20,000 times. Three calibration points were used: one fossil of Prunus found in Shandong (> 44.3 Mya) [67], one fossil of Rubus (47.8 to 41.3 Mya) [68], and the estimated divergence time (130 to 123 Mya) between V. vinifera and Rosaceae [69].

Rearrangement event identification in Rosaceae mitogenomes

To infer the rearrangement rate among the eleven genera, all pairwise combinations of the mitogenomes representing the eleven genera (Pmir, Pser, Gurb, Fves, Ejap, Pbet, Ruchi, Rorug, Pans, Spoh, and Msie) were aligned using Mauve v2.0 [70] with default parameters to identify locally collinear blocks (LCBs) in each mitogenome, and pairwise rearrangement distances, expressed as the minimum number of rearrangements, were inferred using GRIMM with the circular chromosome option [71]. To explore the rearrangement rate on different branches of the tree, the eleven samples were used in MLGO to infer the ancestral genome arrangement [72]. The rearrangement events between each node and its neighboring nodes were calculated by GRIMM [71]. The rearrangement rate of each branch was calculated by dividing the number of rearrangement events by the absolute time of that branch. In addition, the mean pairwise rearrangement rate was calculated by dividing the number of pairwise rearrangements by twice the divergence time between the two samples. Pyuc (for Pyrus), Fragaria viridis (for Fragaria), Msyl (for Malus), and Prunus armeniaca (for Prunus) were chosen as the reference genomes for their respective genera to adjust the orientation of the other mitogenomes for rearrangement analysis, and the rearrangement rates within the genera Pyrus, Malus, Prunus, and Fragaria were calculated using the same methods as for the inter-genus analysis.
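The two rate calculations reduce to simple divisions, sketched below in Python. The function names and the example inputs are hypothetical and are not values from this study; rearrangement counts are assumed to come from GRIMM and times (in million years) from the dated tree.

```python
def branch_rearrangement_rate(events: int, branch_time_my: float) -> float:
    """Rearrangements per million years along a single branch."""
    if branch_time_my <= 0:
        raise ValueError("branch time must be positive")
    return events / branch_time_my


def pairwise_rearrangement_rate(events: int, divergence_time_my: float) -> float:
    """Mean pairwise rate between two samples.

    Both lineages have been evolving since the split, so the elapsed
    time is twice the divergence time.
    """
    if divergence_time_my <= 0:
        raise ValueError("divergence time must be positive")
    return events / (2 * divergence_time_my)


# Hypothetical inputs for illustration only.
print(branch_rearrangement_rate(6, 12.0))     # 0.5
print(pairwise_rearrangement_rate(10, 25.0))  # 0.2
```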

SNP and INDEL calling of 139 pear accessions

In addition to the published re-sequencing data of 113 pears [21], another 26 pear accessions were sequenced using the same method on the HiSeq 2000 platform (Additional file 7). The “Dangshansuli” mitogenome was used as the reference for SNP and INDEL calling. Raw data of the 139 pear accessions were trimmed with Trimmomatic v0.39 [57]. Clean data were mapped to the reference genome using the Burrows-Wheeler Aligner (BWA) v0.7.16 [73]. SAMtools [74] was used to convert the sequence alignment/map (SAM) files into binary (BAM) files. Duplicated reads were then removed using the Picard software, and variant identification and filtering were performed using GATK v4.1.4 [75]. Finally, all SNPs and INDELs with a minor allele frequency (MAF) > 0.01 and a missing rate < 0.1 were extracted for subsequent analysis. SnpEff v4.3t [76] was used for SNP and INDEL annotation.

Phylogenetic tree construction, PCA, and population structure analysis

The SNPs of each sample were concatenated into a single sequence and written to a fasta file using an in-house Python script. IQ-TREE [65] was then used to generate a maximum likelihood phylogenetic tree, with the best model selected using the “MF” function and 1000 ultrafast bootstrap replicates. To evaluate relationships among accessions, PCA and population structure analysis were performed using PLINK v1.90b [77] and ADMIXTURE v1.3 [78].
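The in-house script is not shown here, but the concatenation step can be sketched as follows. This is an illustrative Python example only: it assumes that genotype calls have already been parsed into one single-base allele per SNP site for each sample, with missing calls stored as None and written as "N". The function name and sample names are hypothetical.

```python
from typing import Dict, List, Optional


def snps_to_fasta(calls: Dict[str, List[Optional[str]]], missing: str = "N") -> str:
    """Concatenate the per-site alleles of each sample into one FASTA record."""
    lengths = {len(alleles) for alleles in calls.values()}
    if len(lengths) > 1:
        raise ValueError("all samples must cover the same SNP sites")
    records = []
    for sample, alleles in calls.items():
        # Missing calls become the placeholder character.
        seq = "".join(a if a is not None else missing for a in alleles)
        records.append(f">{sample}\n{seq}")
    return "\n".join(records) + "\n"


# Hypothetical samples and genotypes for illustration only.
example = {
    "sample_A": ["A", None, "T"],
    "sample_B": ["A", "C", "T"],
}
print(snps_to_fasta(example), end="")
```

The resulting file can serve as an alignment for tree inference because every record has the same length and site order.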

Diversity analysis and selection sweep identification

Pi (π) and FST were calculated in pear using VCFtools v0.1.16 [79] with a 1000-bp sliding window and 500-bp steps. To identify regions with signals of selective sweeps in cultivated pears, 1000-bp windows were screened with reference to previous criteria [80]: top FST > 0.1 and a πwild/πcul ratio > 2, based on common SNPs in the pear mitogenomes.
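The window screening can be illustrated with a short Python sketch that applies the two thresholds to precomputed window statistics. The record layout, the function name, and the decision to skip windows with zero diversity in cultivated accessions (where the ratio is undefined) are assumptions made for illustration; the example values are not data from this study.

```python
from typing import List, NamedTuple


class Window(NamedTuple):
    start: int      # window start position on the reference mitogenome
    fst: float      # FST between wild and cultivated accessions
    pi_wild: float  # nucleotide diversity in wild accessions
    pi_cul: float   # nucleotide diversity in cultivated accessions


def candidate_sweeps(windows: List[Window],
                     fst_min: float = 0.1,
                     ratio_min: float = 2.0) -> List[Window]:
    """Keep windows with FST > fst_min and pi_wild / pi_cul > ratio_min."""
    kept = []
    for w in windows:
        if w.pi_cul <= 0:
            # Ratio undefined; skipped in this sketch (an assumption).
            continue
        if w.fst > fst_min and w.pi_wild / w.pi_cul > ratio_min:
            kept.append(w)
    return kept


# Hypothetical window statistics (1000-bp windows, 500-bp steps).
windows = [
    Window(1, 0.25, 0.004, 0.001),      # passes both thresholds
    Window(501, 0.05, 0.004, 0.001),    # FST too low
    Window(1001, 0.30, 0.002, 0.0015),  # diversity ratio too low
]
print([w.start for w in candidate_sweeps(windows)])  # [1]
```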

Frequency of deletion analysis

To evaluate the frequency of the deletion (DEL-D) in the 139 pear accessions, BEDTools v2.18 [81] was used to calculate the mapping coverage of DEL-D in each accession. First, DEL-D was divided into three parts (Del1, Del2, and Del3), and the read depth of each part (Idep) was calculated separately. The whole-genome depth of each accession (Wdep) was also calculated. To account for differences in sequencing depth among the 139 accessions, the ratio Idep/Wdep was used to assess the presence or absence of the deletion. The ratios for Del1 fell into two clearly separated ranges, low (0.24–0.72) and high (6.94–142.98), with a high ratio indicating that Del1 is present in the mitogenome of the accession and a low ratio indicating its absence. The same pattern was observed for Del3. Because Del2 shares homology with chloroplast sequences, it was excluded from further analyses. The frequencies of Del1 and Del3 in different pear populations were calculated, and a two-tailed Student’s t-test was used to identify significant differences. The same strategy was used to determine the frequency of the deletion (DEL-M: 6666 bp) in 116 apple accessions.
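The presence/absence call based on the Idep/Wdep ratio can be sketched in Python as follows. The cutoffs reuse the bounds of the two observed Del1 ranges (low up to 0.72, high from 6.94); using those bounds as decision thresholds, the "ambiguous" label for ratios in between, and the function names are assumptions made for illustration.

```python
from typing import List


def depth_ratio(idep: float, wdep: float) -> float:
    """Depth of the deletion segment relative to whole-genome depth."""
    if wdep <= 0:
        raise ValueError("whole-genome depth must be positive")
    return idep / wdep


def classify_segment(ratio: float, low_max: float = 0.72,
                     high_min: float = 6.94) -> str:
    """Call the segment present (high ratio) or absent (low ratio)."""
    if ratio <= low_max:
        return "absent"
    if ratio >= high_min:
        return "present"
    return "ambiguous"  # falls between the two observed ranges


def presence_frequency(ratios: List[float]) -> float:
    """Fraction of accessions in which the segment is called present."""
    calls = [classify_segment(r) for r in ratios]
    return calls.count("present") / len(calls)


# Hypothetical depths for illustration only.
print(classify_segment(depth_ratio(5.0, 10.0)))    # absent
print(classify_segment(depth_ratio(300.0, 10.0)))  # present
print(presence_frequency([0.5, 30.0, 12.0, 0.3]))  # 0.5
```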

To trace the origin of the deletion sequence, BLASTN searches were used to detect homologous sequences in 30 Rosaceae mitochondrial and nuclear genomes. Putative intracellular transfers and nuclear-shared sequences were identified by BLASTN searches of mitogenomes against nuclear DNA, with an e-value cutoff of 1e−100 and a hit length of more than 100 bp, and the ggplot2 package was used for visualization. An in-house Python script was used to calculate the total length of homologous sequences from each mitochondrial and nuclear genome. ORFs with a minimum length of 150 bp were identified in DEL-D using ORFfinder.

Availability of data and materials

Raw WGS data of pear and apple accessions were downloaded from the NCBI BioProject database (PRJNA381668, PRJNA675194, PRJNA844501, and PRJNA322175). The NGS and PacBio data used for mitogenome assembly were downloaded from NCBI, and the BioProject IDs are provided in Additional file 12. The 34 newly assembled mitogenome sequences have been submitted to the NCBI database, and their accession numbers are listed in Table 1.


  1. Van Aken O, Van Breusegem F. Licensed to kill: mitochondria, chloroplasts, and cell death. Trends Plant Sci. 2015;20(11):754–66.

  2. Birky CW Jr. Uniparental inheritance of mitochondrial and chloroplast genes: mechanisms and evolution. Proc Natl Acad Sci U S A. 1995;92(25):11331–8.

  3. Sloan DB. One ring to rule them all? Genome sequencing provides new insights into the ‘master circle’ model of plant mitochondrial DNA structure. New Phytol. 2013;200(4):978–85.

  4. Skippington E, Barkman TJ, Rice DW, Palmer JD. Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes. Proc Natl Acad Sci U S A. 2015;112(27):E3515–24.

  5. Sloan DB, Alverson AJ, Chuckalovcak JP, Wu M, McCauley DE, Palmer JD, et al. Rapid evolution of enormous, multichromosomal genomes in flowering plant mitochondria with exceptionally high mutation rates. PLoS Biol. 2012;10(1):e1001241.

  6. Richardson AO, Rice DW, Young GJ, Alverson AJ, Palmer JD. The “fossilized” mitochondrial genome of Liriodendron tulipifera: ancestral gene content and order, ancestral editing sites, and extraordinarily low mutation rate. BMC Biol. 2013;11:29.

  7. Kozik A, Rowan BA, Lavelle D, Berke L, Schranz ME, Michelmore RW, et al. The alternative reality of plant mitochondrial DNA: one ring does not rule them all. PLoS Genet. 2019;15(8):e1008373.

  8. Cole LW, Guo W, Mower JP, Palmer JD. High and variable rates of repeat-mediated mitochondrial genome rearrangement in a genus of plants. Mol Biol Evol. 2018;35(11):2773–85.

  9. Choi IS, Schwarz EN, Ruhlman TA, Khiyami MA, Sabir JSM, Hajarah NH, et al. Fluctuations in Fabaceae mitochondrial genome size and content are both ancient and recent. BMC Plant Biol. 2019;19(1):448.

  10. Mower JP, Sloan DB, Alverson AJ. Plant mitochondrial genome diversity: the genomics revolution. In: Wendel JF, Greilhuber J, Dolezel J, Leitch IJ, editors. Plant genome diversity volume 1: plant genomes, their residents, and their evolutionary dynamics. Vienna: Springer Vienna; 2012. p. 123–44.

  11. Davila JI, Arrieta-Montiel MP, Wamboldt Y, Cao J, Hagmann J, Shedge V, et al. Double-strand break repair processes drive evolution of the mitochondrial genome in Arabidopsis. BMC Biol. 2011;9:64.

  12. Sullivan AR, Eldfjell Y, Schiffthaler B, Delhomme N, Asp T, Hebelstrup KH, et al. The mitogenome of Norway spruce and a reappraisal of mitochondrial recombination in plants. Genome Biol Evol. 2020;12(1):3586–98.

  13. Dong S, Zhao C, Chen F, Liu Y, Zhang S, Wu H, et al. The complete mitochondrial genome of the early flowering plant Nymphaea colorata is highly repetitive with low recombination. BMC Genomics. 2018;19(1):614.

  14. Chevigny N, Schatz-Daas D, Lotfi F, Gualberto JM. DNA repair and the stability of the plant mitochondrial genome. Int J Mol Sci. 2020;21(1):328.

  15. Guo WH, Grewe F, Fan WS, Young GJ, Knoop V, Palmer JD, et al. Ginkgo and Welwitschia mitogenomes reveal extreme contrasts in gymnosperm mitochondrial evolution. Mol Biol Evol. 2016;33(6):1448–60.

  16. Juszczuk IM, Flexas J, Szal B, Dabrowska Z, Ribas-Carbo M, Rychter AM. Effect of mitochondrial genome rearrangement on respiratory activity, photosynthesis, photorespiration and energy status of MSC16 cucumber (Cucumis sativus) mutant. Physiol Plant. 2007;131(4):527–41.

  17. Bentolila S, Stefanov S. A reevaluation of rice mitochondrial evolution based on the complete sequence of male-fertile and male-sterile mitochondrial genomes. Plant Physiol. 2012;158(2):996–1017.

  18. Shi S, Li J, Sun J, Yu J, Zhou S. Phylogeny and classification of Prunus sensu lato (Rosaceae). J Integr Plant Biol. 2013;55(11):1069–79.

  19. Rono PC, Dong X, Yang JX, Mutie FM, Oulo MA, Malombe I, et al. Initial complete chloroplast genomes of Alchemilla (Rosaceae): comparative analysis and phylogenetic relationships. Front Genet. 2020;11:560368.

  20. Sun X, Jiao C, Schwaninger H, Chao CT, Ma Y, Duan N, et al. Phased diploid genome assemblies and pan-genomes provide insights into the genetic history of apple domestication. Nat Genet. 2020;52(12):1423–32.

  21. Wu J, Wang Y, Xu J, Korban SS, Fei Z, Tao S, et al. Diversification and independent domestication of Asian and European pears. Genome Biol. 2018;19(1):77.

  22. Jin JJ, Yu WB, Yang JB, Song Y, dePamphilis CW, Yi TS, et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020;21(1):241.

  23. Luo RB, Liu BH, Xie YL, Li ZY, Huang WH, Yuan JY, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1:18.

  24. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36.

  25. Yue X, Zheng X, Zong Y, Jiang S, Hu C, Yu P, et al. Combined analyses of chloroplast DNA haplotypes and microsatellite markers reveal new insights into the origin and dissemination route of cultivated pears native to East Asia. Front Plant Sci. 2018;9:591.

  26. Barr CM, Neiman M, Taylor DR. Inheritance and recombination of mitochondrial genomes in plants, fungi and animals. New Phytol. 2005;168(1):39–50.

  27. Adams KL, Qiu YL, Stoutemyer M, Palmer JD. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci U S A. 2002;99(15):9905–12.

  28. Hall ND, Zhang H, Mower JP, McElroy JS, Goertzen LR. The mitochondrial genome of Eleusine indica and characterization of gene content within Poaceae. Genome Biol Evol. 2020;12(1):3684–97.

  29. Kwasniak-Owczarek M, Kazmierczak U, Tomal A, Mackiewicz P, Janska H. Deficiency of mitoribosomal S10 protein affects translation and splicing in Arabidopsis mitochondria. Nucleic Acids Res. 2019;47(22):11790–806.

  30. Hunt MD, Newton KJ. The NCS3 mutation: genetic evidence for the expression of ribosomal protein genes in Zea mays mitochondria. EMBO J. 1991;10(5):1045–52.

  31. Sakamoto W, Kondo H, Murata M, Motoyoshi F. Altered mitochondrial gene expression in a maternal distorted leaf mutant of Arabidopsis induced by chloroplast mutator. Plant Cell. 1996;8(8):1377–90.

  32. Zhang X, Takano T, Liu S. Identification of a mitochondrial ATP synthase small subunit gene (RMtATP6) expressed in response to salts and osmotic stresses in rice (Oryza sativa L.). J Exp Bot. 2006;57(1):193–200.

  33. Christensen AC. Plant mitochondrial genome evolution can be explained by DNA repair mechanisms. Genome Biol Evol. 2013;5(6):1079–86.

  34. Wu ZQ, Cuthbert JM, Taylor DR, Sloan DB. The massive mitochondrial genome of the angiosperm Silene noctiflora is evolving by gain or loss of entire chromosomes. Proc Natl Acad Sci U S A. 2015;112(33):10185–91.

  35. Wang S, Li D, Yao X, Song Q, Wang Z, Zhang Q, et al. Evolution and diversification of kiwifruit mitogenomes through extensive whole-genome rearrangement and mosaic loss of intergenic sequences in a highly variable region. Genome Biol Evol. 2019;11(4):1192–206.

  36. Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, et al. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Gen Genomics. 2005;272(6):603–15.

  37. Alverson AJ, Rice DW, Dickinson S, Barry K, Palmer JD. Origins and recombination of the bacterial-sized multichromosomal mitochondrial genome of cucumber. Plant Cell. 2011;23(7):2499–513.

  38. Kang JS, Zhang HR, Wang YR, Liang SQ, Mao ZY, Zhang XC, et al. Distinctive evolutionary pattern of organelle genomes linked to the nuclear genome in Selaginellaceae. Plant J. 2020;104(6):1657–72.

  39. Shearman JR, Sonthirod C, Naktang C, Pootakham W, Yoocha T, Sangsrakru D, et al. The two chromosomes of the mitochondrial genome of a sugarcane cultivar: assembly and recombination analysis using long PacBio reads. Sci Rep. 2016;6:31533.

  40. Shedge V, Davila J, Arrieta-Montiel MP, Mohammed S, Mackenzie SA. Extensive rearrangement of the Arabidopsis mitochondrial genome elicits cellular conditions for thermotolerance. Plant Physiol. 2010;152(4):1960–70.

  41. Xu YZ, Arrieta-Montiel MP, Virdi KS, de Paula WBM, Widhalm JR, Basset GJ, et al. MutS HOMOLOG1 is a nucleoid protein that alters mitochondrial and plastid properties and plant response to high light. Plant Cell. 2011;23(9):3428–41.

  42. Cheng L, Wang W, Yao Y, Sun Q. Mitochondrial RNase H1 activity regulates R-loop homeostasis to maintain genome integrity and enable early embryogenesis in Arabidopsis. PLoS Biol. 2021;19(8):e3001357.

  43. Virdi KS, Wamboldt Y, Kundariya H, Laurie JD, Keren I, Kumar KRS, et al. MSH1 is a plant organellar DNA binding and thylakoid protein under precise spatial regulation to alter development. Mol Plant. 2016;9(2):245–60.

  44. Gualberto JM, Newton KJ. Plant mitochondrial genomes: dynamics and mechanisms of mutation. Annu Rev Plant Biol. 2017;68:225–52.

  45. Chen XL, Li SM, Zhang D, Han MY, Jin X, Zhao CP, et al. Sequencing of a wild apple (Malus baccata) genome unravels the differences between cultivated and wild apple species regarding disease resistance and cold tolerance. G3 (Bethesda). 2019;9(7):2051–60.

  46. Edger PP, Poorten TJ, VanBuren R, Hardigan MA, Colle M, McKain MR, et al. Origin and evolution of the octoploid strawberry genome. Nat Genet. 2019;51(3):541–7.

  47. Qiao Q, Edger PP, Xue L, Qiong LJ, Zhang Y, Cao Q, et al. Evolutionary history and pan-genome dynamics of strawberry (Fragaria spp.). Proc Natl Acad Sci U S A. 2021;118(45):e2105431118.

  48. Groppi A, Liu S, Cornille A, Decroocq S, Bui QT, Tricon D, et al. Population genomics of apricots unravels domestication history and adaptive events. Nat Commun. 2021;12(1):3956.

  49. Li Y, Cao K, Li N, Zhu GR, Fang WC, Chen CW, et al. Genomic analyses provide insights into peach local adaptation and responses to climate change. Genome Res. 2021;31(4):592–606.

  50. Gross BL, Olsen KM. Genetic perspectives on crop domestication. Trends Plant Sci. 2010;15(9):529–37.

  51. Tang H, Zheng X, Li C, Xie X, Chen Y, Chen L, et al. Multi-step formation, evolution, and functionalization of new cytoplasmic male sterility genes in the plant mitochondrial genomes. Cell Res. 2017;27(1):130–46.

  52. Lajbner Z, Pnini R, Camus MF, Miller J, Dowling DK. Experimental evidence that thermal selection shapes mitochondrial genome evolution. Sci Rep. 2018;8(1):9500.

  53. Li X, Liu L, Ming M, Hu H, Zhang M, Fan J, et al. Comparative transcriptomic analysis provides insight into the domestication and improvement of pear (P. pyrifolia) fruit. Plant Physiol. 2019;180(1):435–52.

  54. Duan NB, Bai Y, Sun HH, Wang N, Ma YM, Li MJ, et al. Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement. Nat Commun. 2017;8:249.

  55. Nikiforova SV, Cavalieri D, Velasco R, Goremykin V. Phylogenetic analysis of 47 chloroplast genomes clarifies the contribution of wild species to the domesticated apple maternal line. Mol Biol Evol. 2013;30(8):1751–60.

  56. Wu J, Wang Z, Shi Z, Zhang S, Ming R, Zhu S, et al. The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res. 2013;23(2):396–408.

  57. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.

  58. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963.

  59. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinf. 2009;10:421.

  60. Zhang T, Zhang X, Hu S, Yu J. An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS FLX sequencing platform. Plant Methods. 2011;7:38.

  61. Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinf. 2012;13:238.

  62. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, et al. GeSeq - versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45(W1):W6–W11.

  63. Alverson AJ, Wei XX, Rice DW, Stern DB, Barry K, Palmer JD. Insights into the evolution of mitochondrial genome size from complete sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae). Mol Biol Evol. 2010;27(6):1436–48.

  64. Guo W, Zhu A, Fan W, Mower JP. Complete mitochondrial genomes from the ferns Ophioglossum californicum and Psilotum nudum are highly repetitive with the largest organellar introns. New Phytol. 2017;213(1):391–403.

  65. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74.

  66. Yang ZH. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91.

  67. Li Y, Smith T, Liu CJ, Awasthi N, Yang J, Wang YF, et al. Endocarps of Prunus (Rosaceae: Prunoideae) from the early Eocene of Wutu, Shandong Province, China. Taxon. 2011;60(2):555–64.

  68. Gray J. The lower tertiary floras of Southern England. Science. 1964;144(3619):719–20.

    Article  CAS  PubMed  Google Scholar 

  69. Hohmann N, Wolf EM, Lysak MA, Koch MA. A time-calibrated road map of Brassicaceae species radiation and evolutionary history. Plant Cell. 2015;27(10):2770–84.

    CAS  PubMed  PubMed Central  Google Scholar 

  70. Darling AE, Mau B, Perna NT. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 2010;5(6):e11147.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  71. Tesler G. GRIMM: genome rearrangements web server. Bioinformatics. 2002;18(3):492–3.

    Article  CAS  PubMed  Google Scholar 

  72. Hu F, Lin Y, Tang J. MLGO: phylogeny reconstruction and ancestral inference from gene-order data. BMC Bioinformatics. 2014;15:354.

    Article  PubMed  PubMed Central  Google Scholar 

  73. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. Proc GPD: the Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  75. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6(2):80–92.

    Article  CAS  Google Scholar 

  77. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Li MR, Shi FX, Li YL, Jiang P, Jiao L, Liu B, et al. Genome-wide variation patterns uncover the origin and selection in cultivated ginseng (Panax ginseng Meyer). Genome Biol Evol. 2017;9(9):2159–69.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references


Acknowledgements

We thank the high-performance computing platforms of the Bioinformatics Center of Nanjing Agricultural University and the State Key Laboratory of Crop Biology of Shandong Agricultural University for supporting this project. This project was also supported by the platform of the Center of Pear Engineering Technology Research of Nanjing Agricultural University. Honghe Sun and Leiting Li gave constructive suggestions for this project. The authors would like to thank everyone who contributed to this article. All authors read and approved the final manuscript.


Funding

This work was supported by the National Key Research and Development Program (2018YFD1000200), the National Science Foundation of China (31820103012, 31901978), the Earmarked Fund for China Agriculture Research System (CARS-28), the Earmarked Fund for Jiangsu Agricultural Industry Technology System (JATS [2021]453), and the Natural Science Foundation of Jiangsu Province for Young Scholars (BK20221010).

Author information

Contributions

MYS and MYZ drafted the manuscript. MYS, MYZ, and XNC performed the bioinformatics analysis. YYL, BBL, JML, RZW, and KJZ reviewed the manuscript. JW conceived this study and prepared the manuscript. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Jun Wu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Released Rosaceae mitogenomes (Last access date: 20-Jan-2022).

Additional file 2.

Mapping depth of 34 newly assembled Rosaceae mitogenomes.

Additional file 3.

Repeat statistics of 38 Rosaceae mitogenomes.

Additional file 4: Figure S1.

The relationship between genome size and total repeat length and number in 38 Rosaceae mitogenomes. The repeats were divided into four types: repeat length <100 bp (a, e); 100 bp ≤ repeat length ≤ 500 bp (b, f); 500 bp < repeat length ≤ 1,000 bp (c, g); and repeat length >1,000 bp (d, h). The linear regression equation is displayed with the adjusted R-squared and P-value.

Figure S2. The relationship between mitogenome size and total repeat length and count in 14 Fabaceae mitogenomes. The repeats were divided into six types: all repeats (a, g); repeat length <100 bp (b, h); 100 bp ≤ repeat length ≤ 500 bp (c, i); 500 bp < repeat length ≤ 1,000 bp (d, j); repeat length >1,000 bp (e, k); and repeat length ≤500 bp (f, l). The linear regression equation is displayed with the adjusted R-squared and P-value.

Figure S3. The relationship between mitogenome size and total repeat length and count in 88 seed plants. The repeats were divided into six types: all repeats (a, g); repeat length <100 bp (b, h); 100 bp ≤ repeat length ≤ 500 bp (c, i); 500 bp < repeat length ≤ 1,000 bp (d, j); repeat length >1,000 bp (e, k); and repeat length ≤500 bp (f, l). The linear regression equation is displayed with the adjusted R-squared and P-value.

Figure S4. The distribution of repeat count (a) and total repeat length (b) of 50 seed plant mitogenomes with genome sizes ranging from 271.60 to 525.67 kb.

Figure S5. The rearrangement rate estimated using tree-based methods in Malus (a), Pyrus (b), Prunus (c), and Fragaria (d). Red numbers on the branches represent the number of rearrangement events and the rearrangement rate (rearrangement events per million years), respectively. Yellow triangles represent the varieties within a species, and the rearrangement events and rates are calculated between species and neighboring nodes. The blue bars indicate the 95% highest posterior density intervals.

Figure S6. The mapping depth and distribution analysis of 116 apple accessions. (a) The mapping depth of 116 apple accessions. The NGS reads were mapped to the Malus domestica cv. ‘Gala’ (Mdom-G) mitogenome. The ratio of Idep to Wdep was used to evaluate the mapping results, and the ratio was further normalized using the z-score method. Orange: high mt read mapping depth; blue: low mt read mapping depth. AW: Asian wild apples; EW: European wild apples; Sie: Malus sieversii; Dom: Malus domestica. (b) Distribution analysis of apple mitogenomes. Main distribution areas are marked by circles. Blue: apples containing the deletion (Del); red: apples not containing this deletion. Triangles represent wild apples and circles represent cultivated apples. Dom: Malus domestica; Syl: Malus sylvestris; Sie_K: Malus sieversii west of Tianshan; Sie_X: Malus sieversii east of Tianshan; Bac: Malus baccata; Asi: Malus asiatica; Hup: Malus hupehensis.

Figure S7. Flow chart for repeat recombination analysis. (a) Recombinant sequence construction. ‘b’ and ‘e’ indicate repeat sequences; ‘a’ and ‘d’ indicate the upstream 200 bp sequences; ‘c’ and ‘f’ indicate the downstream 200 bp sequences. (b) Mitochondrial reads mapping to reference and recombinant sequences using BLASTN. (c) Recombination frequency calculation.

Additional file 5.

Repeat statistics of 88 seed plant mitogenomes.

Additional file 6.

Information on 341 repeats containing recombination activities.

Additional file 7.

139 wild and cultivated pear accessions and mapping profile.

Additional file 8.

SNPs annotation of mitogenome.

Additional file 9.

INDELs annotation of mitogenome.

Additional file 10.

Mitochondrial genes in selective sweep regions.

Additional file 11.

Homologous sequence of DEL-D (Pbre-D: 183,739-199,800 bp) in 30 mitogenomes and nuclear genomes.

Additional file 12.

Project information for the raw sequences of 34 Rosaceae samples.

Additional file 13.

Summary of 38 Rosaceae chloroplast genomes.

Additional file 14.

Information for the mitochondrial read databases.

Additional file 15.

Genes used in the chloroplast genome phylogeny analysis.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sun, M., Zhang, M., Chen, X. et al. Rearrangement and domestication as drivers of Rosaceae mitogenome plasticity. BMC Biol 20, 181 (2022).


Keywords

  • Mitogenome
  • Rosaceae
  • Rearrangement rate
  • Domestication