Assembly of the T genome
The chromosomes of karat were fluorescently stained, and the result showed that karat is an autotriploid cultivar with 30 chromosomes (Additional file 1: Fig. S1). According to the genome survey, the T genome is 606~655 Mb in size and has a heterozygosity rate ranging from 1.25% (tongkat, TT) to 1.55% (karat, TTT) (Additional file 1: Fig. S2 and Additional file 1: Table S1-2). For genome sequencing, we generated 42 Gb of Nanopore reads, 6.9 Gb of PacBio reads and 42 Gb of Illumina reads (Additional file 1: Table S3-5). Using NextDenovo and NextPolish, we obtained an assembly with a total length of 918 Mb and contig N50 of 4.9 Mb (Additional file 1: Table S6). After purging haplotigs (Additional file 1: Fig. S3), we obtained 603 Mb contigs (Additional file 1: Table S7), and with 110 Gb of Hi-C reads mapped, the contigs were arranged into 10 chromosomes (Fig. 1a and Additional file 1: Fig. S4). BUSCO analysis showed that 97.7% of the BUSCO genes were assembled (Additional file 1: Table S8). The transcriptomes of leaves, roots, stems and fruits were sequenced for gene annotation. Using Maker2 [23], we predicted 37,577 protein-coding genes (Additional file 1: Table S9). BUSCO analysis showed that 92.5% of the BUSCO genes were predicted (Additional file 1: Table S10). Using eggNOG-mapper [24], we predicted 30,377 protein-coding genes with orthologues, 16,687 genes with GO annotation and 13,105 genes with KEGG annotation. Using RepeatMasker [25], we found that 59.62% of the T genome contained repeat elements (Additional file 1: Table S11). LTR/Gypsy and LTR/Copia accounted for 15.1% and 36.4% of the genome, respectively. As specific marker of the centromeric regions in M. acuminata genome [16, 26], Nanica LINE clusters also presented in all chromosomes (Additional file 1: Fig. S5). Using LTR FINDER [27], we identified 3,128 intact LTRs, and analysis of the insertion time showed that there was an LTR insertion burst at 1.47 MYA (Additional file 1: Fig. S6), which occurred before the burst of M. balbisiana (0.32 MYA) and after the burst of M. acuminata (1.77 MYA). Then, Illumina reads of karat and tongkat were mapped to the T genome. There were 516,884 and 459,137 indels, and 7,716,375 and 7,125,857 single-nucleotide polymorphism (SNP) sites identified in karat and tongkat, respectively (Additional file 1: Table S12 and Additional file 1: Fig. S7).
Using OrthoFinder [28], we identified 8924 single-copy genes and 27,100 orthologous gene sets in the A, B, S, T and M. itinerans genomes. There were 7791 genes specific in the T genome. According to the phylogenetic tree generated by OrthoFinder and the divergence time of M. acuminata and M. balbisiana reported in a previous study [17], we constructed an ultrametric tree showing that M. troglodytarum diverged from the ancestor of M. acuminata, M. schizocarpa and M. balbisiana 20.8 MYA (Fig. 1d). According to the 4DTV distances, the peak Ks values were approximately 0.06, 0.06, and 0.07 for M. troglodytarum–M. acuminata, M. troglodytarum–M. schizocarpa and M. troglodytarum–M. balbisiana, respectively, where a peak Ks value of approximately 0.46 indicated whole-genome duplication (Fig. 1e). Using Café [29], we identified 771, 460, 1820 and 608 expanded gene families and 3325, 3049, 562 and 6980 contracted gene families in the A, S, T and B genomes, respectively. There are 11 gene families with rapid expansions in T genome. GO functional enrichment analysis was conducted to explore the functions T genome-specific genes and rapidly evolving families (Additional file 1: Fig. S8-9 and supplementary Additional file 2: Data 1-2). GO enrichment analysis show that those genes of rapidly evolving gene families were enriched in cell morphogenesis, cell growth, defense response to insect immune system process, defense response to bacterium, defense response to fungus, response to virus etc.
Using MGRA2 [30], we constructed the ancestral genome of the A, B, S and T genomes, which resulted in 11 contiguous ancestral regions (CARs) and 20,056 ordered ancestral genes. For chromosome rearrangement, we constructed bar plots of the A, B, S and T genomes compared to CARs using MCSCAN (Fig. 1c). The ancestor of the A, B, S and T genomes experienced multiple chromosome rearrangements before and after their divergence. Chromosome 1 of the A, B and T genomes experienced translocation after divergence, and chromosomes8 and 9 in the ancestry fused into chromosome 9 of the T genome. A dot plot of the synteny gene blocks between M. troglodytarum and M. acuminata also indicated the fusion of chromosomes 8 and 9 in the T genome (Additional file 1: Fig. S10-11). M. troglodytarum was domesticated independently and diverged from an ancestor of M. acuminata, M. schizocarpa and M. balbisiana 20.8 MYA. Thus, M. troglodytarum has experienced multiple translocations and inversions, unlike the high synteny with few rearrangements found among M. schizocarpa, M. acuminata and M. balbisiana.
Transcriptome and metabolome of the fruit
To determine the basis of the enrichment of carotenoids and flavonoids and the non-climacteric behaviour of karat, we integrated widely targeted metabolomics and targeted metabolomics data from karat pulp at 25, 45, 65, 115, 145, 173, 200 and 215 days after flowering (DAF) and RNA sequencing (RNA-seq) data from karat pulp at 25, 45, 65, 115, 100, 130, 145, 152, 159, 173, 200 and 208 DAF (Fig. 2a). According to widely targeted metabolomic data, we identified 877 metabolites, including flavonoids, lipids, phenolic acids, amino acids and their derivatives, organic acids, nucleotides and their derivatives, alkaloids, lignin, coumarins, tannins, terpenoids, quinones and others, 768 of them were divided into 5 clusters (Fig. 2f, Additional file 1: Fig. S12 and Additional file 2: Data 3-4). Cluster 2 represents the metabolites that increased during ripening, including alkaloids, amino acids and their derivatives, coumarins, free fatty acids, organic acids, phenolic acids, saccharides and alcohols, vitamins and others.
Carotenoids were enriched throughout fruit development
According to the quantification of the karat pulp metabolites, lutein accumulated throughout all the fruit developmental stages, with a content of 10.62–32.25 μg/g (Fig. 2b, c). The contents of α-carotene, β-carotene and phytoene increased rapidly at 200 DAF and 215 DAF, with values of 53.42, 27.90, and 10.57 μg/g, respectively, at 215 DAF. In addition, β-cryptoxanthin-laurate, rubixanthin-laurate β-cryptoxanthin, γ-carotene and others, were also increased at 215 DAF (Additional file 2: Data 5). Microsynteny analysis of carotenoid biosynthesis pathway genes showed that MtSSUIIs were triplicated and MtCCD4s were duplicated in the T genome (Figs. 1c and 3b). According to the RNA-seq data, the key genes of carotenoid biosynthesis, including MtGGPPS1, MtSSUIIs, MtPSY2s, MtLCYBs, MtLCYEs, MtZDSs, Mtβ-OH and Mtε-OH, were all highly expressed across all the fruit developmental stages (Fig. 3a, c). At 200 DAF and 215 DAF, hyperaccumulation of α-carotene, β-carotene and phytoene coincided with a decrease in CCD4 expression.
The triplication of MtSSUII may explain the enrichment of carotenoids in karat and other Fe’i banana fruits. The lutein contents of these fruits were high throughout the fruit development process. SSUII enhances the accumulation of carotenoids by interacting with GGPPS1 and PSY, promoting their enzymatic activity [31, 32]. In addition to MtSSUIIs enhancing the hyperaccumulation of carotenoids, MtCCD4, a key gene that regulates various branches of carotenoid biosynthesis, regulates the accumulation of α-carotene and β-carotene during ripening and is downregulated at the end of ripening [11]. Downregulation of CCD4 is fruit-specific and may be the key reason for the enrichment of only α-carotene and β-carotene in the fruit. According to coexpression network analysis, MtCCD4 was coregulated with MtETO1 and MtJAZ1. MtJAZ1 is the key regulator of the JA signalling pathway and is induced by JA [33]. Multiple JA response element G-box and TGACG-box motifs were identified in the promoters of MtCCD4a and MtCCD4b (Additional file 1: Fig. S13), similar to CCD4 in Brassica napus, indicating an extensive role of JA in the regulation of CCD4 [34]. In Osmanthus fragrans, OfCCD4 were also induced by JA treatment [35]. In the full-green (FG) stage, the decreased expression level of CCD4 coincides with the increase in the JA content, but in the full-ripening (FR) stage, the decrease in the JA content also coincides with the downregulation of MtCCD4s, implying the complex regulation of MtCCD4s by JA. For α-carotene and β-carotene rapidly accumulation in FR stage, JA may repress the accumulation of α-carotene and β-carotene by activating the expression of MtCCD4s in fruit. Therefore, further research is needed to elucidate the mechanism governing the regulation of MtCCD4s by JA, which may be spatiotemporally dependent and dose dependent.
Flavonoids are enriched early during fruit development, which may be due to the expansion of MtF3′5′Hs
The T genome has 17 F3′5 ′H loci, while there are eight, eight and five loci in the A, B and S genomes respectively. Microsynteny analysis showed that the flavonoid biosynthesis gene MtF3′5 ′H was tandemly duplicated on both chromosomes 2 and 10, resulting in nine more loci than were present in the A genome (Fig. 4b). In particular, F3′5′H on chromosome 9, which is a single locus in the A, B and S genomes, is duplicated into eight loci in the T genome. Moreover, seven of the eight loci distributed on chromosome 9 of the T genome showed highly similar expression patterns in karat. MtF3′H, which competes with MtF3′5′Hs for substrates, was largely decreased in karat. Microsynteny analysis also showed that there are three M. troglodytarum-specific regions in the upstream sequences of MtF3′H. No similar sequences were identified by BLAST in A, B, S or other genomes. The specific regions may contribute to the low expression level of MtF3’H in karat. According to the quantification of flavonoids in pink stem sap, the delphinidin-3-rutinoside chloride content was enriched. Moreover, multiple flavonoids in the pulp were found to be enriched, including 4′-hydroxy-5,7-dimethoxyflavanone, epicatechin, myricetin-3-O-rutinoside, and delphinidin-3-O-rutinoside. In particular, only 4′-hydroxy-5,7-dimethoxyflavanone was enriched at 215 DAF, while epicatechin, myricetin-3-O-rutinoside and delphinidin-3-O-rutinoside degraded largely at the end of the ripening process, coinciding with the fading of pink sap in the fruit (Fig. 2a, c–e). The duplication of MtF3′5′H and suppression of MtF3′H led to the enrichment of delphinidin-3-O-rutinoside, which differs from other types of bananas. The heatmap shows that the key genes involved in the synthesis of flavonoids were downregulated at the end of the ripening process, except for MtUFGTs, which were highly expressed throughout the ripening process (Fig. 4a, c).
Riboflavin is enriched in karat pulp
According to the widely targeted metabolome analysis, riboflavin (B2), pantothenic acid (B5) and pyridoxine (B6) were enriched in karat pulp (Fig. 2c). In particular, riboflavin (B2) was enriched, especially in Fe’i banana fruit. Transcriptome analysis also showed that the riboflavin de novo synthesis genes MtRIBA1 and MtFMNse showed higher expression levels in karat fruit pulp than in BXJ (BaXi Jiao, Musa acuminata L. AAA group cv. Cavendish) fruit pulp. Moreover, microsynteny analysis showed that there was a 2399-bp deletion in the 5′ UTR of MtRIBA1 (Fig. 5a). As a consequence, MtRIBA1 was highly expressed throughout all developmental stages, and the increased expression of MtFMNSE across the ripening process, may be the reason for the enrichment of riboflavin.
Non-climacteric behaviour of karat fruit
Fruit ripening is distinctively different between climacteric and non-climacteric fruit. ABA and ethylene play key roles in the ripening of climacteric fruit, while non-climacteric ripening is linked to only ABA [36]. Banana is usually a climacteric fruit, while karat shows non-climacteric behaviour with a long bunch-filling time (215 days) and shelf life (harvested full yellow fruits can be stored for approximately 8 days under ambient conditions). By integrating metabolomic and comparative transcriptomic analyses, we found that the non-climacteric behaviour of karat is due to the transformation of ethylene-induced ripening into ABA-induced ripening.
Using a widely targeted metabolome, we quantified the plant hormones involved in the regulation of fruit development, including ABA, 1-aminocyclopropane-1-carboxylic acid (ACC), jasmonic acid (JA), indoleacetic acid (IAA), and salicylic acid (SA). ACC increased at 65 DAF and decreased at 145 DAF; ABA increased at 65 DAF and decreased at 215 DAF; JA increased at 65 DAF and decreased at 200 DAF; IAA decreased at 173 DAF; and SA decreased at 145 DAF but increased at 215 DAF (Fig. 5b). Karat fruit ripened at 215 DAF and was harvested at that time in June. At 65 DAF, homologues of MaNAC1, MaBMY, MaAMY3, MaBMY3 and MaBGAL6, were upregulated with the onset of fruit ripening, and the ABA and JA contents increased as the fruits reached the mature-green or full-green (FG) stages. ABA may play key roles in both FG and full-ripening (FR) stage. While ethylene, IAA, JA and SA may involve in fruit ripening during the FG stage. Moreover, there was another ABA peak at 200 DAF, when the fruit reached the FR stage, coinciding with fruit softening and the rapid accumulation of α-carotene and β-carotene (Fig. 2a and Additional file 2: Data 6-8). Though ABA, ethylene, JA, IAA and SA were all reported to activate the biosynthesis of carotenoids in many plants [13, 35, 37, 38], ABA may be one of the key factors regulating the accumulation of α-carotene and β-carotene in FR stage.
Autocatalytic ethylene synthesis was disrupted in karat, but ABA synthesis was enhanced. MaERF11 is a key ethylene-related gene that negatively regulates ethylene biosynthesis by suppressing MaACS1 and MaACO1 in banana [39]. MaACS1 and MaACO1 are the key genes that regulate ethylene biosynthesis. Homologues of MaERF11, MaACS1 and MaACO1 presented expression patterns that differed from those in FJ (Fen Jiao, Musa ABB PisangAwak) and BXJ (Fig. 5c). Both MaERF11 and MtERF11 are repressor with ERA repression motif ‘DLNNPP’. Microsynteny analysis showed that there was a DNA fragment of 4,708 bp and an LTR-like fragment of 2116 bp inserted 780 bp upstream of MtERF11 and that there was a DNA fragment of 39,766 bp and an LTR-like fragment of 14,924 bp inserted 2720 bp upstream of MtACO1 (Fig. 5a). Due to sequence variations, unlike that of MaERF11 in BXJ, the expression of MtERF11 was not suppressed during ripening as that of MaERF11 in BXJ. The expression of MtERF11 showed a similar expression pattern as that of MtACS1 during ripening, which is unlike the sharp upregulation of ACS1 in FJ and BXJ. This result indicated that MtERF11 suppressed the expression of MtACS1 throughout the fruit development process. The promoter sequence of MtACS1 harbour GCC-boxes, while that of MtACO1 lacks GCC-boxes, which may explain the upregulation of MtACO1 during ripening (Additional file 1: Fig. S14). Analysis of the cloned promoter sequences of MtACO1 also validated the missing of GCC-boxes. As a downstream gene of the carotenoid synthesis pathway and a gene that catalyses the first step of ABA biosynthesis, MtNCED6 may play key roles in the regulation of ABA synthesis in karat fruit, as this gene is specifically expressed during ripening, unlike in FJ and BXJ [40]. Moreover, unlike in FJ and BXJ, the metabolism-related gene MtCYP707A1 was not upregulated during ripening. It may be the key gene involved in the decrease in ABA content in BXJ during ripening. MtCYP707A2 was upregulated at 208 DAF, which may explain the decrease in ABA levels (Additional file 1: Fig. S15-17). ABA-stress-ripening (ASR) transcription factors play key roles in sucrose- and ABA-induced fruit ripening and softening via crosstalk between ABA and sucrose [41]. In BXJ, MaASR1 and MaASR2 increased before ripening but sharply decreased, while they were highly expressed during karat fruit ripening. MtASR1 expression was maintained at a high level throughout the ripening period, indicating that this gene may play a role in karat fruit ripening and softening.
As starch was degraded and converted into soluble sugars during ripening, karat had lower glucose and fructose contents but higher sucrose and free galactose contents than BXJ, due to non-climacteric behaviour. Quantification of sugars in the pulp of ripening karat showed that sucrose was highly enriched, coinciding with the degradation of starch. In the pulp of dry ripened fruit, the sucrose content reached 233.96 mg/g and was the predominant sugar (Fig. 6d and Additional file 2: Data 9). Given that the pulp of the ripening fruit is 67% water [4], karat pulp has approximately 27.13 and 24.95 mg/g glucose and fructose, respectively, which is less than the approximately 60 mg/g glucose and fructose found in BXJ [42]. During ripening, the starch-degradation-related genes MtBMYs and MtAMYs were upregulated, coinciding with the accumulation of sucrose, fructose and glucose. However, in contrast to that of their homologues in BXJ and FJ, the expression of these genes did not sharply increase in karat (Fig. 6e and Additional file 1: Fig. S18). Sucrose synthase genes were highly expressed in karat during ripening, which differed from the decrease in BXJ and FJ. Moreover, the level of free galactose in the pulp at ripening was 0.27 mg/g. The galactose synthesis-related genes Mtα-GALs and Mtβ-GALs had higher expression levels than those in BXJ and FJ, which may be due to the non-climacteric behaviour. GALK is involved in galactose metabolism [43]. Compared to MaGALK, MtGALK is a 5′ prime untranslated region (UTR) premature start codon-gain variant that has two premature ORFs in the 5′ UTR (Fig. 6a–c). Premature ORFs may suppress the translational activity of transcripts [44]. MtGALK also has an alternative transcript in which the sixth exon is missing, which may be due to a DNA sequence insertion in the sixth intron (Fig. 6c). The cloned MtGALK cDNA sequences validated the variation of in 5′ UTR and alternative splice. Moreover, the karat shoot buds presented better growth vigour than BXJ under exogenous application free galactose (Additional file 1: Fig. S19). Therefore, the free galactose content in the pulp was higher than that in the pulp of other banana fruits and may be due to sequence variations in the 5’ UTR and sixth intron of MtGALK. Non-climacteric ripening behaviour may be another reason for the accumulation of free galactose; similar to the fruit of non-climacteric plum cultivars, which shows increased expression levels of α-GALs and β-GALs and increased accumulation of free galactose [45].
Using weighted gene coexpression network analysis (WGCNA), we constructed a weighted coexpression network. To identify candidate genes involved in fruit development and ripening, flavonoid and carotenoid biosynthesis, we selected MtXB3, MtNAC1, MtCCD4s, MtERFs, MtNCED6, and MtDFR to extract subnetwork from coexpression network (Fig. 5d and Additional file 2: Data 10). MtXB3, a homologue of MaXB3 that negatively regulates fruit ripening, was downregulated at 130 DAF. MtXB3 was coexpressed with MtTT8, MtLc, MtMYB114, MtDFR, and MtANR, which may play key roles in flavonoid biosynthesis. MtERF11 and MtACS1 were coexpressed with MtSAP1, MtSAP4, MtERF4, MtERF105, MtERF107 and MtERF109. MtNCED6 was coregulated with MtRIBA1, MtACO1, MtSAUR1, MtCCD4a and MtCCD4b. And MtCCD4a and MtCCD4b negatively regulate the accumulation of α-carotene and β-carotene and were coregulated with MtJAZ1 and MtETO1.