- Open Access
Genome sequencing of the staple food crop white Guinea yam enables the development of a molecular marker for sex determination
- Muluneh Tamiru†1,
- Satoshi Natsume†1,
- Hiroki Takagi†1,
- Benjamen White†2,
- Hiroki Yaegashi†1,
- Motoki Shimizu†1,
- Kentaro Yoshida3,
- Aiko Uemura1,
- Kaori Oikawa1,
- Akira Abe1,
- Naoya Urasaki4,
- Hideo Matsumura5,
- Pachakkil Babil6,
- Shinsuke Yamanaka7,
- Ryo Matsumoto7,
- Satoru Muranaka7,
- Gezahegn Girma8,
- Antonio Lopez-Montes8,
- Melaku Gedil8,
- Ranjana Bhattacharjee8,
- Michael Abberton8,
- P. Lava Kumar8,
- Ismail Rabbi8,
- Mai Tsujimura9,
- Toru Terachi9,
- Wilfried Haerty2,
- Manuel Corpas2,
- Sophien Kamoun10,
- Günter Kahl^11,
- Hiroko Takagi7Email author,
- Robert Asiedu8Email author and
- Ryohei Terauchi1, 12Email author
© The Author(s). 2017
Received: 3 May 2017
Accepted: 10 August 2017
Published: 19 September 2017
Root and tuber crops are a major food source in tropical Africa. Among these crops are several species in the monocotyledonous genus Dioscorea collectively known as yam, a staple tuber crop that contributes enormously to the subsistence and socio-cultural lives of millions of people, principally in West and Central Africa. Yam cultivation is constrained by several factors, and yam can be considered a neglected “orphan” crop that would benefit from crop improvement efforts. However, the lack of genetic and genomic tools has impeded the improvement of this staple crop.
To accelerate marker-assisted breeding of yam, we performed genome analysis of white Guinea yam (Dioscorea rotundata) and assembled a 594-Mb genome, 76.4% of which was distributed among 21 linkage groups. In total, we predicted 26,198 genes. Phylogenetic analyses with 2381 conserved genes revealed that Dioscorea is a unique lineage of monocotyledons distinct from the Poales (rice), Arecales (palm), and Zingiberales (banana). The entire Dioscorea genus is characterized by the occurrence of separate male and female plants (dioecy), a feature that has limited efficient yam breeding. To infer the genetics of sex determination, we performed whole-genome resequencing of bulked segregants (quantitative trait locus sequencing [QTL-seq]) in F1 progeny segregating for male and female plants and identified a genomic region associated with female heterogametic (male = ZZ, female = ZW) sex determination. We further delineated the W locus and used it to develop a molecular marker for sex identification of Guinea yam plants at the seedling stage.
Guinea yam belongs to a unique and highly differentiated clade of monocotyledons. The genome analyses and sex-linked marker development performed in this study should greatly accelerate marker-assisted breeding of Guinea yam. In addition, our QTL-seq approach can be utilized in genetic studies of other outcrossing crops and organisms with highly heterozygous genomes. Genomic analysis of orphan crops such as yam promotes efforts to improve food security and the sustainability of tropical agriculture.
Yam is a collective name for tuber-bearing crops belonging to the monocotyledonous Dioscorea genus in the family Dioscoreaceae of the order Dioscoreales. This genus contains approximately 450 species which are primarily distributed in tropical and subtropical regions worldwide . Among the Dioscoreaceae, three minor genera are monoecious (having male and female flowers on a plant), but the entire genus Dioscorea is characterized by dioecy (the presence of separate male and female plants), a feature shared by only 5–6% of angiosperms . The origin of Dioscorea is supposed to be in the Late Cretaceous (~80 Mya ), suggesting that the origin of dioecy dates back to this time. Approximately 10 Dioscorea species have been independently domesticated in West Africa, Southeast Asia, and the Pacific and Caribbean islands . D. rotundata is the most popular species in West and Central Africa, the main region for yam production worldwide, which contributed approximately 96% of the 63 million tons of yam produced globally in 2013 (Additional file 1: Table S1 and Additional file 2: Figure S1). D. rotundata (white Guinea yam) and D. cayenensis (yellow Guinea yam) represent a major source of food and income in this region, as well as an integral part of the socio-cultural life. This geographical region is often referred to as the “civilization of the yam,” reflecting the West African societies that are tightly linked to yam cultivation [5, 6].
Despite its considerable regional importance, Guinea yam has long been regarded as an “orphan” crop, as it is not traded around the world, and it has attracted little attention from researchers and little investment. Guinea yam cultivation is constrained by several factors. Seeds are seldom used as starting materials; instead, yams are commonly propagated clonally using small whole tubers (referred to as “seed yams”) or tuber pieces. Yam is an annual climber that requires stakes for support and is highly vulnerable to a plethora of pests and diseases. Therefore, an understanding of yam genetics and a systematic improvement of yam based on crossbreeding for traits associated with tuber yield and quality, a reduced requirement for staking, and resistance/tolerance to disease and nematodes are urgently needed. Genetic analysis of Dioscorea has been constrained by the small number of available genetic markers. Furthermore, Dioscorea cultivars are highly heterozygous due to their obligate outcrossing. This heterozygosity renders genetic analysis approaches commonly used in inbreeding species, e.g., linkage analysis using the segregating progeny of an F2 generation and recombinant inbred lines (RILs), inapplicable to yam.
The International Institute of Tropical Agriculture (IITA) has a global mandate for yam research and development within the CGIAR Consortium . We initiated a yam genomics program several years ago as part of an IITA-coordinated international collaboration. To generate genetic and genomic tools for yam breeding, we sequenced and assembled a highly heterozygous diploid genome of D. rotundata. We used this genome sequence and genetic resources to identify a locus associated with sex determination, which we used to develop a diagnostic marker for sex identification at the seedling stage. These genomic resources broaden our knowledge of Guinea yam genetics and provide a platform for implementing genomics-assisted breeding by marker-assisted selection (MAS) in this important staple crop.
Whole-genome sequencing (WGS) and assembly
Characteristics of nuclear genome sequence in Dioscorea rotundata and other angiosperms
D. rotundata (v0.1)
A. thaliana (TAIR10)
B. distachyon (v3.1)
O. sativa (v7_JGI 323)
Total length (Mbp)
Number of scaffolds (≥ 0 bp)
Number of scaffolds (≥ 1000 bp)
Largest scaffold (Mbp)
Number of Ns per 100 kb
Number of genes
Average number per gene
Total length (Mbp)
Average size (bp)
Average GC (%)
Average number per gene
Total length (Mbp)
Average size (bp)
Average GC (%)
% Total interspersed
Total interspersed total length (Mbp)
% Short interspersed nuclear elements (SINEs)
SINEs total length (Mbp)
% Long interspersed nuclear elements (LINEs)
LINEs total length (Mbp)
% Long terminal repeat (LTR) elements
LTR elements total length (Mbp)
% DNA elements
DNA elements total length (Mbp)
Unclassified total length (Mbp)
We assessed the quality of our assembly by investigating the presence of 248 highly conserved core eukaryotic genes with the Core Eukaryotic Genes Mapping Approach (CEGMA)  and confirmed the presence of 243 (98%) of those genes (Additional file 1: Table S4). Similarly, 94% of 956 Benchmarking Universal Single-Copy Orthologs (BUSCOs)  were present in at least one complete single copy in the assembly (Additional file 1: Table S5). Since the TDr96_F1 reference genome was generated from total genomic DNA, it also contained organelle-derived sequences. Alignment of the TDr96_F1 PE reads to the published D. rotundata chloroplast genome sequence  showed that 14.7% of the total PE reads were derived from the chloroplast genome (Additional file 1: Table S6). We also isolated mitochondrial DNA from TDr96_F1 leaves, sequenced this DNA using PE reads with Illumina MiSeq, and generated a 564-kb de novo assembly comprising 76 scaffolds (Additional file 2: Figure S5). Among PE reads, 1.25% represented mitochondrial sequences.
Generation of pseudo-chromosomes by anchoring scaffolds onto a linkage map
Guinea yam gene prediction and comparative genomics
We predicted genes and transposons using the TDr96_F1 reference genome sequence. To construct reliable gene models, we followed the MAKER pipeline using RNA-seq data from 18 samples representing various D. rotundata tissues (Additional file 1: Tables S12, S13) and combined the data with publicly available expressed sequence tags (ESTs) and homologous protein sequences from related angiosperm species (Additional file 2: Figure S14). This resulted in the prediction of 26,198 genes (Table 1 and Additional file 3), 22,477 (85.8%) of which are supported by RNA-seq data.
We compared Guinea yam genome sequence metrics with those of Arabidopsis thaliana (dicot), Brachypodium distachyon (monocot), and Oryza sativa (monocot) (Table 1). Interestingly, the GC contents of the total genome and exons of protein-coding genes in Guinea yam were 35.8% and 44.1%, respectively; these values are close to those of Arabidopsis and much lower than those of the Poales species Brachypodium and Oryza (Table 1). We annotated an average of 6.03 exons and 4.03 introns per gene. Roughly half of the genome was represented by an interspersed sequence (274.5 Mb), a major component of which was long terminal repeat (LTR) sequences (135.7 Mb) (Table 1).
For 12,625 D. rotundata genes, no orthologs or paralogs were found in B. distachyon, O. sativa, or A. thaliana, and 11,348 D. rotundata genes had no clear homologs in any of the six species shown in Fig. 3a and Additional file 8. Of these 11,348 genes without homologs, 3422 were expressed in tuber tissues, a tissue type not shared with the other species examined.
Non-redundant Gene Ontology (GO) terms “intracellular organelle”, “protein binding”, and “ion binding” were significantly enriched among D. rotundata genes that showed no orthology to the other species, but not among the conserved genes (Additional files 9, 10). D. rotundata genes without orthologs in the other species included 68 genes encoding proteins with lectin domains that are involved in defense against microbial pathogens, nematodes, and insects, accounting for 31% of the 216 lectin-coding genes functionally annotated in D. rotundata. Among the 12 subfamilies of lectins , the bulb-type lectin (snowdrop lectin; B-lectin) family contributed the largest share (110) of genes in D. rotundata (Additional file 1: Table S14). Phylogenetic analysis of the B-lectin genes in D. rotundata (110 genes; 51 unique), B. distachyon, O. sativa, and A. thaliana revealed two expansions of B-lectin genes in Dioscorea (Fig. 3c). The first expansion (blue band) consisted of 22 receptor-like serine/threonine-protein kinases, which are thought to play a role in signaling and the activation of plant defense mechanisms . The second expansion (red band) consisted of 28 mannose-binding lectins sharing high similarity with Dioscorea batatas tuber lectin DB1 (accession number AB178475). DB1 has insecticidal properties against cotton bollworm (Helicoverpa armigera), and studies in transgenic tobacco and rice plants expressing DB1 demonstrated that it also confers resistance against green peach aphid and brown plant hopper, respectively [22–24]. Of these mannose-binding lectin genes in Guinea yam, 16 did not have orthologs in any of the six other species examined, and two showed enriched expression (Benjamini–Hochberg  adjusted P value [padj] < 0.05) in tubers.
RNA-seq analysis comparing three tuber tissues to all other nine tissues (Additional file 1: Table S12) revealed that 2023 genes were enriched in tubers. The top 50 highly expressed (padj < 0.05) genes included genes encoding starch synthases and branching enzymes, as well as three carbonic anhydrase-encoding genes. Basic Local Alignment Search Tool (BLASTP) (https://blast.ncbi.nlm.nih.gov) analysis showed that these carbonic anhydrase-encoding genes shared high identity (average 76%) with genes encoding Dioscorea japonica precursors of dioscorin, a tuber storage protein that has carbonic anhydrase activity and exists in multiple isoforms  (Additional file 11).
To infer the past genome duplication in D. rotundata, we performed genome-wide dot plot analysis of D. rotundata against itself (Additional file 2: Figure S15), which revealed no indication of genome duplication. Nevertheless, we observed 946 paralogous gene clusters composed of duplicated genes in D. rotundata. Of these, 145 duplicate clusters of paralogous genes were observed only in D. rotundata. To investigate macrosynteny between D. rotundata and related species, we carried out whole-genome syntenic dot plot analysis against the genomes of Oryza sativa, Spirodela polyrhiza, and Phoenix dactylifera. At the chromosomal level, it was difficult to observe synteny conservation between these species. To assess microsynteny conservation, we performed a syntenic path assembly  of the scaffolds from these species against D. rotundata-masked pseudo-chromosomes (see Methods). The reordering and reorientation of the scaffolds relative to D. rotundata pseudo-molecules identified large proportions of the genomes to be conserved at the microsyntenic level (Additional file 2: Figure S16). This suggested that the D. rotundata genome has undergone many recombination events after its divergence from the other species.
Whole-genome resequencing of F1 bulk segregants identifies a genomic region associated with sex determination in D. rotundata
We hypothesized that the ZZ genotype stably gives rise to the male phenotype, whereas the ZW genotype results in unstable sex phenotypes; ZW mainly generates the female phenotype, but sometimes monoecious or male phenotypes depending on the environments. Therefore, some individuals of the F1 progeny derived from a cross between P3 and P4 might have been scored as male despite their genotype being ZW, which may have obscured our analysis, resulting in non-zero depth of male DNA bulk within the putative W-region (Fig. 6b). To address this possibility, we selected 50 ZZ plants from the F1 progeny based on their sp16 genotype and bulked and sequenced the DNA (sp16-minus bulk). The sp16-minus bulk reads, as well as female bulk reads, were separately mapped to the combined sequence of the TDr96_F1 reference genome and P4-DDN to identify the female-specific TDr96_F1 genomic region, as described in Fig. 6a. As shown in Fig. 6c and Additional file 2: Figure S20c, d, we successfully delineated the putative W-linked region mapped predominantly with female-only bulk DNA, representing an approximately 161-kb region of scaffold206 on pseudo-chromosome 11. This putative female-specific W-linked region contains ~ 10 predicted genes (Additional file 12).
Molecular markers, such as simple sequence repeats (SSRs), indels, and SNPs, can, for the first time, be developed for various applications in Guinea yam, including linkage mapping, genome-wide association analysis, genomic selection, and MAS. We have already analyzed sequences containing SSR motifs in the genome and identified more than 22,000 candidates that can be used to design primers (Additional file 1: Table S19). We designed primer pairs for 1000 of these sequences and obtained the information necessary for their immediate use in genetic analyses (Additional file 13). SSR markers isolated from one Dioscorea species can be transferred to other species . From a practical plant breeding point of view, the sp16 sex-linked marker should prove useful for selecting plantlets for crossing, substantially saving the space and labor required to grow plants and accelerating breeding programs. However, the sex-determination system may vary among Dioscorea species (see below), so the transferability of sex-linked DNA markers from D. rotundata to other species should be addressed in future studies.
Our identification of the locus underpinning an important trait by QTL-seq, using F1 progeny derived from highly heterozygous parents, opens up new avenues to WGS-based mapping of important traits in crops and tree species for which inbred lines are difficult to obtain and/or generation times are too long, impeding the use of conventional linkage analysis approaches.
Development of DNA markers linked to agronomically important traits and their use for MAS increase the role yam plays in ensuring food security for resource-poor households in Africa and beyond. The D. rotundata genome sequences reported here should also contribute to understanding the origin of Guinea yam and its domestication from its wild progenitor species, which are widely distributed in West and Central Africa.
Our results suggest that the Guinea yam sex-determination system involves female heterogamy (male = ZZ, female = ZW). We identified two DNA markers, sp1 (linked to the putative Z-linked region) and sp16 (presumably located within the putative W-linked region, which in TDr96_F1 is presumed to be ZW, and spans only 161 kb). The chromosomes carrying the Z- and W-linked regions are probably not strongly differentiated, and diverged sequences corresponding to Z and W chromosomes were not recovered in our reference genome. Future work should test for structural differences, such as inversions, between the Z- and W-linked regions. Guinea yam sex determination is not, however, a simple genetic system. The consistent maleness of individuals with the ZZ genotype, based on the sp16 sequence, versus occasional maleness of ZW individuals, suggests that maleness is the default phenotype and that the W allele is dominant over Z and can, but does not always, suppress male organ development and feminize the flower. If the feminizing function of the W allele fails in a subset of flowers, the individual will be monoecious. ZW individuals can change sex over time (Fig. 7), indicating that the Z-suppressing function can be affected by the environment. Self-pollination between male and female flowers of ZW monoecious plants could become possible, which may allow inbred lines to be generated, allowing fixation of desired alleles of agronomically important traits. To make it practical, though, we may have to carefully monitor the level of inbreeding depression in D. rotundata. Dioecy is the norm in Dioscorea species, and previous reports suggest that males are usually the heterogametic sex (XY) in the genus [31, 32]. A genetic study of D. tokoro also confirmed an XY male system . D. tokoro belongs to the section Stenophora, which is distantly related to the section Enantiophyllum, which contains D. rotundata . Our data suggest that the sex-determination system has changed within the genus during the evolution, which could be an interesting topic for future studies. Once the D. rotundata sex-determination gene has been isolated, its comparison with another dioecious monocot species such as Asparagus, for which the sex-determination gene has been recently isolated , would be interesting.
Here, we sequenced the whole genome (594 Mb) of the dioecious tuber crop Guinea yam (Dioscorea rotundata) using a heterozygous individual and anchored the scaffolds to 21 linkage groups to generate pseudo-chromosomes. We exploited the genome sequence to map the sex-determination locus by QTL-seq using BSA of F1 progeny. This analysis revealed a genomic region on pseudo-chromosome 11 tightly linked to femaleness within a female heterogametic (ZZ = male, ZW = female) sex-determination system. This genome sequence will serve as a springboard towards gene mapping and discovery in yam (Dioscorea spp.) and genetic improvement of these important yet neglected staple crops.
The TDr96_F1 line used for WGS was selected from F1 progeny obtained from an open-pollinated D. rotundata breeding line (TDr96/00629) grown under field conditions in the experimental fields of the International Institute of Tropical Agriculture (IITA) in Nigeria. F1 seeds from TDr96/00629 and those obtained from the cross between the parental lines TDr97/00917 and TDr99/02627 used for RAD-seq were germinated on wet paper towels in darkness at 28 °C. After germination, the seeds were transferred to soil (Sakata Supermix A ) and grown at 30 °C with a 16-h/8-h photoperiod in a greenhouse at Iwate Biotechnology Research Institute (IBRC) in Japan. Fresh leaf samples were collected for DNA extraction. Additionally, to resequence the F1 progeny used for QTL-seq analysis, lyophilized leaf samples obtained from plants that were grown and phenotyped under field conditions at IITA were used for DNA extraction.
Determination of chromosome number and ploidy level
For chromosome observation, root tips of TDr96_F1 plants generated by in vitro propagation of nodal explants were sampled and fixed in acetic acid-alcohol (1:3 ratio) for 24 h without pretreatment. Fixed root tips were stained with a 1% aceto-carmine solution for 24 h. Samples were prepared by the squash method and analyzed under an Olympus BX50 optical microscope (Olympus Optical Co, Ltd., Tokyo, Japan ) at 400× magnification.
Estimation of D. rotundata genome size
For WGS, genomic DNA was extracted from fresh TDr96_F1 leaf samples using a NucleoSpin Plant II Kit according to the manufacturer’s protocol (Macherey-Nagel GmbH & Co. KG ) with slight modifications. Homogenized samples were washed with 0.1 M 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) buffer to remove contaminating polysaccharides. Just before use, 120 mg polyvinylpyrrolidone (PVP), 90 mg l-ascorbic acid, and 200 μl 2-mercaptoethanol (ME) were added to 10 ml HEPES buffer, and 1 ml of the mixture was used to wash each sample; washing was repeated three times. Additionally, 10 μl 2-ME and 5 μl of 30% polyethylene glycol (PEG)-20000 were added to 1 ml of PL1 buffer (provided with the NucleoSpin Plant II Kit), and twice the recommended volume of buffer (800 μl) was used for cell lysis. Libraries for PE short reads and MP jump reads of various insert sizes including 2, 3, 4, 5, 6, and 8 kb were constructed using an Illumina TruSeq DNA LT Sample Prep Kit and a Nextera Mate Pair Sample Prep Kit, respectively. The PE library was sequenced on the Illumina MiSeq platform, while the MP libraries were sequenced on the HiSeq 2500 platform. Library construction and sequencing of the 20- and 40-kb MP jump sequences were carried out by Eurofins Genomics (Operon ) and Lucigen , respectively. The 20-kb and 40-kb jump libraries were sequenced on the MiSeq and HiSeq 2500 platforms, respectively. BAC libraries were constructed by Lucigen, and BAC-end sequencing was carried out by Genaris  using Sanger sequencing. A total of 30,750 clones corresponding to 3072 Mb of sequence and 5.4× genome coverage were constructed. Of these, 9984 clones were used for BAC-end sequencing, generating a 13.6-Mb sequence in PE fasta format, which was converted to 50-bp PE short reads corresponding to a 0.46-Gb sequence and ~ 0.8× coverage of the estimated 470-Mb D. rotundata genome (Additional file 1: Table S2 and Additional file 2: Figure S2).
De novo assembly
All TDr96_F1 sequence reads in fastq format were filtered for quality using the FASTX-Toolkit version 0.0.13 . For PE reads and MP short jump reads with insert sizes ranging from 2 to 8 kb, only those having sequence reads with a Phred quality score of ≥ 30 (i.e., ≥ 90% of the reads) were retained. Adapter trimming and removal of MP reads with the wrong insert sizes were performed using an in-house pipeline of scripts written in Perl and C++. Quality filtering of the long jump sequences (20-, 40-, and 100-kb insert sizes) was carried out by the suppliers. For the initial de novo assembly, short PE reads and MP jump reads with 2- to 40-kb insert sizes (Additional file 1: Table S2) were assembled using ALLPATHS-LG assembler version R49856 . Further scaffolding of the assembly generated by ALLPATHS-LG was performed using the 100-kb jump MP fastq reads obtained by BAC-end sequencing and the SSPACE PREMIUM 2.3 scaffolding tool with default parameters .
Constructing organelle genome sequences
De novo assembly of the D. rotundata mitochondrial genome sequence was performed using mitochondrial DNA isolated from TDr96_F1 leaf samples according to the method of Terachi and Tsunewaki  with the following minor modifications. Fresh green leaves (ca. 150 g) were homogenized in 1.5 L of homogenization buffer containing 0.44 M mannitol, 50 mM Tris-HCl (pH 8.0), 3 mM ethylenediaminetetraacetic acid (EDTA), 5 mM 2-ME, 0.1% (w/v) bovine serum albumin, and 0.1% (w/v) PVP. Following DNaseI treatment, the mitochondrial fraction was collected from the interface between 1.30 M and 1.45 M of a sucrose gradient. Mitochondrial DNA was purified by EtBr/CsCl centrifugation at 80,000 rpm for 6 h at 20 °C in a Beckman TLA 100.3 rotor. The DNA band was collected and purified by ethanol precipitation. The resulting mitochondrial DNA (15 ng) was amplified using a REPLI-g Mini Kit (Qiagen, Cat. no. 150023) and used for library construction. The library was sequenced on an Illumina MiSeq sequencer, and the resulting PE reads were assembled de novo using DISCOVAR De Novo , generating D. rotundata mitochondria contigs. For scaffolding, MP reads with insert sizes of 2, 3, 4, 5, 6, 8, and 20 kb obtained from D. rotundata genomic DNA (gDNA) were aligned to the D. rotundata mitochondrial contigs. MP reads showing 100% alignment were selected and used for scaffolding of D. rotundata mitochondrial contigs by SSPACE  (Additional file 2: Figure S5). To reconstruct the D. rotundata chloroplast genome sequence, the PE reads of TDr96_F1 were aligned to the recently published D. rotundata chloroplast genome sequence  (GenBank ID = NC_024170.1) by Burrows-Wheeler alignment (BWA) , and chloroplast-derived sequences were identified, amounting to 5,403,420 reads (14.74% of the total size of PE reads generated for TDr96_F1 [Table 1]) matching the assembled 155.4-kb chloroplast genome of D. rotundata.
Evaluation of the completeness of the genomic assembly
To evaluate the completeness of the D. rotundata genome assembly, the assembly was checked for the presence of 248 highly conserved core eukaryotic genes  using CEGMA version 2.4 with default parameters  (Additional file 1: Table S4). To further assess the completeness of the genome, the successor to CEGMA, Benchmarking Universal Single-Copy Orthologs (BUSCO), was used to check for the presence of 956 BUSCOs with version 1.1.b1  using the early access plant dataset (Additional file 1: Table S5).
Annotation of transposable elements (TEs)
Legacy repetitive sequences, including transposons, were predicted using CENSOR 4.2.29  with the following options: show_simple, nofilter, and mode rough using the Munich Information Center for Protein Sequences (MIPS) Repeat Element Database . Following identification, the repeat elements were classified using mips-REcat . Repetitive sequences were later improved by remodeling using RepeatModeler 1.0.8  and masked with RepeatMasker 4.0.5 . Using the National Center for Biotechnology Information (NCBI) database, one of three other options was used to generate interspersed RepeatModeler-based, interspersed Rebase-based, and Low complexity repeats: “nolow”, “nolow, species Viridiplantae”, and “noint”, respectively. Repeat element content and other statistics were compared between the D. rotundata and A. thaliana TAIR10 , B. distachyon v3.1 , and O. sativa v7_JGI 323  genomes using the RepeatModeled and RepeatMasked references (Table 1).
Total RNA was extracted using leaf, stem, flower, and tuber samples collected from a greenhouse-grown TDr96_F1 plant using a Plant RNeasy Kit (Qiagen ) with slight modifications. RLC buffer was used for lysis after the addition of 5 μl 30% PEG-20000 and 10 μl 2-ME to 1 ml of buffer. The RNA samples were treated with DNase (Qiagen) to remove contaminating genomic DNA. Two micrograms of total RNA was used to construct complementary DNA (cDNA) libraries using a TruSeq RNA Sample Prep Kit V2 (Illumina) according to the manufacturer’s instructions. The libraries were used for PE sequencing using 2× 100 cycles on the HiSeq 2500 platform in high-output mode. Illumina sequencing reads were filtered by Phred quality score, and reads with a quality score of ≥ 30 (≥90% of reads) were retained (Additional file 1: Table S12). Only one RNA-seq experiment was carried out per tissue/organ (indicated as sample in Additional file 1: Table S12).
Prediction of protein-coding genes
The legacy gene models were generated previously using the legacy repeat-masked reference genome and three approaches: ab initio, ab initio supported by evidence-based prediction, and evidence-based prediction. The ab initio prediction was carried out with FGENESH 3.1.1 . The ab initio supported by evidence-based prediction was performed with AUGUSTUS 3.0.3  using the maize5 training set and a hint file as the gene model support information. To construct the hint file, TopHat 2.0.11  was used to align RNA-seq reads from tuber, flower (young), leaf (young), stem, leaf (old), and flower (old) samples to the D. rotundata reference genome, and Cufflinks 2.2.1  was used to generate gene models from these data. The evidence-based predictions using the Program to Assemble Spliced Alignments (PASA)  were generated in a Trinity  assembled transcriptome from the RNA-seq data. JIGSAW 3.2.9  was used to select and combine the gene models obtained using the three approaches with the weighting values assigned to the results from FGENESH, AUGUSTUS, and PASA of 10, 3, and 3, respectively. In total, 21,882 consensus gene models were predicted. These gene models were further improved upon using the MAKER  pipeline (Additional file 2: Figure S14). Publicly available ESTs and protein sequences from related plant species were aligned to the genome using GMAP  and Exonerate 2.2.0 , respectively. De novo and reference-guided transcripts were assembled from RNA-seq data from all 18 tissues using Bowtie 1.1.1 , Trinity 2.0.6 and SAMtools 1.2.0 , and Trinity 2.0.6 and TopHat 2.1.0, respectively. Both sets of assembled transcripts were used to build a comprehensive transcript database using PASA (Additional file 1: Table S13). High-quality non-redundant transcripts from PASA were used to generate a training set for AUGUSTUS 3.1. Gene models were predicted twice using the genome, improved repeat sequences, assembled transcripts, EST and protein alignments, the AUGUSTUS training set, and a legacy set of 21,882 gene models obtained previously using MAKER 2.31.6 , retaining all legacy gene models or querying them with new evidence and discarding those that could not be validated. From both MAKER runs, 21,894 and 76,449 gene models were predicted, respectively. A consensus set of gene models from both MAKER outputs was obtained using JIGSAW 3.2.9  at a 1:1 ratio. In total, 26,198 consensus gene models were predicted in the D. rotundata genome. The corresponding amino acid sequences were also predicted for these gene models. To confirm these gene models, the RNA-seq reads were aligned to the CDSs (coding sequences) of the predicted genes using BWA  with default parameters. Accordingly, 85.8% of the gene models could be aligned by at least a single RNA-seq read. Functional annotation of the amino acid sequences was performed using the in-house pipeline, AnnotF, which compares Blast2GO  and InterProScan  functional terms.
Pairwise orthology relationships were determined with Inparanoid [68–70] using the longest protein-coding isoform for each gene in Arabidopsis thaliana (TAIR10) , Oryza sativa japonica (v7.0) , Brachypodium distachyon (v3.1) , Musa acuminata (v2) , Elaeis guineensis (EG5) , and Phoenix dactylifera (DPV01) . Orthology clusters across all seven species were determined using Multiparanoid . Sequences for the 12 classes of lectins were obtained from UniProt  for the proteomes of A. thaliana (up000006548), B. distachyon (up000008810), and O. sativa (up000059680). Protein alignments for B-lectin class protein sequences from all three of these species and D. rotundata were generated using the program Multiple Alignment using Fast Fourier Transform (MAFFT) . Maximum likelihood trees were constructed based on the concatenated alignments of all 378 B-lectin proteins using RAxML  8.0.2 with 1000 bootstraps. Enrichment of tuber-specific genes was detected using TopHat 2.1.0 to align RNA-seq data from each of the 12 tissues to the genome, with one biological replicate for each tissue. HTSeq 0.6.1  was used to generate raw counts. Then the Bioconductor package DESeq2 1.14.1  was used to compare raw counts of the three tuber tissues against all the other nine tissues (Additional file 1: Table S12) to determine tuber-enriched gene expression based on a log2 fold change > 0 and Benjamini–Hochberg  adjusted P value < 0.05.
Gene enrichment analysis of orthology clusters was performed with GOATOOLS , using the Holm significance test, and the false discovery rate was adjusted using the Benjamini–Hochberg procedure . The list of enriched genes was filtered for redundant Gene Ontology (GO) terms using REVIGO . For the species phylogeny, protein alignments for each gene with a 1:1 orthologous relationship across all monocot species were generated with MAFFT using the longest protein isoform. Maximum likelihood trees were constructed based on the concatenated alignments of 2381 orthologous protein-coding genes using RAxML 8.2.8  with a JTT + Γ model and 1000 bootstraps.
SynMAP  using BLASTZ  alignments, DAGchainer  (options -D 30 and -A 2), and no merging of syntenic blocks were used as part of the CoGe platform  to identify syntenic blocks between the hard-masked pseudo-chromosomes of D. rotundata and scaffolds/contigs of Oryza sativa japonica (A123v1.0), Spirodela polyrhiza (v0.01), and Phoenix dactylifera L. (v3). A syntenic path assembly was then carried out on each of the same three species in SynMap using synteny between the scaffolds/contigs against D. rotundata pseudo-molecules. The syntenic path assembly is a reference-guided assembly that uses the synteny between two species to order and orientate contigs. This approach highlights regions of conservation that were otherwise too shuffled to be clearly observed. Self-self synteny analysis of D. rotundata pseudo-chromosomes was carried out using SynMap Last alignments with default parameters and syntenic gene pair synonymous rate change calculated by CodeML .
RAD-based linkage mapping and scaffold anchoring
RAD-seq was performed as previously described  with a minor modification. Genomic DNA was digested with the restriction enzymes PacI and NlaIII to prepare libraries used to generate PE reads by Illumina HiSeq 2500 (Additional file 2: Figure S6). Approximately 822.7-Mb and 250.4-Mb sequence reads covering 22.9% and 5.3% of the estimated 504-Mb D. rotundata genome sequence, excluding gap regions, at average depths of 7.2× and 9.8× were generated for the parental lines and F1 individuals, respectively (Additional file 2: Figure S7).
Library preparation and sequencing
For library construction, 1 μg DNA obtained from the two parental lines (TDr97/00917-P1 and TDr99/02627-P2) and the 150 F1 individuals was digested with PacI, which recognizes 5’-TTAATTAA-3’, and a biotinylated adapter-1 was ligated to the digested DNA fragments. The adapter-1-ligated DNA fragments were digested with a second enzyme, NlaIII (5’-CATG-3’). After collecting the biotinylated fragments using streptavidin-coated magnetic beads, adapter-2 was ligated to the NlaIII-digested ends. The adapter-ligated DNA was amplified using primers containing sample-specific index sequences, adapter-1 (F) and adapter-2 (R) sequences, and sequences corresponding to the P7 and P5 primers for Illumina sequencing library preparation (Additional file 2: Figure S6). The PCR products were pooled in equal proportions, purified, and subjected to PE sequencing on the Illumina HiSeq 2500 platform. Detailed information about the primers (P7 and P5) used for Illumina library preparation are given in Additional file 1: Table S20.
Identification of parental line-specific heterozygous markers
RAD-tags were aligned to the D. rotundata reference genome using BWA . The aligned data were converted to SAM/BAM files using SAMtools , and the RAD-tags with mapping quality < 60 or containing insertions/deletions in the alignment data were excluded from analysis. Low mapping positions including those with only a single RAD-tag and a mapping quality score of < 30 were also excluded. SNP-index values  were calculated at all SNP positions. For linkage mapping, two types of heterozygous markers (SNP-type and presence/absence-type) were identified (Additional file 2: Figure S8). The SNP-type heterozygous markers were defined based on SNP-index patterns of the parental line RAD-tags. For example, positions with SNP-index values ranging from 0.2 to 0.8 in P1 but homozygous in P2 with SNP-index values of either 0 or 1 were defined as P1-specific heterozygous SNPs. A similar procedure was followed to identify P2-specific heterozygous SNP markers. The selected markers were filtered using depth information at all positions. To increase the accuracy of the selected markers, their segregation (1:1 ratio) was confirmed in 150 F1 individuals obtained from a cross between P1 and P2. If the segregation ratio was out of the confidence interval (P < 0.05) hypothesized by the binomial distribution, B(n = number of individuals, P = 0.5), the markers were excluded from further analysis. Only one marker was selected per 10-kb interval based on the number of F1 individuals represented and tag coverage. A total of 1105 and 990 P1- and P2-heterozygous SNP markers were selected, respectively (Additional file 1: Table S7).
The presence/absence-type markers were defined based on the alignment depth of parental line RAD-tags. First, genomic positions that could be aligned by RAD-tags from only one of the parental lines were identified. Additionally, aligned tags should be heterozygous for that particular region. Similar to the SNP-type markers, the segregation patterns of candidate presence/absence-type markers in the F1 progeny were confirmed, and only those that segregated at a 1:1 ratio (as confirmed by binomial distribution filter) were retained. In the F1 progeny, positions with sequencing depths of ≥ 3 and 0 were defined as heterozygous and homozygous, respectively. For a given candidate position/marker, if the number of F1 individuals defined as homozygous or heterozygous was less than 120, the marker was excluded from further analysis. Only one heterozygous position was selected as a marker within a given 10-kb interval. In total, 221 and 282 positions were selected as P1- and P2-specific presence/absence-type heterozygous markers, respectively (Additional file 1: Table S7).
To developing parental line-specific linkage maps, P1-Map and P2-Map, recombination fraction (rf) values between all pairs of markers on a given scaffold were calculated for both parents using the recombination pattern of the 150 F1 individuals. To minimize incorrect mapping, scaffolds were divided at positions where rf values exceeded 0.25 from the initial marker position (Additional file 2: Figure S9). Only two flanking (distal) markers per scaffold were selected, corresponding to 477 and 493 P1- and P2-specific markers, respectively. These markers were used to develop P1 and P2 linkage maps according to the pseudo-testcross method  using the backcross model of R/qtl . Due to the use of the pseudo-testcross method, the initial maps contained both the coupling and repulsion-type markers. Consequently, the genetic distance in linkage groups was larger than expected. To avoid the effect of repulsion-type markers when calculating genetic distances, these markers were converted to coupling-type markers. If a marker showed a high logarithm of odds (LOD) score and an rf value > 0.5, it was defined as repulsion type and was therefore converted to the coupling-type genotype. This conversion was carried out gradually by changing the threshold of the LOD score from 10 to 5, and then to 3. After converting all repulsion markers to coupling markers, linkage maps were developed using markers showing LOD score > 3 and rf value < 0.25. Accordingly, a total of 21 and 23 linkage groups, each with a minimum of three markers, were generated for P1- and P2-Maps, respectively (Additional file 1: Table S8 and Additional file 2: Figure S10).
To develop chromosome-scale pseudo-molecules, TDr96_F1 scaffold sequences were anchored onto the two parental-specific linkage maps using the selected RAD markers. To combine the two maps, the number of scaffolds shared between all possible linkage group (LG) pairs corresponding to the two maps was determined (Additional file 2: Figures S11, S12). LG pairs that shared the largest number of scaffolds were combined using the same scaffolds. Each combined LG represented a pseudo-chromosome, which was designated/numbered according to the P1-Map LG designation (see Fig. 2 and Additional file 2: Figure S11). After combining the two maps to construct the pseudo-chromosomes, P1- and P2-specifc scaffolds were ordered according to their original order in their respective LGs. If the order of scaffolds could not be decided because the order was similar in both the P1- and P2-Maps, the order in P1-LG was adopted (Fig. 2). Finally, the ordered scaffolds were connected by 1000 nucleotides of “N” into a single fasta file for each pseudo-chromosome (Additional file 2: Figure S12).
DNA samples obtained from the two parental lines, TDr97/00917 (P3, female) and TDr97/00777 (P4, male), as well as samples pooled in equal amounts from 50 male (male-bulk) and 50 female (female-bulk) F1 individuals obtained from the cross between P3 and P4 were subjected to WGS. Libraries for sequencing were constructed from 1-μg DNA samples with a TruSeq DNA PCR-Free LT Sample Preparation Kit (Illumina) and were sequenced via 76 cycles on the Illumina NextSeq 500 platform. Short reads in which more than 20% of sequenced nucleotides exhibited a Phred quality score of < 20 were excluded from further analysis. To perform QTL-seq analysis of F1 progeny, two types of analyses are required. In the first analysis, the SNP index and ΔSNP index are calculated at P4-specific heterozygous positions. The second analysis is performed using P3-specific heterozygous positions. To identify P4-specific heterozygous positions, the P3 “reference sequence” was first developed by aligning P3 reads to the reference genome sequence of D. rotundata and replacing nucleotides of the D. rotundata reference genome sequence with nucleotides of P3 at all SNP positions showing an SNP index of 1 (Additional file 2: Figure S17c). SNP detection, calculation of SNP index, and replacement of SNPs were carried out via step 2 of QTL-seq pipeline version 1.4.4 . Short reads obtained from both the male and female parents were then aligned to the “reference sequence” and heterozygous SNP positions between the two were extracted. A SNP was defined as heterozygous if the same position showed an SNP-index value ranging from 0.4 to 0.6 in one parent and a value of 0 in the second parent. Of the selected markers/positions, only those having enough depth in both parents (>15) were used for analysis of SNP-index values in the bulk-sequenced samples. P3-specific heterozygous positions were identified similarly using the P4 “reference sequence.”
After identifying P4- and P3-specific heterozygous positions, the Illumina reads from the two bulk-sequenced samples (male and female bulks) were aligned to the reference sequences using BWA  and subjected to Coval filtering  as previously described. When the P3 reference sequence was used for alignment, the SNP-index values were calculated only at all of the P4-specific heterozygous positions. By contrast, when the P4 reference sequence was used for alignment, the SNP-index values were calculated only at the P3-specific heterozygous positions. In both cases, positions with shallow depth (< 6) in either of the two samples were excluded from analysis. The ∆SNP index was calculated by subtracting the SNP-index values of the male bulk from those of the female bulk. To generate confidence intervals of the SNP-index value, an in silico test simulating the application of QTL-seq to DNA bulked from 50 randomly selected F1 individuals was performed as described previously  (Additional file 2: Figure S22). The simulation test was repeated 10,000 times depending on the alignment depth of short reads to generate confidence intervals. These intervals were plotted for all SNP positions analyzed. Finally, sliding window analysis was applied to SNP-index, ∆SNP-index, and confidence interval plots with a 1-Mb window size and a 50-kb increment to generate SNP-index graphs (Additional file 2: Figure S18).
Identification of putative W-region by de novo assembly of female and male parental genomes and mapping of bulked DNA from female and male F1 progeny
DNA samples obtained from the two parental lines, TDr97/00917 (P3, female) and TDr97/00777 (P4, male), were separately subjected to de novo assembly. Libraries for sequencing were prepared with a TruSeq DNA PCR-Free LT Sample Preparation Kit (Illumina) and were sequenced for 251 cycles on the Illumina MiSeq platform. Contigs were generated using the DISCOVAR De Novo assembler , resulting in P3-DDN and P4-DDN, respectively. Separately, whole-genome resequencing of bulked DNA was performed on bulked DNA samples obtained from 50 female F1 (Female-bulk.fastq) and 50 male F1 (Male-bulk.fastq) progeny, all derived from a cross between P3 and P4. Two reference sequences, P3-DDN and P4-DDN, were combined to generate P3-DDN/P4-DDN. Short reads from the female and male bulks were separately mapped to P3-DDN/P4-DDN using the alignment software BWA . After mapping, the MAPQ scores of the aligned reads were obtained. Under our conditions, if a short read was mapped to a unique position of the reference sequence, the MAPQ score was 60, whereas if the read was mapped to multiple positions, MAPQ was < 60. Since two reference sequences (P3-DDN and P4-DDN) were fused to generate P3-DDN/P4-DDN, most genomic regions were represented twice. Therefore, most short reads mapped to two or more positions, leading to a MAPQ score < 60. The reads that mapped to the P3-DDN/P4-DDN with MAPQ = 60 were judged to be located in either P3- or P4-specific genomic regions. After finding these P3- or P4-specific genomic regions, the depth of short reads that covered the regions for Female-bulk.fastq and Male-bulk.fastq, respectively, was evaluated. If the depth of Female-bulk.fastq was high and the depth of Male-bulk.fastq was 0 or close to 0, such genomic regions were retained as putative W-regions (Fig. 6 and Additional file 2: Figure S20).
DNA markers linked to sex
The primer sequences used for amplification of sex-linked markers sp1 and sp16, as well as the control Actin gene fragment (Dr-Actin), were as follows:
PCR primers for sp16 fragment:
PCR primers for Dr-Actin gene fragment:
PCR was performed using the following conditions: 30 cycles of 98 °C for 10 s, 55 °C for 30 s, and 72 °C for 1 min. For CAPS analysis of the sp1 marker, the amplified DNA was digested with EcoRI. All PCR products were electrophoresed on 1.5% agarose gels.
Identification of SSR markers
Approximately 4,932,582 bp simple sequence repeat (SSR) motif-containing sequences were predicted in the D. rotundata genome. Within this region, SSR sequences with enough flanking regions were identified and evaluated for use in primer design. Accordingly, 134,101 SSR-containing sequences, excluding those with single base repeats, were identified. The SSR information for these sequences was analyzed using GMATo  version 1.2 Build 20130106 with the following parameters: m (minimum motif length) = 2, x (maximum motif length) = 10, and r (minimum repeat number) = 10. The necessary information was obtained for 22,164 SSR-containing sequences in the assembled genome, 12,724 (57.4%) of which were anchored to the genetic map (Additional file 1: Table S19). Primer pairs were designed for 1000 of these sequences using Primer3  software release 2.3.6 with the following parameters: product size = 100–500, primer length = 18–22 bp (optimum 20 bp), GC content = 40–60% (optimum 50%), and Tm = 57–63 (optimum = 60.0).
We thank Prof. John Vogel from the DOE Joint Genome Institute (Walnut Creek, CA, USA) for access to the B. distachyon v3.1 genome.
This article is dedicated to the memory of the late Professor Günter Kahl, who contributed enormously to the development of yam research.
This study (design of the study, collection, analysis, and interpretation of data, and manuscript writing) was funded by the Japan International Research Center for Agricultural Sciences (JIRCAS) and the International Institute of Tropical Agriculture (IITA) as a component of an international collaborative research project involving Iwate Biotechnology Research Center (IBRC), JIRCAS, and IITA entitled the “Use of genomic information and molecular tools for yam germplasm utilization and improvement for West Africa (EDITS-Yam).” Computations were partially performed on the National Institute of Genetics (NIG) supercomputer at ROIS (Research Organization of Information and Systems) National Institute of Genetics (Mishima, Shizuoka, Japan). Work performed at The Earlham Institute, Norwich, UK, was supported by strategic Biotechnology and Biological Sciences Research Council (BBSRC) funding (Institute Strategic Programme Grant BB/J004669/1 and iCASE Studentship BB/L017350/1) and by the Norwich Bioscience Institutes (NBI) Computing infrastructure for Science (CiS) group.
Availability of data and materials
The datasets supporting the conclusions of this article are available in the DNA Databank of Japan (DDBJ)/European Molecular Biology Laboratory (EMBL)/GenBank databases.
BioProject accession number: PRJDB3383
TDr96_F1 de novo assembly: DRX025239–DRX025248 (Additional file 1: Table S2)
RNA-seq 2013,2014: DRX040446–DRX040451 (Additional file 1: Table S12)
RNA-seq 2015: DRX057356–DRX057367 (Additional file 1: Table S12)
QTL-seq basic: DRX040452–DRX040455 (Additional file 1: Table S16)
Identification of putative W-region: DRX057354–DRX057355 (Additional file 1: Table S16)
sp16-minus bulk: DRX080879 (Additional file 1: Table S16)
P3 (TDr97/00917): DRX057369 (Additional file 1: Table S17)
P4 (TDr97/00777): DRX057368 (Additional file 1: Table S17)
TDr96_F1 mitochondrial genome: DRX057351 (Additional file 2: Figure S5)
TDr96_F1 reference genome: DF933857–DF938579
TDr96_F1 mitochondrion: LC219374–LC219449
TDr96_F1 Pseudo_Chromosome: BDMI01000001–BDMI01000021
Gene/protein sequences and gff3 annotation files are publicly available at the following URL:
QTL-seq codes used in the work are publicly available at the following URL:
The authors declare that the study observed all local, national, and international guidelines and legislation and the appropriate permissions.
MT performed the sequencing and biological analysis, SN assembled the genome sequence, HT performed linkage analysis, scaffold anchoring, and QTL-seq analysis, BW performed comparative genomics, HY performed linkage analysis and anchoring, MS performed sex linkage analysis, KY performed genetic studies, AU performed DNA sequencing, KO performed DNA sequencing and RNA-seq, AA performed sex linkage analysis, NU performed DNA sequencing, HM performed DNA sequencing, PB performed chromosome counting and FCM analysis, SY performed population studies, RM performed F1 segregation phenotyping of sex, SM performed F1 segregation phenotyping of sex, GG performed RNA-seq, ALM performed plant crosses, MG performed DNA sampling, RB performed DNA sampling, MA performed population studies, PLK performed DNA sampling, IR performed DNA sampling, MT performed mitochondrial extraction, TT performed mitochondrial extraction, WH performed comparative genomics, MC performed comparative genomics, SK analyzed and interpreted the results, GK analyzed and interpreted the results, HT conceived and supervised the entire study, RA conceived and supervised the entire study, and RT conceived and supervised the entire study. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Wilkin P, Scholsb P, Chasea MW, Chayamaritc K, Furnessa CA, Huysmansb S, Rakotonasolod F, et al. A plastid gene phylogeny of the yam genus, Dioscorea: roots, fruits and Madagascar. Syst Bot. 2005;30:736–49.View ArticleGoogle Scholar
- Renner SS. The relative and absolute frequencies of angiosperm sexual systems: dioecy, monoecy, gynodioecy, and an updated online database. Am J Bot. 2014;101:1588–96.View ArticlePubMedGoogle Scholar
- Maurin O, Muasya M, Catalan P, Shongwe EZ, Viruel J, Wilkin P, van der Bank M. Diversification into novel habitats in the Africa clade of Dioscorea (Dioscoreaceae): erect habit and elephant’s foot tubers. BMC Evol Biol. 2016;16:238.View ArticlePubMedPubMed CentralGoogle Scholar
- Lebot V. Tropical root and tuber crops: cassava, sweet potato, yams and aroids (Crop Production Science in Horticulture Series 17). Wallingford: CABI Publishing; 2009. p. 405.Google Scholar
- Coursey DG. The civilizations of the yam: interrelationships of man and yams in Africa and the Indo-Pacific region. Archeol Phys Anthropol Oceania. 1972;7:215–33.Google Scholar
- Ayensu ES, Coursey DG. Guinea yams: the botany, ethnobotany, use and possible future of yams in West Africa. Econ Bot. 1972;26:301–18.View ArticleGoogle Scholar
- International Institute of Tropical Agriculture (IITA). http://www.iita.org. Accessed 1 Aug 2017.
- Scarcelli N, Daïnou O, Agbangla C, Tostain S, Pham JL. Segregation patterns of isozyme loci and microsatellite markers show the diploidy of African yam Dioscorea rotundata (2n = 40). Theor Appl Genet. 2005;111:226–32.View ArticlePubMedGoogle Scholar
- Girma G, Hyma KE, Asiedu R, Mitchell SE, Gedil M, Spillane C. Next-generation sequencing based genotyping, cytometry and phenotyping for understanding diversity and evolution of guinea yams. Theor Appl Genet. 2014;127:1783–94.View ArticlePubMedGoogle Scholar
- Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–70.View ArticlePubMedPubMed CentralGoogle Scholar
- Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A. 2011;108:1513–18.View ArticlePubMedGoogle Scholar
- Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, et al. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res. 2011;21:2224–41.View ArticlePubMedPubMed CentralGoogle Scholar
- Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27:578–9.View ArticlePubMedGoogle Scholar
- Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–7.View ArticlePubMedGoogle Scholar
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–12.View ArticlePubMedGoogle Scholar
- Mariac C, Scarcelli N, Pouzadou J, Barnaud A, Billot C, Faye A, et al. Cost-effective enrichment hybridization capture of chloroplast genomes at deep multiplexing levels for population genetics and phylogeography studies. Mol Ecol Resour. 2014;14:1103–13.View ArticlePubMedGoogle Scholar
- Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008;3:e3376.View ArticlePubMedPubMed CentralGoogle Scholar
- Grattapaglia D, Sederoff R. Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics. 1994;137:1121–37.PubMedPubMed CentralGoogle Scholar
- Terauchi R, Kahl G. Mapping of the Dioscorea tokoro genome: AFLP markers linked to sex. Genome. 1999;42:752–62.View ArticleGoogle Scholar
- Jiang SY, Ma Z, Ramachandran S. Evolutionary history and stress regulation of the lectin superfamily in higher plants. BMC Evol Biol. 2010;10:79.View ArticlePubMedPubMed CentralGoogle Scholar
- Afzal AJ, Wood AJ, Lightfoot DA. Plant receptor-like serine threonine kinases: roles in signaling and plant defense. Mol Plant Microbe Interact. 2008;21:507–17.View ArticlePubMedGoogle Scholar
- Ohizumi Y, Gaidamashvili M, Ohwada S, Matsuda K, Kominami J, Nakamura-Tsuruta S, et al. Mannose-binding lectin from yam (Dioscorea batatas) tubers with insecticidal properties against Helicoverpa armigera (Lepidoptera: Noctuidae). J Agric Food Chem. 2009;57:2896–902.View ArticlePubMedGoogle Scholar
- Kato T, Hori M, Ogawa T, Muramoto K, Toriyama K. Expression of gene for Dioscorea batatas tuber lectin 1 in transgenic tobacco confers resistance to green-peach aphid. Plant Biotechnol. 2010;27:141–5.View ArticleGoogle Scholar
- Yoshimura S, Komatsu M, Kaku K, Hori M, Ogawa T, Muramoto K, et al. Production of transgenic rice plants expressing Dioscorea batatas tuber lectin 1 to confer resistance against brown planthopper. Plant Biotechnol. 2012;29:501–4.View ArticleGoogle Scholar
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B Methodol. 1995;57(1):289–300.Google Scholar
- Xue YL, Miyakawa T, Sawano Y, Tanokura M. Cloning of genes and enzymatic characterizations of novel dioscorin isoforms from Dioscorea japonica. Plant Sci. 2012;183:14–9.View ArticlePubMedGoogle Scholar
- Lyons E, et al. Using genomic sequencing for classical genetics in E. coli K12. PLoS One. 2011;6(2):e16717.View ArticlePubMedPubMed CentralGoogle Scholar
- Takagi H, Abe A, Yoshida K, Kosugi S, Natsume S, Mitsuoka C, et al. QTL-seq: rapid mapping of quantitative trait loci in rice by whole genome resequencing of DNA from two bulked populations. Plant J. 2013;74:174–83.View ArticlePubMedGoogle Scholar
- Love RR, et al. Evaluation of DISCOVAR de novo using a mosquito sample for cost-effective short-read genome assembly. BMC Genomics. 2016;17(1):187.View ArticlePubMedPubMed CentralGoogle Scholar
- Tamiru M, Yamanaka S, Mitsuoka C, Babil P, Takagi H, Lopez-Montes A, et al. Development of genomic simple sequence repeat markers for Yam. Crop Sci. 2015;55:2191–200.View ArticleGoogle Scholar
- Martin FW. Sex ratio and sex determination in Dioscorea. J Heredity. 1966;57:96–9.View ArticleGoogle Scholar
- Terauchi R, Kahl G. Sex determination in Dioscorea tokoro, a wild yam species. In: Ainsworth CC, editor. Sex determination in plants. Oxford: BIOS Scientific Publishers; 1999. p. 163–71.Google Scholar
- Murase K, Shigenobu S, Fujii S, Ueda K, Murata T, et al. MYB transcription factor gene involved in sex determination in Asparagus officinalis. Genes Cells. 2016;22:115–23.View ArticlePubMedGoogle Scholar
- Sakata Seed Co. http://www.sakataseed.co.jp/. Accessed 1 Aug 2017.
- Olympus Co. http://www.olympus-global.com/en/. Accessed 1 Aug 2017.
- International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature. 2005;436:793–800.View ArticleGoogle Scholar
- Beckman Coulter Co. https://www.beckmancoulter.com/. Accessed 1 Aug 2017.
- Macherey-Nagel GmbH & Co. KG. http://www.mn-net.com. Accessed 1 Aug 2017.
- Operon Co. http://www.operon.com/. Accessed 1 Aug 2017.
- Lucigen Co. http://www.lucigen.com/. Accessed 1 Aug 2017.
- Genaris Co. http://genebay.co.jp/. Accessed 11 Sept 2017.
- Hannon laboratory. http://hannonlab.cshl.edu/fastx_toolkit/. Accessed 1 Aug 2017.
- Terachi T, Tsunewaki K. The molecular basis of genetic diversity among cytoplasms of Triticum and Aegilops: 5. Mitochondrial genome diversity among Aegilops species having identical chloroplast genomes. Theor Appl Genet. 1986;73:175–81.View ArticlePubMedGoogle Scholar
- Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–95.View ArticlePubMedPubMed CentralGoogle Scholar
- Ian Korf Lab. http://korflab.ucdavis.edu/datasets/genome_completeness/. Accessed 1 Aug 2017.
- Kohany O, Gentles AJ, Hankus L, Jurka J. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics. 2006;7:474.View ArticlePubMedPubMed CentralGoogle Scholar
- Nussbaumer T, Martis MM, Roessner SK, Pfeifer M, Bader KC, Sharma S, et al. MIPS PlantsDB: a database framework for comparative plant genome research. Nucleic Acids Res. 2013;41:D1144–51.View ArticlePubMedGoogle Scholar
- Smit AFA, Hubley R. RepeatModeler. Open-1.0. (2008–2015).Google Scholar
- Smit AFA, Hubley R, Green P. RepeatMasker. Open-4.0. (2013–2015).Google Scholar
- Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 2012;40:D1202–10.View ArticlePubMedGoogle Scholar
- International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010;463:763–8.View ArticleGoogle Scholar
- Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, et al. The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res. 2007;35:D883–7.View ArticlePubMedGoogle Scholar
- Quiagen Co. https://www.qiagen.com. Accessed 1 Aug 2017.
- Salamov AA, Solovyev VV. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 2000;10:516–22.View ArticlePubMedPubMed CentralGoogle Scholar
- Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004;32:W309–12.View ArticlePubMedPubMed CentralGoogle Scholar
- Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36.View ArticlePubMedPubMed CentralGoogle Scholar
- Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–78.View ArticlePubMedPubMed CentralGoogle Scholar
- Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith Jr RK, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31:5654–66.View ArticlePubMedPubMed CentralGoogle Scholar
- Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494–512.View ArticlePubMedGoogle Scholar
- Allen JE, Salzberg SL. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics. 2005;21:3596–603.View ArticlePubMedGoogle Scholar
- Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics. 2011;12:491.View ArticlePubMedPubMed CentralGoogle Scholar
- Wu TD, Watanabe CK. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005;21:1859–75.View ArticlePubMedGoogle Scholar
- Slater GS, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31.View ArticlePubMedPubMed CentralGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.View ArticlePubMedPubMed CentralGoogle Scholar
- Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–93.View ArticlePubMedPubMed CentralGoogle Scholar
- Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–76.View ArticlePubMedGoogle Scholar
- Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–40.View ArticlePubMedPubMed CentralGoogle Scholar
- Remm M, Storm CE, Sonnhammer EL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001;314:1041–52.View ArticlePubMedGoogle Scholar
- O'Brien KP, Remm M, Sonnhammer EL. Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005;33:D476–80.View ArticlePubMedGoogle Scholar
- Berglund AC, Sjölund E, Östlund G, Sonnhammer EL. InParanoid 6: eukaryotic ortholog clusters with in paralogs. Nucleic Acids Res. 2008;36:D263–66.View ArticlePubMedGoogle Scholar
- Brachypodium distachyon v3.1 DOE-JGI, https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Bdistachyon/. Accessed 11 Sept 2017.
- Droc G, Larivière D, Guignon V, Yahiaoui N, This D, Garsmeur O, et al. The banana genome hub. Database. 2013;2013:bat035.View ArticlePubMedPubMed CentralGoogle Scholar
- Singh R, Ong-Abdullah M, Low ET, Manaf MA, Rosli R, Nookiah R, et al. Oil palm genome sequence reveals divergence of interfertile species in Old and New worlds. Nature. 2013;500:335–39.View ArticlePubMedPubMed CentralGoogle Scholar
- Al-Mssallem IS, Hu S, Zhang X, Lin Q, Liu W, Tan J, et al. Genome sequence of the date palm Phoenix dactylifera L. Nat Commun. 2013;4:2274.View ArticlePubMedPubMed CentralGoogle Scholar
- Alexeyenko A, Tamas I, Liu G, Sonnhammer EL. Automatic clustering of orthologs and in paralogs shared by multiple proteomes. Bioinformatics. 2006;22:e9–15.View ArticlePubMedGoogle Scholar
- The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2017;45(D1):D158–69.View ArticleGoogle Scholar
- Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.View ArticlePubMedPubMed CentralGoogle Scholar
- Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–3.View ArticlePubMedPubMed CentralGoogle Scholar
- Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2014;31:166–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.View ArticlePubMedPubMed CentralGoogle Scholar
- Tang H, Klopfenstein D, Pedersen B, Flick P, Sato K, Ramirez F, et al. GOATOOLS: Tools for Gene Ontology. Zenodo. 2015: http://doi.org/10.5281/zenodo.31628.
- Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One. 2011;6:e21800.View ArticlePubMedPubMed CentralGoogle Scholar
- Lyons E, et al. The value of nonmodel genomes and an example using SynMap within CoGe to dissect the hexaploidy that predates the rosids. Trop Plant Biol. 2008;1(3):181–90.View ArticleGoogle Scholar
- Schwartz S, Kent WJ, Smit A, et al. Human–mouse alignments with BLASTZ. Genome Res. 2003;13(1):103–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Haas BJ, Delcher AL, Wortman JR, Salzberg SL. DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics. 2004;20(18):3643–6.View ArticlePubMedGoogle Scholar
- Lyons E, Freeling M. How to usefully compare homologous plant genes and chromosomes as DNA sequences. Plant J. 2008;53(4):661–73.View ArticlePubMedGoogle Scholar
- Yang Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91.View ArticlePubMedGoogle Scholar
- Matsumura H, Miyagi N, Taniai N, Fukushima M, Tarora K, Shudo A, et al. Mapping of the gynoecy in bitter gourd (Momordica charantia) using RAD-seq analysis. PLoS One. 2014;9:e87138.View ArticlePubMedPubMed CentralGoogle Scholar
- Broman KW, Wu H, Sen Ś, Churchill GA. R/qtl: QTL mapping in experimental crosses. Bioinformatics. 2003;19:889–90.View ArticlePubMedGoogle Scholar
- Iwate Biotechnology Research Center. http://genome-e.ibrc.or.jp/home/bioinformatics-team/mutmap. Accessed 1 Aug 2017.
- Kosugi S, Natsume S, Yoshida K, MacLean D, Cano L, Kamoun S, et al. Coval: improving alignment quality and variant calling accuracy for next-generation sequencing data. PLoS One. 2013;8:e75402.View ArticlePubMedPubMed CentralGoogle Scholar
- Wang X, Lu P, Luo Z. GMATo: a novel tool for the identification and analysis of microsatellites in large genomes. Bioinformation. 2013;9:541–44.View ArticlePubMedPubMed CentralGoogle Scholar
- Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, et al. Primer3—new capabilities and interfaces. Nucleic Acids Res. 2012;40:e115.View ArticlePubMedPubMed CentralGoogle Scholar
- FAO. http://faostat3.fao.org. Accessed 1 Aug 2017.
- CEGMA page of Ian Korf Lab. http://korflab.ucdavis.edu/datasets/cegma/#SCT7. Accessed 1 Aug 2017.