Proliferation of Ty3/gypsy-like retrotransposons in hybrid sunflower taxa inferred from phylogenetic data
© Ungerer et al. 2009
Received: 18 February 2009
Accepted: 14 July 2009
Published: 14 July 2009
Long terminal repeat (LTR) retrotransposons are a class of mobile genetic element capable of autonomous transposition via an RNA intermediate. Their large size and proliferative ability make them important contributors to genome size evolution, especially in plants, where they can reach exceptionally high copy numbers and contribute substantially to variation in genome size even among closely related taxa. Using a phylogenetic approach, we characterize dynamics of proliferation events of Ty3/gypsy-like LTR retrotransposons that led to massive genomic expansion in three Helianthus (sunflower) species of ancient hybrid origin. The three hybrid species are independently derived from the same two parental species, offering a unique opportunity to explore patterns of retrotransposon proliferation in light of reticulate evolutionary events in this species group.
We demonstrate that Ty3/gypsy-like retrotransposons exist as multiple well supported sublineages in both the parental and hybrid derivative species and that the same element sublineage served as the source lineage of proliferation in each hybrid species' genome. This inference is based on patterns of species-specific element numerical abundance within different phylogenetic sublineages as well as through signals of proliferation events present in the distributions of element divergence values. Employing methods to date paralogous sequences within a genome, proliferation events in the hybrid species' genomes are estimated to have occurred approximately 0.5 to 1 million years ago.
Proliferation of the same retrotransposon major sublineage in each hybrid species indicates that similar dynamics of element derepression and amplification likely occurred in each hybrid taxon during their formation. Temporal estimates of these proliferation events suggest an earlier origin for these hybrid species than previously supposed.
The genomes of flowering plants are remarkably variable in nuclear DNA content, with >1,000-fold differences among some taxa [1, 2]. While differences in ploidy and large-scale segmental duplication account for some of this variability, differential accumulation (and loss) of mobile genetic elements, especially the class I transposable elements known as long terminal repeat (LTR) retrotransposons, represents an additional and important process through which genome size can vary between individual plant species [3, 4]. Plant LTR retrotransposons represent ancient lineages that are ubiquitous in plant genomes [5, 6] and can account for >70% of the nuclear DNA of some plant species. Transposition of these elements is via an RNA intermediate, which enables new copies to be synthesized, reverse transcribed and subsequently integrated into host chromosomal DNA. This mode of transposition can result in large-scale genome expansion because each intact and functional element can potentially give rise to numerous daughter copies.
Recent years have witnessed considerable advances in our understanding of LTR retrotransposons. For example, we now know that these elements are tremendously diverse at the sequence level (in plants) with many subgroup lineages existing within the superfamily types Ty1/copia-like (Pseudoviridae) and Ty3/gypsy-like (Metaviridae) [7–12], that proliferation of these elements can rapidly restructure host genomes [13–16], and that their distribution along chromosome arms can be either dispersed or localized [9, 17–19]. Recent work also suggests that LTR retrotransposons (and other major classes of transposable elements) may not exclusively represent selfish or junk DNA as previously supposed [20, 21] but that, on occasion, transposable elements may have indeed played a more substantial role in generating evolutionary novelty [22–33].
Despite recent advances, we still know surprisingly little regarding how and when these elements become active and proliferate in natural populations; the vast majority of elements remain transcriptionally and transpositionally quiescent during normal growth and development. Various forms of environmental and/or genomic stress have been hypothesized to influence activation. For example, hybridization between genetically differentiated populations and/or species is one means through which these elements are thought to become active [34–38], although activation and proliferation following hybridization is not observed universally [39, 40]. Exposure of plants to biotic and abiotic stresses such as bacterial and viral pathogens, phytopathogenic fungal extracts, wounding, protoplast isolation, and cell culture also has been shown to activate some LTR retrotransposons [41, 42]. While biotic and abiotic stressors may represent more universal agents of activation, much of the data supporting this conclusion comes from experiments conducted under unnatural laboratory conditions; the extent to which these same stresses (especially those that occur naturally) have led to activation and proliferation in natural populations remains unknown.
In the current report, we demonstrate that Ty3/gypsy-like LTR retrotransposons in sunflower are considerably heterogeneous at the sequence level, yet the same element sublineage has proliferated independently in each hybrid sunflower species. We demonstrate further that the ages of these proliferation events (and thus a lower bound on the hybrid species' origins) can be estimated by examining particular signatures of proliferation found in the hybrid species. Estimates by this method suggest that the hybrid species may be older than previously suggested.
Table 1. Source of plant material and summary information for sequences of the rt domain-encoding region (520 bp) of Ty3/gypsy-like elements
Columns: Original collection location | No. of sequences with indels or stop codons | No. of full length sequences without stop codons | Total no. of sequences (no. of sequences in lineage E')
Original collection locations: Utah: N37.239, W(-113.358); Utah: N37.047, W(-112.530); Utah: N39.744, W(-112.316); Utah: N37.254, W(-113.343); Texas: N30.883, W(-102.983)
Signatures of transposable element proliferation in species' genomes also can be characterized through analysis of the distribution of divergence values between pairwise combinations of element sequences [50, 51]. This form of analysis relies on the fact that all daughter copies of transpositionally active elements are identical at the time of insertion but subsequently accumulate mutations independently. Peaks in the distribution of divergence values correspond to episodes of transposable element proliferation, with peaks associated with greater divergence representing more ancient proliferation events and peaks associated with lesser divergence representing more recent events.
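The idea of reading proliferation episodes from peaks in a divergence distribution can be illustrated with a minimal sketch. The function names, the bin width of 0.01, and the divergence values below are all hypothetical and chosen only for illustration; the simple neighbour comparison used here is not the statistical peak test applied in the study.

```python
from collections import Counter

def divergence_histogram(values, binwidth=0.01):
    """Count divergence values per bin; bin k covers [k*binwidth, (k+1)*binwidth)."""
    # The small offset guards against floating-point error at bin boundaries.
    counts = Counter(int(v / binwidth + 1e-9) for v in values)
    n_bins = max(counts) + 1
    return [counts.get(k, 0) for k in range(n_bins)]

def local_peaks(hist):
    """Indices of bins strictly higher than both neighbours (edges compare against 0)."""
    padded = [0] + hist + [0]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]

# Invented pairwise divergences: a small recent cluster and a larger, older one.
values = [0.031, 0.034, 0.036, 0.045,
          0.121, 0.124, 0.127, 0.129, 0.133, 0.141]
hist = divergence_histogram(values)
print(local_peaks(hist))  # bins 3 and 12, i.e. divergence near 0.03 and 0.12
```

In this toy distribution the peak at lower divergence would correspond to the more recent episode and the peak at higher divergence to the older one.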
Timeframes for proliferation events indicated by these secondary peaks can be explored given that genome-level mutation rates for Helianthus have been estimated. In wild sunflowers, a silent site mutation rate has been estimated at 1.0 × 10⁻⁸ substitutions/site/year based on sequence comparisons in a large EST database coupled with fossil calibrations (M. Barker and L. Rieseberg, University of British Columbia, personal communication). It has been suggested, however, that mutation rates for LTR retrotransposons may be approximately twofold higher than silent site mutation rates for protein coding genes. Thus, utilizing a mutation rate of 2.0 × 10⁻⁸ to account for elevated sequence evolution of Ty3/gypsy-like retrotransposons, proliferation events indicated by peaks at 0.02 to 0.04 divergence are roughly estimated to have occurred some 0.5 to 1 million years ago. Timeframes for proliferation events indicated by more prominent primary peaks were not estimated because these features were found in both the hybrid and parental species. It is thus inferred that proliferations associated with these features predate the origins of the hybrid taxa.
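The arithmetic behind these age estimates can be sketched as follows. The text does not state the formula explicitly; the sketch assumes the conventional relation T = K / (2r), in which divergence K between two copies accumulates along both lineages at rate r. That assumption reproduces the 0.5 to 1 million year range quoted above. The function name is hypothetical.

```python
def proliferation_age_years(divergence, rate_per_site_per_year):
    """Age of a proliferation event from pairwise divergence, assuming T = K / (2r)."""
    # The factor of 2 reflects mutations accumulating independently in both copies.
    return divergence / (2.0 * rate_per_site_per_year)

RATE = 2.0e-8  # substitutions/site/year, the elevated retrotransposon rate used in the text

for k in (0.02, 0.04):
    age_my = proliferation_age_years(k, RATE) / 1e6
    print(f"divergence {k:.2f} -> about {age_my:.1f} million years")
```

Under the lower silent site rate of 1.0 × 10⁻⁸, the same divergence values would give ages twice as large, so the choice of rate has a direct, proportional effect on the estimate.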
Despite the ubiquity and abundance of LTR retrotransposons in plant genomes, our understanding of the dynamics of their proliferation and the consequences of proliferation events on host species evolution is surprisingly limited. Sunflower species in the genus Helianthus provide an excellent group for investigating the possible causes and potential consequences of LTR retrotransposon proliferation in an ecological and evolutionary context. In a previous report, we demonstrated that three ancient hybrid sunflower species have independently experienced massive proliferation of Ty3/gypsy-like LTR retrotransposons following their origins. The current study examines the dynamics of these proliferation events in light of the known relationships among the sunflower species investigated and the requisite condition that proliferative elements in the hybrid species are necessarily derived from lineages present in one or both parental species.
As is commonly observed in plant genomes, we found Ty3/gypsy-like retrotransposons to be considerably diverse at the sequence level, with multiple well supported phylogenetic lineages identified. Particular elements that undergo proliferation, however, are expected to be more abundant in a species' genome, and thus more frequently amplified, cloned, and sequenced via the degenerate PCR methodology employed in this study. Consistent with this expectation, element sequences in the same single sublineage were consistently more abundant numerically in each of the three hybrid species' genomes relative to the genomes of the parental species. This pattern is unlikely to have emerged stochastically via PCR drift given that the same pattern was observed for all hybrid taxa. Moreover, the cloning of degenerate PCR amplification products was conducted on pools of five independent PCR reactions per species, further reducing the likelihood of observing this pattern by chance. This pattern also cannot be attributed to variation in primer sequence specificity across the sunflower species because the degenerate primers used in this study were based on aligned amino acid sequences of several plant species (see Methods), with H. annuus (a parental species) as the sunflower representative. Additionally, our interpretation of this phylogenetic signal is corroborated through independent analyses of the frequency spectra of pairwise sequence divergences (Figure 5a–e).
That proliferation of the same sublineage of Ty3/gypsy-like retrotransposon has occurred independently in each of the three hybrid sunflower species is of considerable interest, and future work will examine this lineage in greater detail to determine whether transcriptional and/or transpositional activation can be detected in natural and/or greenhouse synthesized hybrids between the parental species H. annuus and H. petiolaris. It is noteworthy that elements within this sublineage also are fairly abundant in the parental species, indicating past amplification in the parental species as well. Based on limited sampling, however, these elements do not appear to be currently active transcriptionally in either the parental or hybrid taxa (RT-PCR data not shown), a result that lies in contrast to another study examining diversity and abundance of Ty3/gypsy-like elements in wild Iris species and their early generation hybrids.
Another potentially relevant factor in these proliferation events may be the demographic history of these sunflower hybrid species. Recent work has demonstrated that several categories of transposable elements display differential patterns of distributional abundance and presumed activity among natural populations of Arabidopsis lyrata that have and have not experienced historical bottlenecks during postglacial recolonization into new geographical regions. Following arguments put forth previously [55, 56], the authors invoke weaker selection against transposable element activity in bottlenecked populations resulting from reductions in effective population sizes and the accompanying increased strength of genetic drift. It is conceivable that similar demographic forces may have acted in the Helianthus hybrid species given differences in habitat preferences between the hybrid and parental sunflower species and the founder event-like population structures that may have been associated with the hybrid species' origins.
While this study indicates clear patterns of retrotransposon proliferation events in the genomes of these sunflower hybrid species, some caveats need mention. First, it is unlikely that we have sampled the total Ty3/gypsy-like diversity in these Helianthus genomes. The sequence variability reported here is limited by the degeneracy of the primers employed. More comprehensive methods for uncovering the full range of retrotransposon subfamily diversity would require genome-level sequencing efforts. For example, by analyzing whole genome shotgun (WGS) libraries, Hawkins et al. identified three major subfamilies of Ty3/gypsy-like elements in Gossypium species, naming them Gorge1, 2, and 3. Similarly, utilizing available large sequence datasets for Oryza australiensis, Piegu et al. also characterized three major subfamilies of Ty3/gypsy-like elements. The sunflower Ty3/gypsy-like elements described in the current study are most closely related to the Gorge3 and Kangourou subfamilies of elements identified by Hawkins et al. and Piegu et al., respectively (Figure 6); this suggests that additional Ty3/gypsy diversity in Helianthus remains uncharacterized. A second caveat of this study, and related to the first, is that we have surveyed sequence variability of the more conserved reverse transcriptase domain-encoding region in the current study whereas our earlier report of proliferation in the hybrid species was based on comparisons among parental and hybrid species of relative abundance (Southern blot) and absolute abundance (quantitative PCR) of the integrase domain-encoding region. Thus, while we assume we have documented proliferation of the same Ty3/gypsy-like subfamily in the current and earlier report, we cannot rule out the possibility that we have identified different subfamilies in these two studies and we currently lack the resolution to detect this possibility.
This matter can be resolved through additional surveys of sequence variability and by isolating and sequencing the entire protein-coding interior regions for a diversity of these elements. A third caveat is that although the parental taxa that gave rise to the hybrid species are still extant, certain retrotransposon lineages could have been lost from the genomes of one or both parental species over evolutionary time; thus, the genomic composition of modern day H. annuus and H. petiolaris may differ in some fashion from that of the H. annuus and H. petiolaris individuals/populations that originally gave rise to the hybrid species.
Distributions of divergence values for sequences within the candidate proliferative source lineage E' revealed strong evidence of secondary features (peaks) associated with lower values of divergence (Figure 5a–e) in the hybrid species, with no evidence of such peaks in H. annuus and only limited evidence of such peaks in H. petiolaris. This pattern is exactly that predicted under a scenario of element derepression and proliferation in the diploid hybrid taxa following or associated with their origins. The specific sets of sequences that give rise to these secondary features appear to differ among the three hybrid taxa (data not shown), providing further evidence of independent proliferation events in the three hybrid species. Evidence of peaks associated with lower divergence values in H. petiolaris was weak and observed only under a very narrow range of binwidths in analyses with the program siZer. Nonetheless, we cannot rule out recent activity of a lesser scale in this parental species.
Ty3/gypsy-like proliferation events in the hybrid species' genomes offer a unique opportunity to explore the temporal origins of these species given that proliferation events occurring in the hybrid taxa place a lower bound on their birth. Using methods for dating the ages of paralogous sequences within genomes [50, 51, 57], these proliferation events in the hybrid species are estimated to have occurred between approximately 0.5 and 1 million years ago. These estimates suggest an earlier origin for the hybrid taxa than has been previously suggested based on microsatellite divergence data [58–60], but are largely consistent with more recent revised estimates based on EST sequence divergence data (L. Rieseberg, University of British Columbia, personal communication).
An outstanding question remains how, if at all, transposable element proliferation may have contributed to evolutionary events that took place in this group of sunflowers. Might this represent an example of new species arising via hybridization and concomitant 'reassortments of repetitious DNAs', as envisioned originally by McClintock? Earlier views by other prominent researchers suggested that transposable elements likely represent 'selfish' or 'parasitic' DNA sequences only, and that a major role in evolutionary processes of the host need not be invoked in order to explain their existence [20, 21]. More recently, however, several researchers have argued that transposable elements may indeed play a more prominent role in generating evolutionary novelty with potential effects on adaptive evolutionary change [22, 23, 27, 28, 30, 62–65]. The vast amounts of sequence data now available for several species appear to be supportive of this more recent view, especially with regard to the LTR retrotransposons. For example, genomic sequence data for several model species including Mus musculus, Caenorhabditis elegans, and Drosophila melanogaster indicate that LTR retrotransposon sequences (and fragments thereof) show a higher than expected prevalence of association with certain categories of genes, and that novel gene configurations can arise via new exons or spliced additions to existing exons [24–26]. In addition to generating evolutionary novelty via new (or modified) protein sequences, recent studies also suggest that LTR retrotransposons as well as other types of transposable elements may influence regulatory aspects of host genes [31–33].
Notwithstanding any contribution to evolutionary novelty in this group, it is also intriguing to ponder how these sunflower genomes simply have accommodated massive genomic expansions given the highly mutagenic nature of such large-scale proliferation events. Interestingly, in phylogenetic analyses with other plant Ty3/gypsy-like retrotransposons, Surge elements group within the 'class B' elements (as described by Martin and Llorens). Elements in this group possess an additional domain (a chromodomain) in their interior coding region that is involved in site-directed integration of elements into heterochromatin. The spatial scale of proliferation in the hybrid species conforms with expectations of site-directed insertion, as fluorescence in situ hybridization (FISH) experiments reveal that the vast majority of retrotransposon proliferation in the sunflower hybrid taxa has occurred in pericentromeric regions of chromosomes. The amenable nature of this group of sunflowers to experimental study and the excellent genomic resources now emerging for Helianthus should greatly facilitate future work on the likely causes and evolutionary consequences of retrotransposon proliferation in these species.
Seeds of all species under investigation were obtained from the United States Department of Agriculture (USDA) National Plant Germplasm System (Table 1). Seeds were germinated in the dark on moist filter paper in Petri dishes and germinated seedlings were then transferred to 10 cm pots and grown in the Kansas State University greenhouses until suitable size for harvesting of plant tissue. DNA was extracted using a DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA) following the manufacturer's instructions.
Degenerate primers (forward, 5'-GGACCTGCTGGACAAGGGNTWYATHMG-3' and reverse, 5'-CAGGAAGCCCACCTCCCKNWRCCARAA-3') were developed and used to amplify a 520 bp fragment of the Ty3/gypsy-like rt domain-encoding region from a single plant of each species. Degenerate primers were developed with the web-based program CODEHOP and were based on aligned Ty3/gypsy-like reverse transcriptase (rt) amino acid sequences from sunflower and the following additional elements: Dea1, Ananas; Del1, Lilium; and Legolas, Arabidopsis. PCR was performed on an MJ Research (Watertown, MA, USA) PTC-100 thermal cycler under the following conditions: 3 min at 94°C, followed by 31 cycles of 30 s at 94°C, 30 s at 45°C, and 60 s at 72°C, and a final extension of 3 min at 72°C. Individual PCR reactions each contained 5 ng DNA, 50 pmol of each primer, 1 unit of Taq polymerase, and a final concentration of 30 mM tricine, 50 mM KCl, 2 mM MgCl2, and 100 μM of each deoxynucleotide triphosphate (dNTP). For each sunflower species, five individual 25 μl PCR reactions were performed and pooled for further processing in order to reduce potential effects of PCR drift. Pooled products were gel purified using a QIAquick Gel Extraction Kit (Qiagen, Valencia, CA, USA) and cloned using the pGEM-T Vector System I (Promega, Madison, WI, USA). For each species, between 96 and 109 positive clones were sequenced using the M13 universal sequencing primer on an ABI 3730xl Genetic Analyzer. In a small number of instances two or more sequences obtained from the same species demonstrated 100% identity. Because of the inability to distinguish between a single amplicon variant having been cloned two or more times versus independent amplification of identical elements with different chromosomal insertion points, only a single representative of identical sequences was retained so as not to bias interpretation in subsequent analyses (see Results).
Thus, for each species, between 92 and 108 unique sequences were retained for further analysis (Table 1). Sequence alignments were conducted with ClustalW with subsequent manual adjustments. Phylogenetic analyses were conducted in PAUP* v.4.0b10 using the Kimura two-parameter model of sequence evolution. Sequences used in this study have been deposited in GenBank under accession numbers GQ366796–GQ367295.
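Two of the steps described above lend themselves to a short illustration: the degeneracy of the primers (how many distinct oligonucleotides each degenerate sequence encodes under the standard IUPAC ambiguity codes) and the retention of a single representative of identical sequences. The sketch below is not the authors' pipeline; the function names are hypothetical and the clone sequences in the usage line are invented.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct non-degenerate oligos encoded by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count

def unique_sequences(seqs):
    """Keep one representative of each identical sequence, in first-seen order."""
    return list(dict.fromkeys(seqs))

FORWARD = "GGACCTGCTGGACAAGGGNTWYATHMG"  # primer sequences as given in the text
REVERSE = "CAGGAAGCCCACCTCCCKNWRCCARAA"

print(degeneracy(FORWARD), degeneracy(REVERSE))  # 96 64

# Invented clone sequences; the duplicate "ACGT" is collapsed to one entry.
print(unique_sequences(["ACGT", "ACGA", "ACGT"]))  # ['ACGT', 'ACGA']
```

The degeneracy is confined to the 3' portion of each primer, so each primer pool contains a modest number of variants (96 and 64) despite spanning several ambiguous positions.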
Evidence of recent retrotransposon proliferation events in the hybrid species was evaluated by examining distributions of pairwise divergence values for sequences derived from a candidate proliferative source lineage as suggested by our phylogenetic analyses. Peaks in the frequency distribution are interpreted as episodic events of transposition, with peaks associated with lower values of divergence representing more recent proliferation events. Pairwise divergence values among sequences were determined using MEGA version 4 under the Kimura two-parameter model of sequence evolution. In order to identify significant features (peaks) in these distributions, we utilized the program siZer. This program evaluates data over a wide range of binwidths and determines statistical support for features (peaks) in a distribution based upon regions of significant slope increase and decrease.
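For readers unfamiliar with the Kimura two-parameter model, the pairwise divergence it yields can be sketched directly from its standard formula, d = -½ ln[(1 - 2P - Q)·√(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The code below is an independent illustration, not a reproduction of MEGA's implementation; the function names are hypothetical, and skipping sites with gaps or ambiguous bases is an assumption made here for simplicity.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

def pairwise_divergences(seqs):
    """K2P distance for every unordered pair of aligned sequences."""
    return [k2p_distance(a, b) for a, b in combinations(seqs, 2)]
```

For example, two 20 bp sequences differing by a single transition have P = 0.05 and Q = 0, giving d = -½ ln(0.9), approximately 0.0527, slightly above the raw proportion of differing sites because the model corrects for multiple substitutions at the same site.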
We thank Carolyn Ferguson, Takeshi Kawakami, Ying Zhen, Susan Rolfsmeier, Ziyi Wang, Madhav Nepal, and Loren Rieseberg for comments on an earlier draft. This work was supported by Kansas State University and National Science Foundation grant DEB-0742993 to MCU.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.