The complete chloroplast DNA sequences of the charophycean green algae Staurastrum and Zygnema reveal that the chloroplast genome underwent extensive changes during the evolution of the Zygnematales

Background The Streptophyta comprise all land plants and six monophyletic groups of charophycean green algae. Phylogenetic analyses of four genes from three cellular compartments support the following branching order for these algal lineages: Mesostigmatales, Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales and Charales, with the last lineage being sister to land plants. Comparative analyses of the Mesostigma viride (Mesostigmatales) and land plant chloroplast genome sequences revealed that this genome experienced many gene losses, intron insertions and gene rearrangements during the evolution of charophyceans. On the other hand, the chloroplast genome of Chaetosphaeridium globosum (Coleochaetales) is highly similar to its land plant counterparts in terms of gene content, intron composition and gene order, indicating that most of the features characteristic of land plant chloroplast DNA (cpDNA) were acquired from charophycean green algae. To gain further insight into when the highly conservative pattern displayed by land plant cpDNAs originated in the Streptophyta, we have determined the cpDNA sequences of the distantly related zygnematalean algae Staurastrum punctulatum and Zygnema circumcarinatum.
Results The 157,089 bp Staurastrum and 165,372 bp Zygnema cpDNAs encode 121 and 125 genes, respectively. Although both cpDNAs lack an rRNA-encoding inverted repeat (IR), they are substantially larger than Chaetosphaeridium and land plant cpDNAs. This increased size is explained by the expansion of intergenic spacers and introns. The Staurastrum and Zygnema genomes differ extensively from one another and from their streptophyte counterparts at the level of gene order, with the Staurastrum genome more closely resembling its land plant counterparts than does Zygnema cpDNA. Many intergenic regions in Zygnema cpDNA harbor tandem repeats. The introns in both Staurastrum (8 introns) and Zygnema (13 introns) cpDNAs represent subsets of those found in land plant cpDNAs. They represent 16 distinct insertion sites, only five of which are shared by the two zygnematalean genomes. Three of these insertion sites have not been identified in Chaetosphaeridium cpDNA.
Conclusion The chloroplast genome experienced substantial changes in overall structure, gene order, and intron content during the evolution of the Zygnematales. Most of the features considered earlier as typical of land plant cpDNAs probably originated before the emergence of the Zygnematales and Coleochaetales.


Background
About 450 million years ago, green algae belonging to the class Charophyceae emerged from their aquatic habitat to colonize the land [1][2][3]. This important event in the history of life gave rise to all the land plant species that make up the flora of our planet. The few thousand species of charophycean green algae that are alive today exhibit great variability in cellular organization and reproduction [4]. With the land plants, they form the green plant lineage Streptophyta [5], whereas all other green algae (more than 10,000 species), with perhaps the exception of Mesostigma viride, belong to the sister lineage Chlorophyta [4]. Five monophyletic groups of charophycean green algae have been recognized: the Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales and Charales [6], given here in order of increasing cellular complexity. Mesostigma may represent an additional lineage of the Charophyceae, the Mesostigmatales, as indicated by phylogenetic studies that placed this unicellular green alga at the base of the Streptophyta [7][8][9][10]. This placement, however, remains controversial, considering that separate analyses based on a large number of chloroplast- or mitochondrial-encoded proteins [11][12][13] and on the chloroplast small and large subunit rRNA genes [14] placed Mesostigma before the divergence of the Chlorophyta and Streptophyta.
On the basis of morphological characters alone, the two charophycean groups that exhibit the greatest cellular complexity, i.e. the Charales and Coleochaetales, have been proposed to be the closest relatives of land plants [15,16]. Recent analyses of the combined sequences of four genes from the nucleus (small subunit rRNA gene), chloroplast (atpB and rbcL) and mitochondria (nad5) of 25 charophycean green algae and eight green plants revealed that the Charales and land plants form a highly supported clade; however, moderate bootstrap support was observed for the positions of the other charophycean groups [8]. The best trees inferred by Bayesian and maximum likelihood methods in this four-gene analysis support an evolutionary trend toward increasing cellular complexity [17]. In contrast, all phylogenies of charophycean green algae previously inferred from a smaller number of genes failed to provide any conclusive results concerning the branching order of the charophycean green algae and their relationships with land plants [15,16].
We have recently undertaken the sequencing of complete chloroplast genomes from representatives of the various charophycean lineages in order to elucidate the branching order of these lineages and also to understand the evolution of chloroplast DNA (cpDNA) within the Streptophyta. We have reported thus far the cpDNA sequences of Mesostigma (Mesostigmatales) [11] and Chaetosphaeridium globosum (Coleochaetales) [18]. Comparative analyses of the Mesostigma cpDNA sequence (136 genes, no introns) with its land plant counterparts (110-120 genes, about 20 introns) revealed that the chloroplast genome underwent substantial changes in its architecture during the evolution of streptophytes (namely gene losses, intron insertions and scrambling of gene order). At the levels of gene content (125 genes), intron composition (18 introns) and gene order, Chaetosphaeridium cpDNA is remarkably similar to land plant cpDNAs, implying that most of the features characteristic of land plant lineages were acquired from charophycean green algae. Like the cpDNAs of many chlorophytes, those of Mesostigma, Chaetosphaeridium and most land plant species exhibit a quadripartite structure that is characterized by the presence of two copies of an rDNA-containing inverted repeat (IR) separated by large and small single-copy regions. All the genes they have in common, with a few exceptions, reside in corresponding genomic regions.
In this study, we report the complete cpDNA sequences of two members of the Zygnematales that belong to distinct lineages, Staurastrum punctulatum and Zygnema circumcarinatum. Although the chloroplast genomes of these charophycean green algae closely resemble their Chaetosphaeridium and bryophyte counterparts at the primary sequence and gene content levels, they feature substantial differences at the levels of structure, gene order and intron content. Like the cpDNA of the zygnematalean alga Spirogyra maxima [19], both Staurastrum and Zygnema cpDNAs lack a large IR. Loss of the IR appears to be a major event that shaped the architecture of the chloroplast genome in the Zygnematales, an event that apparently occurred early during the evolution of this group of charophycean green algae.

Selection of taxa
The Zygnematales as circumscribed by Bold and Wynne [20] comprise the green algae whose mode of sexual reproduction is conjugation. This is the most important charophycean lineage in terms of diversity and number of species (~50 genera and ~6,000 species) [16]. Classification schemes based on cell wall organization have recognized two groups of conjugating green algae: first, the unicellular or multicellular green algae with an ornamented and segmented cell wall, also called placoderm desmids and often treated as members of the order Desmidiales, and second, the green algae that bear a smooth cell wall, which are often classified separately in the Zygnematales [21]. Among the latter group are found filamentous forms and the saccoderm desmids that consist either of unicells or loosely joined cells. Phylogenies inferred using rbcL [21] or the combined rbcL and nuclear small subunit rRNA genes [22] support the monophyly of placoderm desmids and place the filamentous and saccoderm desmids together in a distinct monophyletic group. For our study, we have selected a representative of each of these two monophyletic groups: Staurastrum is a unicellular, placoderm desmid, whereas Zygnema is a filamentous green alga with a non-ornamented cell wall.

General features
The 157,089-bp Staurastrum [GenBank:AY958085] and 165,372-bp Zygnema [GenBank:AY958086] cpDNAs map as circular molecules containing 121 and 125 genes, respectively (Fig. 1). Both genomes lack an rDNA-containing IR and no remnant of such a sequence could be detected during our analysis of repeated elements. All genes are present in single copy, with the exception of the duplicated Zygnema trnE(uuc) gene, the sequences of which differ at two positions. Note that the matK gene was not included in the total number of genes calculated for Zygnema cpDNA, because this gene occurs as an intron ORF in all other streptophytes where it has been identified. Aside from the absence of the IR, the most prominent differences displayed by the two zygnematalean cpDNAs relative to their counterparts in Chaetosphaeridium [18] and land plants (here represented by the bryophyte Marchantia polymorpha [23]) are their larger size (taking into consideration the absence of the IR from these genomes) and their smaller number of cis-spliced group II introns (Table 1). The larger size of zygnematalean cpDNAs is mainly explained by the expansion of intergenic spacers (Table 2). The latter sequences represent 42% of the genome in both Staurastrum and Zygnema cpDNAs compared to about 20% in Chaetosphaeridium and land plant cpDNAs. Introns have also expanded in size in both zygnematalean cpDNAs compared to their Chaetosphaeridium and land plant homologues (Table 2). Table 3 compares the gene contents of Staurastrum, Zygnema, Chaetosphaeridium and Marchantia cpDNAs. The two zygnematalean cpDNAs share 120 genes, 116 of which are present in both Chaetosphaeridium and Marchantia cpDNAs. Five genes in Zygnema cpDNA are missing from Staurastrum cpDNA; they encode the tRNA Pro (GGG), tRNA Ser (CGA), ribosomal protein L5, and the proteins CysA and CysT that are involved in sulfate transport. 
Although there is no functional trnS(cga) in Staurastrum cpDNA, a trnS(cga) pseudogene was identified in this genome. A standard acceptor stem could not be modelled from the RNA sequence derived from this pseudogene; the 5' region of this sequence diverges considerably from homologous tRNA sequences in other streptophytes and cannot base pair with the 3' region. Staurastrum exhibits only one chloroplast gene (rpl22) that is missing from Zygnema. To our knowledge, this is the first time that the loss of rpl22 together with that of rpl32 (a gene absent from both zygnematalean cpDNAs) has been reported in the Streptophyta. As in land plant cpDNAs, but in contrast to Chaetosphaeridium cpDNA, no tufA-like sequence was detected in the two zygnematalean cpDNAs. It appears that only the chlI, odpB and ycf62 genes were specifically lost just before or concurrently with the emergence of land plants (Table 3). Note that the rps16 gene cannot be included in this category, as it is present in the majority of land plant cpDNAs sequenced to date.
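The gene counts reported above can be cross-checked with simple set arithmetic. The short Python sketch below is only an illustration: the labels `trnP(ggg)` and `rpl5` are our shorthand for the genes encoding tRNA Pro (GGG) and ribosomal protein L5, and the variable names are hypothetical.

```python
# Counts and gene names are taken from the text; labels are illustrative shorthand.
shared_genes = 120  # genes common to Staurastrum and Zygnema cpDNAs

# Genes present in Zygnema cpDNA but missing from Staurastrum cpDNA
zygnema_only = {"trnP(ggg)", "trnS(cga)", "rpl5", "cysA", "cysT"}
# Gene present in Staurastrum cpDNA but missing from Zygnema cpDNA
staurastrum_only = {"rpl22"}

staurastrum_total = shared_genes + len(staurastrum_only)
zygnema_total = shared_genes + len(zygnema_only)

print(staurastrum_total, zygnema_total)  # 121 125
```

The totals agree with the 121 and 125 genes given for the two genomes.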

Gene order
Staurastrum and Zygnema cpDNAs differ substantially from one another and from their Chaetosphaeridium and land plant counterparts at the level of gene organization (Table 4). Eighty-two genes in the two zygnematalean cpDNAs form 22 blocks of colinear sequences, which are highly scrambled in order (Fig. 1). A minimum of 59 inversions would be required to convert the gene order of Staurastrum cpDNA into that of Zygnema cpDNA (Table 4).
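To make the notion of colinear blocks concrete, the following Python sketch splits one circular, signed gene order into maximal runs whose gene adjacencies are conserved in a second genome. It is a toy illustration under stated assumptions, not the procedure used in this study: genes are represented by hypothetical signed integers (the sign denotes the coding strand), both genomes are assumed to contain the same single-copy genes, and the function names are ours. Computing a minimal number of inversions requires a dedicated sorting-by-reversals algorithm, which is not shown here.

```python
def adjacencies(order):
    """Return the signed adjacencies of a circular gene order (both strands)."""
    adj = set()
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read on the opposite strand
    return adj


def colinear_blocks(ref, other):
    """Split circular `ref` into maximal runs of adjacencies also found in `other`."""
    shared = adjacencies(other)
    n = len(ref)
    # Index i marks a breakpoint between ref[i] and the next gene on the circle.
    breaks = [i for i in range(n) if (ref[i], ref[(i + 1) % n]) not in shared]
    if not breaks:
        return [list(ref)]
    blocks = []
    for k, end in enumerate(breaks):
        start = (breaks[k - 1] + 1) % n  # gene following the previous breakpoint
        length = (end - start) % n + 1
        blocks.append([ref[(start + j) % n] for j in range(length)])
    return blocks


# Hypothetical example: genes 3 and 4 are inverted in the second genome.
genome_a = [1, 2, 3, 4, 5, 6]
genome_b = [1, 2, -4, -3, 5, 6]
blocks = colinear_blocks(genome_a, genome_b)
print(blocks)  # [[5, 6, 1, 2], [3, 4]]
```

In this toy case a single inversion creates two breakpoints and therefore two colinear blocks; the first block wraps around the origin because the genomes are treated as circular.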
Of the two zygnematalean cpDNAs, Staurastrum cpDNA shows the gene arrangement most similar to that of its Chaetosphaeridium and land plant counterparts (Table 4). In both Staurastrum and Zygnema cpDNAs, the gene organization more closely resembles that of Marchantia than that of Chaetosphaeridium (Table 4). Staurastrum cpDNA shares with its Marchantia counterpart 22 blocks of colinear sequences that contain a total of 101 genes, whereas Zygnema cpDNA shares 20 blocks featuring 81 genes (Fig. 1). Close inspection of these blocks relative to those conserved between Mesostigma and Marchantia cpDNAs [11] reveals that 13 ancestral gene clusters, including those containing the rDNA, atpA, psbB and rpoB operons, were fragmented at 27 sites during the evolution of the Zygnematales (Fig. 2). Eleven of these rearrangement breakpoints are common to the two green algal cpDNAs, whereas 2 and 14 breakpoints are unique to Staurastrum and Zygnema cpDNAs, respectively. Assuming that these unique rearrangement breakpoints appeared after the divergence of the two zygnematalean species, we infer that the chloroplast genome of the common ancestor of Staurastrum and Zygnema shared a number of derived gene clusters with Chaetosphaeridium and land plants. For example, the cluster of 29 genes extending from petL to trnI(cau) in Marchantia cpDNA and that of 13 genes delimited by rps12b and atpI were likely present in the common ancestor of Staurastrum and Zygnema. Only four gene clusters are shared specifically between zygnematalean and Marchantia cpDNAs: rps4-trnS(gga)-ycf3 (cluster 9 in Fig. 1), atpB-atpE-trnV(uac)-trnMe(cau)-ndhC-ndhK-ndhJ (cluster 15), trnH(gug)-ftsH-trnD(guc) (in Staurastrum only), and trnE(uuc)-cysA-trnT(ggu) (in Zygnema only).
The higher degree of ancestral characters displayed by Staurastrum cpDNA compared to its Zygnema homologue at the gene organizational level is also evident when one examines the genomic region in which each gene locus would be expected to map if the IR had been retained (Fig. 3). In Staurastrum cpDNA, the 15 genes predicted to have been present in the small single-copy region occupy a discrete region just beside five of the eight genes that usually make up the IR; in Zygnema cpDNA, however, the genes usually located in the small single-copy region and the IR are more widely dispersed in the genome.

Intron composition
As in Chaetosphaeridium cpDNA, the introns in Staurastrum and Zygnema cpDNAs represent subsets of those found in land plant cpDNAs (Fig. 4). Both zygnematalean cpDNAs share with their Chaetosphaeridium and land plant counterparts one group I intron in trnL(uaa), two cis-spliced [...] Like its Chaetosphaeridium and land plant counterparts, the cis-spliced group II intron in Staurastrum trnK(uuu) encodes the maturase MatK. As mentioned earlier, a freestanding matK gene was identified in Zygnema cpDNA even though an intron is absent from trnK(uuu) in this charophycean green alga. Close inspection of the regions immediately flanking the Zygnema matK gene for the presence of sequences conserved in domains V and VI of group II introns failed to reveal any evidence that this gene had once been an integral part of a group II intron. The Zygnema matK is most probably a functional gene because its predicted protein features the vast majority of the conserved amino acids that the trnK intron-encoded MatK of Staurastrum shares with its Chaetosphaeridium, Chara, Nitella and land plant homologues (Fig. 5).
Table footnotes: (a) Because Staurastrum and Zygnema cpDNAs lack an IR, only the genome size is given for each of these cpDNAs. SSC, small single-copy region; LSC, large single-copy region. (b) Unique ORFs, intron ORFs and pseudogenes were not taken into account. Note that Chaetosphaeridium tufA was considered to be a functional gene.

Repeated sequences
Comparison of each zygnematalean cpDNA sequence against itself using PipMaker [24] indicated the presence of repeats in many intergenic regions of Zygnema cpDNA and the virtual absence of such sequences from Staurastrum cpDNA. Analysis of the Zygnema genome sequence with REPuter [25] revealed that the great majority of the repeat regions larger than 30 bp are composed of short tandem repeats. Each of the 35 repeat regions identified consists of 4 to 16 bp units that are repeated in tandem 4 to 50 times (Table 5). Most regions (29/35) feature repeat [...] Only two loci of the Staurastrum chloroplast genome contain short tandem repeats: a region composed of four units of the GAATAAATA sequence in the infA-rpl36 spacer and a region containing nine units of the GTATTT sequence in the rps16-odpB spacer. Aside from two copies of a 45-bp sequence (in the atpF-atpH and atpH-rps14 spacers) that are in direct orientation, no dispersed repeats larger than 30 bp were detected in Staurastrum cpDNA.
Table footnotes: (a) Only the conserved genes that are missing in one or more chloroplast genomes are indicated. Plus and minus signs denote the presence and absence of genes, respectively. (b) Pseudogenes. (c) Chaetosphaeridium tufA could be a pseudogene because its sequence is highly divergent from those of other green plants.
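The kind of short tandem repeat described above (units of 4 to 16 bp repeated at least four times) can be illustrated with a regular-expression scan. The Python sketch below is a simplified stand-in and not the PipMaker or REPuter analysis used in this study: it detects only perfect repeats, the function name and default parameters are our own choices, and the flanking bases in the test sequences are invented for the example (only the repeat units and copy numbers come from the text).

```python
import re


def tandem_repeats(seq, min_unit=4, max_unit=16, min_copies=4):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`."""
    # A lazily matched unit followed by at least (min_copies - 1) further copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Repeat units reported for Staurastrum cpDNA, embedded in invented flanks.
spacer_1 = "TTGCC" + "GAATAAATA" * 4 + "CCGT"
spacer_2 = "AC" + "GTATTT" * 9 + "GA"

print(tandem_repeats(spacer_1))  # [(5, 'GAATAAATA', 4)]
print(tandem_repeats(spacer_2))  # [(2, 'GTATTT', 9)]
```

Because the pattern requires exact copies, degenerate repeats of the sort that dedicated repeat-finding tools can report would be missed by this sketch.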

Discussion
Although Staurastrum and Zygnema cpDNAs bear high similarity in primary sequence and gene content to their Chaetosphaeridium and land plant counterparts, they differ substantially from one another and from the latter genomes in overall structure, gene order and intron content. From our comparative analysis of streptophyte cpDNAs, we infer that the chloroplast genome of the last common ancestor of Staurastrum and Zygnema probably lacked a large IR encoding the rRNA genes, had a low gene density, and more closely resembled Chaetosphaeridium and land plant cpDNAs at the gene organizational and intron levels than do Zygnema and Staurastrum cpDNAs. [...] been identified in Chaetosphaeridium, were probably present in the common ancestor of Staurastrum and Zygnema.
Considering the absence of an rDNA-encoding IR region in both Staurastrum and Zygnema cpDNAs, it is not surprising that these genomes are considerably rearranged relative to their coleochaetalean and land plant counterparts that have retained the quadripartite structure. Green plant cpDNAs that have lost the IR tend to be highly scrambled in gene order [26,27]. It has been hypothesized that the loss of the IR enhances opportunities for intramolecular recombination between small dispersed repeats [28]. In agreement with the idea that there is a direct link between the frequency of intramolecular recombination events and the abundance of small dispersed repeats [28], we identified more rearrangements in the repeat-rich genome of Zygnema than in the repeat-poor genome of Staurastrum. As in the cpDNAs of the nonphotosynthetic, parasitic flowering plant Epifagus virginiana [29] and the evening primrose Oenothera [30], the repeated sequences in Zygnema cpDNA consist essentially of tandem repeats that probably arose by replication slippage.
A single event of IR loss likely accounts for the absence of a quadripartite structure from both Staurastrum and Zygnema cpDNAs. This hypothesis is more parsimonious than the alternative scenario involving two independent losses, and is consistent with previous evidence that the cpDNA of Spirogyra (a distant relative of Zygnema) has no IR [19]. It is also supported by our finding that Staurastrum and Zygnema cpDNAs share 11 rearrangement breakpoints within ancestral gene clusters. Given the close connection between IR loss and gene rearrangements, several of these shared breakpoints might have appeared following the loss of the IR in the lineage leading to the last common ancestor of Staurastrum and Zygnema. Considering that this ancestor occupies a basal position in the tree describing the relationships among zygnematalean green algae [21,22], most, if not all, of the algae belonging to the Zygnematales are expected to lack an IR in their chloroplast genome.
As introns appear to be generally stable in land plant cpDNAs [28], the important difference in intron content displayed by Staurastrum and Zygnema cpDNAs is unexpected. The two zygnematalean cpDNAs share only five of the 16 intron insertion sites they exhibit in total. Staurastrum cpDNA lacks eight of the 13 introns that are present in Zygnema cpDNA, whereas the latter cpDNA lacks three of the eight introns found in the former genome. The intron distributions in these cpDNAs are best explained by assuming that all 16 insertion sites were populated with introns in the common ancestor of Staurastrum and Zygnema and that subsequently, several introns were specifically lost in each of the lineages leading to these green algae. Obviously, we cannot exclude the possibility that chloroplast introns occupying common insertion sites were lost independently in the Staurastrum and Zygnema lineages; thus, the predicted number of introns in the common ancestor of these algae may represent a minimal estimate. Given that intron losses are thought to result from insertions, through homologous recombination, of intron-less cDNA copies generated by reverse transcription [31], the frequency of homologous recombination events or the level of reverse transcriptase activity might be higher in the chloroplasts of conjugating green algae than in land plant chloroplasts. In this respect, it is interesting to note that the Staurastrum trans-spliced rps12 intron specifies a reverse transcriptase and is the only known streptophyte chloroplast intron encoding such an activity.
Sequence conservation among streptophyte MatK proteins
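The numbers of shared and lineage-specific intron insertion sites follow from the reported totals by inclusion-exclusion. The Python sketch below works through this arithmetic; it assumes that each intron occupies a single, distinct insertion site within a genome, and the variable names are ours.

```python
# Intron counts reported for the two zygnematalean cpDNAs
staurastrum_introns = 8
zygnema_introns = 13
distinct_sites = 16  # insertion sites occupied in at least one of the two genomes

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
shared_sites = staurastrum_introns + zygnema_introns - distinct_sites

# Sites occupied in one genome only
staurastrum_specific = staurastrum_introns - shared_sites
zygnema_specific = zygnema_introns - shared_sites

print(shared_sites, staurastrum_specific, zygnema_specific)  # 5 3 8
```

Under this assumption, the five shared sites, the three sites occupied only in Staurastrum and the eight occupied only in Zygnema sum to the 16 sites inferred for the common ancestor.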

Distributions of introns in streptophyte cpDNAs
Our finding that matK is free-standing in Zygnema cpDNA together with the absence of the trnK(uuu) intron in which it usually resides strongly suggests that its putative maturase product is essential for the splicing of group II introns other than the trnK(uuu) intron. Circumstantial evidence that MatK functions in splicing of multiple introns has previously been reported for land plant chloroplasts. The matK gene is located within the group II intron of trnK(uuu) in all photosynthetic land plants, but occurs as a free-standing gene in Epifagus cpDNA [29]. In vivo splicing analyses of the complete set of chloroplast group II introns in land plant mutants lacking chloroplast ribosomes disclosed specific splicing defects involving mainly group IIA introns (in atpF, rpl2, rps12, trnA, trnI, trnK), thus implying that cpDNA-encoded protein(s) act as splicing factors [32][33][34][35]. It has been proposed that MatK evolved from a trnK(uuu) intron-specific maturase to a more versatile maturase that assists the splicing of most or all group IIA introns of land plants [32][33][34][35].
originated before the emergence of the Coleochaetales and Zygnematales. While the chloroplast genome appears to have remained relatively stable in the coleochaetalean lineage, it has lost the IR and has undergone many changes in gene order and intron content during the evolution of the Zygnematales.