Gene gain and loss from the Asian corn borer W chromosome

Background Sex-limited chromosomes Y and W share some characteristics, including the degeneration of protein-coding genes, enrichment of repetitive elements, and heterochromatin. However, although many studies have suggested that Y chromosomes retain genes related to male function, far less is known about W chromosomes and whether they retain genes related to female-specific function. Results Here, we built a chromosome-level genome assembly of the Asian corn borer, Ostrinia furnacalis Guenée (Lepidoptera: Crambidae, Pyraloidea), an economically important pest in corn, from a female, including both the Z and W chromosome. Despite deep conservation of the Z chromosome across Lepidoptera, our chromosome-level W assembly reveals little conservation with available W chromosome sequence in related species or with the Z chromosome, consistent with a non-canonical origin of the W chromosome. The W chromosome has accumulated significant repetitive elements and experienced rapid gene gain from the remainder of the genome, with most genes exhibiting pseudogenization after duplication to the W. The genes that retain significant expression are largely enriched for functions in DNA recombination, the nucleosome, chromatin, and DNA binding, likely related to meiotic and mitotic processes within the female gonad. Conclusions Overall, our chromosome-level genome assembly supports the non-canonical origin of the W chromosome in O. furnacalis, which experienced rapid gene gain and loss, with the retention of genes related to female-specific function. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-024-01902-4.

with Y chromosomes, namely an abundance of repetitive elements [10,11].However, some evidence from birds suggests that W chromosomes may differ from Y chromosomes in that they lack genes with female-specific functions [3,12].Whether this is unique to birds, or is a broader pattern of W chromosomes, remains unclear.
Lepidoptera, butterflies and moths, provide a key additional female heterogametic system [13,14].The conservation of the Z chromosome has been well established in Lepidoptera [15]; however, the W chromosome in Lepidoptera is unusual in that it was recruited into the genome well after the origin of the Z chromosome [16], as the basal lineages in the clade are Z0/ZZ.Available evidence suggests that, at least in some lineages, the W chromosome bears no homology to the Z [11,15,17] and may actually have originated from a B chromosome [15].Within the Lepidoptera, complex sex chromosomes including neo-W chromosomes are observed in many clades based on cytogenetic analysis [16].
To answer questions about the conservation and gene content of the lepidopteran W, we built a chromosomelevel genome of the Asian corn borer O. furnacalis Guenée (Lepidoptera: Crambidae, Pyraloidea), a major insect pest of corn, widespread in the Asian-Western Pacific region, using a combination of PacBio HiFi circular consensus sequencing (CCS) mode and highthroughput chromosome conformation capture (Hi-C) sequencing.We couple this with extensive RNA-Seq analysis of multiple developmental stages and tissues in both sexes.Our catalog of W gene content reveals extensive duplications from all other chromosomes in the genome, with a relatively low percent of genes with persistent gene activity, which are enriched for functions in DNA recombination, the nucleosome, chromatin, and DNA binding.Our results suggest that the W chromosome retains gene related to meiotic and mitotic functions within the female gonad, in contrast to the avian W chromosome [3,12] and more consistent with findings from Y chromosomes [5][6][7][8].

Chromosome-level genome assembly of the Asian corn borer
Our chromosome assembly strategy employed PacBio HiFi (CCS) sequencing data to assemble the draft genome and Hi-C data to detect chromosomal contact information.These PacBio long-reads were self-corrected using Quiver and assembled into a draft genome assembly with a total length of 493.10 Mb, consisting of 57 contigs with an N50 length of 15.69 Mb (Additional File 1: Table S1).The assembled genome size is similar to that obtained from genome surveys (Additional File 1: Figure S1).Next, Hi-C linking information further anchored, ordered, and oriented 46 contigs to 32 chromosomes (30 autosomes, with Z and W sex chromosomes), which contained 86.79% of the whole genome assembly (Additional File 1: Table S1, Figure S2).The chromosome-level genome assembly was 492.57Mb with a scaffold N50 of 16.47 Mb (Additional File 1: Table S1).
We evaluated the completeness of O. furnacalis genome assembly with the Benchmark of Universal Single-Copy Orthologs (BUSCO v5) from the lepidop-tera_odb10 set.Our assembly contained 98.7% of complete BUSCO genes, of which 98.5% were single copy and 0.2% were duplicates (Additional File 1: Table S2).To further evaluate the genome assembly quality, genomic DNA sequencing data from Illumina HiSeq were mapped to the assembly scaffolds, with a 99.07%coverage rate (Additional File 1: Table S2).Finally, lepidopteran genomes typically exhibit very high levels of synteny.Whole-genome alignment of the O. furnacalis assembly to the chromosomes of the S. litura revealed that chromosomal linkage and ordering of genes are highly conserved (Additional File 1: Figure S3).All these analyses indicated that the genome assembly is both highly reliable and complete for subsequent analyses.

Genome annotation
In total, 860,391 repeat sequences spanning 204.4 Mb were identified, constituting 41.45% of the O. furnacalis genome (Additional File 1: Table S1, Fig. 1A).We integrated the result of ab initio, homology-based, and RNA-seq methods to annotate protein-coding genes, most of which came from homology-based and RNA-seq methods, indicating the high-quality of annotation (Additional File 1: Figure S4).Finally, we obtained 16,509 protein coding genes in the O. furnacalis genome, which has similar gene features to other lepidopteran genomes (gene length, gene number, coding length, intron length of each gene, exon length) (Additional File 1: Figure S4).Next, we identified different types of noncoding RNAs (ncRNAs), including 7710 transfer RNAs (tRNA), 73 ribosomal RNAs (rRNA), and 39 microRNAs (miRNA) (Additional File 1: Table S3).In addition, we annotated 167 pseudogenes in the O. furnacalis genome, defined as any genomic sequence that is similar to another gene but is defective, such as containing a premature stop codon or a frameshifts mutation [24].Most pseudogenes were located in the chromosome LG3 (Fig 1C, Additional File 1: Table S3), the W chromosome (see below).We performed a functional annotation for all predicted protein coding genes using NCBI nonredundant, Egg-NOG, GO, KEGG, SWISS-PROT, and Pfam databases, with 98.21% of our predicted genes assigned to at least one of these databases (Additional File 1: Table S3).
We used OrthoFinder to find orthologous genes among O. furnacalis and ten other insect species (see " Methods").A total of 16,364 orthologous groups with 857 single-copy orthologous genes were identified.We inferred a phylogenetic tree and divergence time estimate using the end-to-end concatenated amino acid sequences of 857 single-copy orthologous genes.Divergence time estimation indicated that the Crambidae lineage which O. furnacalis belongs to arose approximately 68.49Mya ago (Fig. 2A).

Synteny, karyotype evolution, and sex chromosomes
We compared the syntenic relationships between O. furnacalis and other lepidopterans, including Bombyx mori, C. medinalis, C. pomonella, S. exigua, S. fugiperda, and T. ni (Fig. 2B-G).In general, the O. furnacalis genome shows a high level of synteny with other lepidopteran genomes, though with some fusion and fission events.O. furnacalis LG1 is syntenic with the Z chromosome in all other species, suggesting it is the Z chromosome in O. furnacalis as well, and this is also evident from M:F coverage analysis (Fig. 3A), which reveals twofold greater coverage in males (ZZ) compared to females (ZW).
O. furnacalis LG3, the W chromosome, exhibits female-biased coverage consistent with a female-limited chromosome (Fig. 3A).The M:F coverage is highly variable across chromosome LG3 compared to all other chromosomes (Fig. 3B), due in large part to the abundance of TEs on this chromosome (Additional File 1: Figure S5-S6).The W chromosome (LG3) is the second largest chromosome (22.23 Mb) in our assembly (Fig. 1A), with the largest number of predicted protein coding genes (excluding pseudogenes) compared with other chromosomes (Fig. 1B).The W also has the largest number of pseudogenes (Fig. 1C) and contains 43.1% of all pseudogenes annotated in the genome.Many genes that are not technically pseudogenized were expressed below our 0.5 FPKM threshold.Only 66 protein coding genes showed significant expression above this threshold in our mixed sample (comprised of eggs, larvae, pupae, and adults, see "Methods, " Furthermore, LG3 has a notably greater repeat density and significantly different repeat composition compared to other chromosomes (Additional File 1: Figure S5-S6) with a larger proportion of satellites, DIRS, LINE, Copia, and Gypsy, and an enrichment of Maverick elements.

W homology
Given that previous reports have found no evidence of homology between the Z and W in some lepidopteran species [11,17,22], we next investigated the evolutionary history of the O. furnacalis W chromosome.Using our W gene set, we first examined homology between the O. furnacalis W chromosome and the W chromosome from other lepidopterans.The O. furnacalis W shows substantial homology with the W in C. medinalis, consistent with some form of shared ancestry within the family Crambidae.However, we observed no discernable homology between the O. furnacalis W and the W chromosome of any of the more distantly related lepidopterans that we queried (Fig. 2B-G).
Reciprocal best BLAST hits between the W coding catalog and the remainder of the O. furnacalis genome reveals similarity throughout the genome (Fig. 4), consistent with a B chromosome origin of the W chromosome followed by gene duplication from locations throughout the genome.Of the 482 W coding genes having Z/autosome paralogs (Fig. 4), 202 exhibited fewer or no introns in the W paralogs, consistent with retrotransposition.
We next examined synonymous divergence (dS) for these paralogs in order to determine the timing of gene duplication to the W chromosome.The density of dS (Fig. 5A) shows that most paralogs with W genes have relatively low divergence, and the distribution of paralog dS (Fig. 5B) across the W chromosome suggests that duplications occur randomly throughout the chromosome.Given the low percent of genes on the W chromosome that exhibit expression >0.5 FPKM (Fig. 1C) and the high number of pseudogenes on the W (Fig. 1D), it is likely that most duplicates to the W chromosome fail to maintain significant expression and are silenced relatively quickly.
Sex-limited chromosomes often contain many copies of some genes [7,25], and so we examined copy number for genes on the W chromosome and their most similar paralog (Fig. 6, Additional File 1: Table S4).Overall, we observed lower copy number of W genes than that of their autosome/Z chromosome paralogs.Given the rapid apparent decay of W chromosome paralogs (Fig. 5), we examined the Gene Ontology enrichment for W expressed genes (>0.5 FPKM) (Tables 1 and  2), and observed statistical enrichment of terms related to functions in DNA recombination, the nucleosome, chromatin, and DNA binding.Together, these results suggest that many genes retained with functional expression on the W chromosome are related to mitotic and meiotic processes, much of it within the female gonad.

Discussion
Here we report a high quality, chromosome-level genome for O. furnacalis.Our reference genome includes a single, contiguous sequence for the female-specific W chromosome, allowing us to query the content of this unique region of the genome.

Sequence characteristics of W chromosome
W and Y chromosomes are often enriched for pseudogenes, as the lack of recombination in these regions leads to high rates of gene silencing [11].At the same time, these regions often accumulate repeat sequences [10,11,26], resulting in significantly higher repeat density compared to other chromosomes [12,26].Consistent with this, although we found that the number of annotated protein-coding genes of W chromosome is indeed the largest of all the chromosomes in the genome (Fig. 1B), as well as the number of pseudogenes (Fig. 1), and those genes that retain functional coding sequence have low overall transcriptional activity (Fig. 1D) [11,14].We also found the W is enriched for repetitive elements in O. furnacalis (Additional File 1: Figure S5 and S6), with the number of Maverick elements particularly higher on the W chromosome compared to the rest of the genome (Additional File 1: Figure S5).

Conservation of sex chromosomes across Lepidoptera species
We searched orthologs to Z-linked genes of O. furnacalis in other lepidopterans and found that although the Z chromosome shows clear strong conservation [16] (Fig. 2), the O. furnacalis W chromosome is only conserved with C. medinalis, also a member of Crambidae (Fig. 2B-G).This is consistent with previous work suggesting that the composition of the W varies even among species within the same family [27].W chromosome evolution can be rapid and consistent with this the W chromosomes between two Pieris sister species exhibit substantial divergence [10].Our data also support the rapid evolution W chromosome.Although O. furnacalis has some W-W homologs with C. medinalis, the W chromosomes have diverged at a dramatically higher pace than the autosomes and the Z chromosome (Fig. 2C).

Evolutionary history of the sex chromosomes in O. furnacalis
Karyotype work has revealed a complex history of the W chromosome, including the repeated origin of neo-W chromosomes in many lepidopteran lineages.The evolutionary history of the W chromosome in Lepidoptera differs from the canonical model of sex chromosome formation in that it was recruited, possibly from a B element, after the formation of the Z chromosome [16], at the common root of Ditrysia and Tischerioidea [14,28].Lineages ancestral to this recruitment exhibit Z0/ZZ karyotypes [16].The B-chromosome origin of the W is supported by the fact that in some lepidopteran lineages, W chromosome bears no homology to the Z [11,15,17].Some have recently argued that the rapid evolution of the W makes it difficult to differentiate between canonical (Z-homology) and non-canonical (B chromosome) origins of the W chromosome if that origin is deep in the past [10].The definitive test of these alternatives, direct evidence of B chromosome recruitment to a W chromosome, is difficult to envisage, as it would require both stability of the ancestral B chromosome in an outgroup lineage, unlikely given the instability of B chromosomes in general, as well as widespread homology of the derived W chromosome to the ancestral B. Without this direct evidence, we must rely on indirect evidence from Z-W homology.
We observe little similarity in gene content between the W and Z chromosome (Fig. 4), a steady rate of gene duplication to the W chromosome from throughout the genome (Fig. 5), and reduced intron number for many W paralogs compared to their most similar autosomal/ Z-linked copy suggests that this likely often occurs through retrotransposition across the W chromosome.We note that the reduction in introns of W genes compared to their Z homologs is largely consistent with a B chromosome origin of the W. Genes retrotransposed or otherwise duplicated to the W chromosome will immediately experience the full degenerative effects of a nonrecombining region [1,29], and consistent with this, we observe high levels of pseudogenization (Fig. 1) on the W chromosome.However, those genes on the W that retain functional coding sequence and expression (e.g., non-pseudogenized) are enriched for mitotic and meiotic functions and display Gene Ontology term enrichments related to recombination, chromosome packaging, and replication (Tables 1 and 2).It has been suggested that W chromosomes may differ from Y chromosomes in that they do not acquire genes related to female gonadal function [3,12] in the same way that Y chromosomes retain genes related to male function [5][6][7][8]; however, our results suggest that this may not be a generalized pattern of W chromosomes.Indeed, the W chromosome in O. furnacalis are enriched for functions related to female meiosis and mitosis, possibly due to TE activity.

Conclusions
Our study presents a chromosome-level genome from a Lepidipteran, O. furnacalis.Comparative analysis reveals the deep conservation of the Z chromosome across Lepidoptera, but little conservation of W chromosome in related species or with the Z chromosome, which supports the non-canonical origin of the W chromosome. O. furnacalis W chromosome has accumulated significant repetitive elements and experienced rapid gene gain from the remainder of the genome, with most genes exhibiting pseudogenization after duplication to the W. The genes that retain significant expression are largely enriched for functions in DNA recombination, the nucleosome, chromatin, and DNA binding, likely related to meiotic and mitotic processes within the female gonad.

Samples
O. furnacalis larvae were collected from a corn field at Beijing, China, in July 2020, and fed with an artificial diet in the laboratory of China Agricultural University.The incubator environment was set at 26 ± 1℃ and 50 ± 5% relative humidity on a photoperiod (light: dark = 16:8).

Genome sequencing and assembly
Genomic DNA was extracted from a female laboratoryfed pupa using the sodium dodecyl sulfate (SDS) extraction method [30].The SMRTbell library was constructed for sequencing using the SMRTbell Express Template Prep kit 2.0 (Pacific Biosciences).From this single individual, we obtained 32.96 Gb circular consensus sequencing (CCS) data from one cell of the PacBio Sequel II platform at Biomarker Co., Ltd., Qingdao, China (Additional File 1: Table S5).High accuracy PacBio Hifi data were assembled using hifiasm (v0.12) software to obtain a draft genome [31].We used purge_haplotigs [32] to remove redundant contigs and generate a non-redundant assembled genome.
The 56.25% of unique mapped read pairs represent valid interaction pairs and were used for correction, clustering, ordering, and orientation of scaffolds into chromosomes with LACHESIS [35].Before chromosome assembly, we performed a preassembly for error correction of contigs which required the splitting of contigs into segments of 50 kb on average.Then Hi-C data were mapped to these segments using BWA (v0.7.10-r789).The uniquely mapped data were retained for the assembly, and any two segments which showed inconsistent connection with information from the raw scaffolds were checked manually.Parameters for running LACHESIS included: CLUSTER_MIN_RE_SITES = 27; CLUSTER_ MAX_LINK_DENSITY = 2; ORDER_MIN_N_RES_IN_ TRUNK = 15; ORDER_MIN_N_RES_IN_SHREDS = 15.After this step, placement and orientation errors exhibiting obvious discrete chromatin interaction patterns were manually adjusted.Finally, we constructed a heatmap based on the interaction signals of valid mapped read pairs between chromosomes.
Pseudogenes have similar sequences to functional genes, but may have lost their biological function because of mutations.GenBlastA (v1.0.4) [63] was used to scan the whole genome after masking predicted functional genes.Putative candidates were then analyzed by searching for premature stop codon and frameshift mutations using GeneWise (v2.4.1) [64].

Detection of sex chromosomes
In order to identify the sex chromosomes, we performed genome resequencing with five female and male pupae.High-quality clean data (24.95~28.80-foldcoverage of the genome) were obtained through the pair-end 150 Illumina Hiseq platform at Biomarker Co., Ltd., Qingdao, China (Additional File 1: Table S7).The clean data were aligned to reference genome with Bwa-men (v0.7.17).We compared the coverage differences between male and female samples [71] to distinguish the Z, W, and autosomes.Specifically, we used the genomecov and groupby in Bedtools (v2.30.0) to obtain a per-base median coverage depth for each chromosome and normalized them by the mean of all chromosome median coverages for each sample.Normalized coverage depth was averaged by sex to produce a coverage depth per chromosome for each sex.Then we compared coverage depth between sexes for each chromosome and calculated log 2 male:female (M:F) coverage ratio [log 2 (M:F coverage)].Autosomes have an equal coverage between sexes [log 2 (M:F coverage) = 0], while the Z chromosome should show approximately twofold greater coverage in the males [log 2 (M:F coverage) = 1].The W chromosome should show a strong female-biased coverage pattern.We also calculated the M:F median coverage ratio along each chromosome with nonoverlapping 1000 bp windows.
The homolog search of O. furnacalis W in C. pomonella, C. medinalis, S. exigua, S. fugiperda, and T.ni genome also used getRBH.pl.We assessed enrichment of Gene Ontology [74,75] terms for W gene content.

W-gene expression level
We used two different RNA-Seq datasets to determine W expression, the mixture sample of RNA-Seq data from eggs, larvae, pupae, and adults used for genome annotation (described above), and data from female gonad, as sex-limited chromosomes are often largely expressed in the gonad.For female gonad, RNA from five gonad samples of adult females was sequenced on an Illumina NovaSeq 6000 platform (10.14-fold coverage of the genome, Additional File 1: Table S7).For each dataset separately, we used fastp (v0.20.0)[76] to filter out lowquality reads and to remove adapters with the default parameters.Then we mapped the clean reads to the O. furnacalis reference genome using HISAT2 (v2.1.0)[51].The FPKM (fragments per kilobase of exon per million fragments mapped) of each gene was determined using Stringtie (v2.1.4)[52] based on the annotated GFF file.For female gonad, we used averaged FPKM of five samples.

Copy number variation of W and Z/autosome genes
The amino acid sequences of protein-coding genes from whole genome were used as the input of the blastp mapping against the Swiss-Prot database to assign gene symbols (abbreviations for the gene names).We BLASTed each W gene to the remainder of the genome to identify the best hit and only retained orthologous genes on the W chromosome and Z/autosomes that consistently mapped to the same protein (reciprocal best BLAST hit) and calculated their copy number following the protocol in Mueller et al. [77] .
weak.The shading gradient represents the chromosome interactions.S1.Chromosome-level assembled Lepidoptera genomes.Table S2.Assessments of assembled genome.Table S3.Genomic annotation of Ostrinia furnacalis.Table S4.Copy number for W and autosomal/Z chromosome paralogs.Table S5.Statistics of genomic sequencing data of Ostrinia furnacalis by PacBio Sequel II.Table S6.Statistics of genomic sequencing data of Ostrinia furnacalis by Hi-C.Table S7.Statistics of genomic resequencing data of female and male pupae and transcriptome sequencing of female gonads and a mixed sample.Table S8.The download address of insect species protein sequences used for comparative genomics analysis and phylogenetic reconstruction.

Fig. 1
Fig. 1 Genomic characterization of Ostrinia furnacalis.A Circos plot depicting the genomic landscape of the 32 chromosomes (Chr1-Chr32 on a Mb scale).The denotation of each track is listed in the center of the circle.Blue lines in LG3 (the W chromosome) represent collinearity within the W, due to repeat elements.B Number and distribution of protein-coding genes.C Number and distribution of pseudogenes.D. Number of expressed genes (FPKM > 0.5 in mixed sample).E Average chromosomal expression level (excluding genes with FPKM ≤ 0.5 in mixed sample)

Fig. 2
Fig. 2 Phylogenetic tree and chromosome-level synteny analysis.A Phylogenetic tree and gene orthology of Ostrinia furnacalis with ten lepidopteran genomes and Drosophila melanogaster.1:1:1 indicates universal single-copy genes, shared by 12 species with 1 copy.N:N:N indicates other universal genes.SS indicates species-specific single-copy genes.Crambidae indicates universal genes limited in Crambidae.Others indicates all other orthologous groups.Comparative analysis of synteny between O. furnacalis and B Bombyx mori (no assembled W), C Cnaphalocrocis medinalis, D Cydia pomonella, E Spodoptera exigua, F Spodoptera fugiperda, and G Trichoplusia ni.The chromosomes of O. furnacalis are shown in the left, and the other insects' chromosomes are shown in the right

Fig. 3
Fig. 3 Detection of sex chromosomes.A Male:female coverage ratios for each chromosome, plotted by chromosome length.Each point represents a single chromosome.The dashed gray line is the theoretical expectation for autosomes and the dashed red line shows the expectation for the Z chromosome.B Male:female coverage ratios plotted in 500 bp windows across scaffolds for LG1(Z), LG3(W), and a representative autosome (LG32)

Figure S3 .
Synteny analysis between Ostrinia furnacalis and Spodoptera litura chromosomes.Chromosomes of Ostrinia furnacalis are shown in the left, number 3 represent the W chr and 1 represents the Z chr.The chromosomes of Spodoptera litura are shown in the right.Figure S4.Annotation and evaluation of protein-coding genes.a. Genes annotated via ab initio, homology-based and RNA-seq methods.b-e.Comparison of Ostrinia furnacalis gene features with other lepidopteran genomes.Figure S5.The number of repeat sequences in W chromosome (LG3).

Table 1
Functional enrichment of 66 significantly expressed genes (FPKM > 0.5 from mixed sample) on Ostrinia furnacalis W chromosome