The X chromosome of the German cockroach, Blattella germanica, is homologous to a fly X chromosome despite 400 million years divergence

Background Sex chromosome evolution is a dynamic process that can proceed at varying rates across lineages. For example, different chromosomes can be sex-linked between closely related species, whereas other sex chromosomes have been conserved for > 100 million years. Cases of long-term sex chromosome conservation could be informative of factors that constrain sex chromosome evolution. Cytological similarities between the X chromosomes of the German cockroach (Blattella germanica) and most flies suggest that they may be homologous—possibly representing an extreme case of long-term conservation. Results To test the hypothesis that the cockroach and fly X chromosomes are homologous, we analyzed whole-genome sequence data from cockroaches. We found evidence in both sequencing coverage and heterozygosity that a significant excess of the same genes are on both the cockroach and fly X chromosomes. We also present evidence that the candidate X-linked cockroach genes may be dosage compensated in hemizygous males. Consistent with this hypothesis, three regulators of transcription and chromatin on the fly X chromosome are conserved in the cockroach genome. Conclusions Our results support our hypothesis that the German cockroach shares the same X chromosome as most flies. This may represent the convergent evolution of the X chromosome in the lineages leading to cockroaches and flies. Alternatively, the common ancestor of most insects may have had an X chromosome that resembled the extant cockroach and fly X. Cockroaches and flies diverged ∼ 400 million years ago, which would be the longest documented conservation of a sex chromosome. Cockroaches and flies have different mechanisms of sex determination, raising the possibility that the X chromosome was conserved despite the evolution of the sex determination pathway.


Background
In species with separate sexes, genetic or environmental cues initiate sexually dimorphic developmental pathways [1,2]. If the cue is genetic, a sex-determining factor may reside on a sex chromosome [3]. For example, in most therian mammals, SRY on the Y chromosome initiates the development of the male germline, testes, and secondary Sex-determining pathways and sex chromosomes can evolve rapidly, often differing between closely related species [2,3]. Evolutionary transitions in sex determination pathways are often accompanied by corresponding changes in the identity of the sex chromosomes [1,2,12]. Transitions in sex-determining pathways and turnover of sex chromosomes are well studied across insects, where there is a diversity of sex determination mechanisms [13][14][15][16] (Fig. 1). For example, the genetic factors that initiate sex determination in Drosophila do not determine sex in other flies [19][20][21][22][23][24][25][26]. In addition, the sex chromosomes of Drosophila are not homologous to the sex chromosomes of other flies [18,27,28]. The evolution of a new sex determination mechanism in the lineage leading to Drosophila resulted in the transition of the ancestral X chromosome into an autosome, the creation of a new X chromosome from an ancestral autosome, and the evolution of a new mechanism of X chromosome dosage compensation [18,29].
It is most parsimonious to conclude that the ancestral sex determination system of brachyceran dipterans (which includes flies but excludes mosquitoes, crane flies, midges, gnats) consists of a Y-linked male-determining factor that regulates the splicing of the transformer (tra) gene product [15,22,26,[30][31][32][33]. The ancestral male-determining gene of brachyceran flies is yet to be identified, if it is even still present in any extant species. The ancestral brachyceran X chromosome is known as Muller element F [18]. Element F has reverted to an autosome in D. melanogaster, where it is also known as chromosome 4 or the "dot" chromosome. The dot chromosome is enriched for heterochromatin and has fewer than 100 genes [34]. Element F is notable because most X chromosomes are gene rich and euchromatic, despite having some differences in gene content from the autosomes [35][36][37]. This peculiar element F X chromosome has been conserved for >150 million years (My) in some fly lineages, but it reverted to an autosome in Drosophila when a different chromosome became X-linked [18,38]. The remainder of the fly genome is organized into 5 euchromatic chromosomes (or chromosome arms), named Muller elements A-E [39,40]. Element A is the X chromosome in D. melanogaster.
There is some evidence that the X-linked element F is dosage compensated in hemizygous males. In D. melanogaster, where element F is autosomal, Painting of fourth (Pof ) encodes an RNA-binding protein that localizes predominantly to element F [41]. Lucilia cuprina (Australian sheep blowfly) has the ancestral brachyceran karyotype, with an X-linked element F [42,43]. Expression of X-linked genes is upregulated in L. cuprina males by the homolog of Pof [42,44]. This dosage compensation is essential for male viability-a loss of function mutation in the L. cuprina homolog of Pof is male lethal, but viable in females [44].
The German cockroach, Blattella germanica, diverged from flies ∼ 400 My ago (Mya) [17]. Female cockroaches are XX and males are XO, i.e., one X and no Y chromosome [13,45]. This suggests that a dosagesensitive X-linked factor determines sex in German cockroach, analogous to, but independently evolved from, Drosophila. Curiously, the cockroach X chromosome is heterochromatic along most of its length [46], reminiscent of element F, the ancestral brachyceran X chromosome. We tested the hypothesis that the German cockroach X chromosome is homologous to fly element F, which would suggest that a cockroach and most flies share an X chromsomome despite ∼ 400 My divergence. Fig. 1. Insect phylogeny and sex chromosomes. Evolutionary relationships and sex chromosome karyotypes of major insect groups. The phylogenetic topology and time to common ancestor are shown [17], but the relative branch lengths are not drawn to scale. Information on insect sex chromosomes and sex determination are reviewed elsewhere [2,3,13,16,18]

Decreased sequencing coverage of element F homologs in male cockroaches
We used a differential sequencing coverage approach to identify X chromosome genes in the German cockroach genome assembly. X-linked genes are expected to have half as many male-derived reads mapped to them as female-derived reads because the X chromosome is present in one copy in males and two copies in females [18]. We used available whole-genome sequencing data [47] to calculate the relative coverage of male (M) and female (F) reads log 2 M F for each annotated cockroach gene (Additional file 1). The mode of the log 2 M F distribution is at 0 (Fig. 2a), as expected, because we recalibrated the log 2 M F values to have a median of 0 (see the "Methods" section). However, there is a heavy shoulder of genes with log 2 M F < 0, suggesting that X-linked genes are also in the assembly (Fig. 2a). In total, 3499 of the 28,141 annotated genes have female-biased coverage (log 2 M F ≤ − 1), whereas only 1363 genes have male-biased coverage (log 2 M F ≥ 1), consistent with a heavy shoulder of X-linked genes. Assuming the 1363 male-biased genes represent the false-positive rate, we expect 2136/3499 female-biased genes to be X-linked. This is consistent with the upperbound of the number of X-linked genes in the cockroach genome-the cockroach X is the smallest of 12 chromosomes [46], which means that fewer than 2345 genes (28,141/12) should be X-linked.
To test the hypothesis that the German cockroach X chromosome is homologous to the ancestral brachyceran fly X (i.e., Muller element F), we evaluated if cockroach genes with D. melanogaster homologs on element F have lower log 2 M F than genes with homologs on the other 5 elements. Cockroach genes with D. melanogaster homologs on Muller elements A-E have distributions of log 2 M F centered around 0, consistent with being autosomal (Fig. 2b). In contrast, the 51 cockroach element F homologs have a median log 2 M F < 0, and the average log 2 M F for element F homologs is significantly less than the other genes (P=10 −10 using a Mann-Whitney U test comparing element F homologs with elements A-E). If all element F homologs were X-linked in cockroach, we would expect the median log 2 M F = −1 for genes with element F homologs. However, cockroach element F homologs have a median log 2 M F > − 1. Therefore, we hypothesize that a disproportionate amount of, but not all, element F homologs are X-linked in German cockroach.
We next estimated the frequency of element F homologs that are X-linked in the German cockroach. First, we used the mclust package in R to fit a mixture of normal distributions to the log 2 M F values of element F homologs [48]. The best fitting mixture consists of 3 distributions, with 1 centered at a mean of − 1.02 (Table 1), close to the expectation of log 2 M F = − 1 for X-linked genes. This suspected X-linked distribution contains ∼ 41% of the 51 element F homologs, and it has very little overlap with the other 2 distributions (Fig. 2b). One of the other 2 distributions is centered very close to 0 (the expectation for autosomal genes), and it has very low variance. The third distribution has a mean log 2 M F = − 0.23 and a large variance. We suspect that the 2 distributions with log 2   M F = − 1.00 that has 43% of element F homologs and a second distribution with a mean log 2 M F = − 0.09 that has 57% of element F homologs (Additional file 2). Moreover, with a mixture of 4 normal distributions, we recover 2 distributions centered near log 2 M F = − 1 that together have 40% of element F homologs. Therefore, regardless of the number of distributions in our mixture model, we recover at least 40% of cockroach element F homologs that fall within a distribution consistent with X-linkage.
In contrast to element F, the log 2 M F values for cockroach genes with D. melanogaster homologs on elements A-E can be best explained by a mixture of 4 distributions ( Table 1). The distribution within this mixture model that is most consistent with X-linkage has a mean of − 0.89, a large variance of 5.6, and contains only 37 of the 5602 element A-E homologs. Most element A-E homologs (4957) are assigned to 2 distributions with means of 0.0015 and 0.049, which are both consistent with autosomes ( Fig. 2b). Together, our analysis of mixture models suggest that a large fraction of element F homologs are X-linked in German cockroach, whereas the vast majority of element A-E homologs are autosomal.
The distributions of log 2 M F seem to describe 2 classes of element F homologs: autosomal genes with log 2 M F > − 0.5 and X-linked genes with log 2 M F < − 0.5 (Fig. 2b). If there is an excess of element F homologs on the cockroach X, we expect a higher frequency of element F homologs to have log 2 M F < − 0.5 than genes on the other 5 elements. We therefore counted the number of genes with log 2 M F < − 0.5 on each of the 6 Muller elements (Table 2). To determine a null distribution of those genes on each element, we randomly assigned the total number of genes with log 2 M F < − 0.5 to the 6 elements based on the size of each Muller element (measured as the total number of cockroach genes on the element) in 1000 bootstrap replicates of the data. A significant excess of cockroach element F homologs have log 2 M F < − 0.5 relative to our null expectation (Fig. 2c). This provides further evidence that an excess of element F homologs is X-linked in German cockroach.

Reduced heterozygosity of element F homologs in male cockroaches
German cockroach males have one copy of the X chromosome, and females have two copies of the X. We therefore expect that females could be heterozygous for polymorphic genetic variants in X-linked genes, whereas males must be hemizygous (only one allele per gene). If element F homologs are X-linked in cockroach, we expect to observe an excess of element F homologs without heterozygous variants in an individual male when compared to element A-E homologs and also when compared to female heterozygosity in element F homologs. To test this prediction, we used the available cockroach genome sequence data to identify heterozygous sequence variants in cockroach genes (Additional file 1).
The German cockroach genome project generated sequence data from a single male and single female of an inbred laboratory strain [47]. We therefore expect to observe no heterozygous variants in the male for X-linked genes, but the female could have heterozygous X-linked variants. However, there are also likely to be errors in variant calling and genotyping that could produce falsepositive heterozygous calls. Because of these false positives, we may observe heterozygous variants in element F homologs in males even if the genes are X-linked. To address this limitation, we tested for reduced heterozygosity in element F homologs in males, rather than an absence of heterozygous variants.
We first compared the heterozygosity of cockroach genes in males and females across Muller elements (Fig. 3). In females, there is no significant difference in the heterozygosity between genes assigned to element F and genes on the other five elements (P = 0.32 in a Mann-Whitney U test). In contrast, male element F homologs have significantly fewer heterozygous variants than genes on elements A-E (P = 0.017 in a Mann-Whitney U test). This reduced male heterozygosity in element F homologs is consistent with an excess of element F homologs on the German cockroach X chromosome.
We expect candidate X-linked genes with reduced log 2 M F sequencing coverage to also have reduced heterozygosity in males relative to females. To test this hypothesis, we calculated, for each gene, a ratio of the number male heterozygous variants to the total number of heterozygous variants in the male and female samples. This value ranges from 0 (if a gene only has heterozygous variants in females) to 1 (if a gene only has heterozygous variants in males). Equal heterozygosity in both sexes has a value of 0.5. Of the 40 element F homologs with sequencing coverage and heterozygosity data, 10 (25%) have both log 2 M F < − 0.5 and fraction of male heterozygous variants < 0.5 (Fig. 3c). This is significantly greater than the 2.5% of element A-E homologs with both log 2 M F < − 0.5 and fraction of male heterozygous variants < 0.5 (z = 9.68, P = 10 −21 ). This result provides further evidence that there is an excess of element F homologs on the German cockroach X chromosome.

Validation of candidate X-linked element F homologs
We selected two element F homologs that we hypothesize are X-linked (BGER000638 and BGER000663) to validate using quantitative PCR (qPCR). Both genes have log 2 M F < − 1, and one gene (BGER000638) has three times as many heterozygous variants in the female compared to the male (Additional file 1). The other gene has no heterozygous variants in either sex. We found that both genes had a significantly higher concentration in females relative to males in our qPCR assay, with an estimated female concentration that is twice the male concentration (Additional file 3) [49]. This is the expected result if both genes are X-linked. Therefore, male:female sequencing coverage, heterozygosity, and qPCR provide consistent evidence that element F homologs are X-linked in German cockroach.

The cockroach X chromosome may be dosage compensated in males
We next tested if the haploid dosage of element F homologs affects their expression in male cockroach. The ideal data to test for the effects of a haploid X are expression measurements from males and females from the same tissue and developmental stage [10,11]. Unfortunately, there are no available sex-matched RNA-seq gene expression datasets from German cockroach. We therefore used an alternative approach in which we compared the expression in adult male heads with a mixed sex adult head sample (Additional file 1). We also compared expression in adult male heads with whole adult females (Additional file 1). If the haploid X chromosome is dosage compensated in males, we expect the distributions of log 2 fold change (log 2 FC) expression between the two tissue samples to be equivalent for cockroach genes with homologs on element F and elements A-E. Indeed, there is no significant difference in the median log 2 FC between element F homologs and element A-E homologs (P = 0.15 for male head vs mixed sex head, P = 0.30 for male head vs whole adult female, with both P values from Mann-Whitney U tests; Fig. 4a, b).
Only a subset of element F homologs is expected to be X-linked in cockroach based on log 2 M F sequencing coverage (Fig. 2b). If the X chromosome is dosage compensated in males, we expect the average log 2 FC expression between tissue samples to be similar for element F homologs with evidence of X-linkage (log 2 M F < − 0.5) and element F homologs that appear to be autosomal (log 2 M F ≥ − 0.5). Indeed, there is no significant difference in log 2 FC between the two subsets of element F homologs (P=0.84 for male head vs mixed sex head, P = 0.30 for male head vs whole adult females, with both P values from Mann-Whitney U tests; Fig. 4c, d). The same is true for element A-E homologs: there is no significant difference in log 2 FC of male head vs mixed sex head between low and high coverage element A-E homologs (P = 0.054 in a Mann-Whitney U test) nor is there a significant difference in log 2 FC of male head vs whole adult female between low and high coverage element A-E homologs (P = 0.65 in a Mann-Whitney U test). The comparison of log 2 FC in male vs mixed sex head for element A-E homologs has the lowest P value. If this low P value was evidence for a lack of dosage compensation, we would expect genes with low male sequencing coverage (log 2 M F < − 0.5) to have lower male expression than genes with higher male sequencing coverage (log 2 M F ≥ − 0.5). However, genes with low male sequencing coverage have higher male expression (median log 2 FC = 0.0039) than genes with higher male sequencing coverage (median log 2 FC = − 0.15). Therefore, the limited RNA-seq data that are available suggest that the German cockroach X chromosome may be dosage compensated in males.

Conservation of element F transcriptional regulators in cockroach
In some fly species where element F is the X chromosome, X-linked genes are present in a single (haploid) copy in males [18]. Males of the blow fly L. cuprina are haploid for such an X chromosome, and their  [42,44]. POF localizes nearly exclusively to element F gene bodies in D. melanogaster [41,[50][51][52]. There is a Pof homolog in the cockroach genome (BGER016147), which we aligned to the D. melanogaster protein sequence. The most conserved region of D. melanogaster Pof overlaps with a predicted RNA-binding domain within the cockroach protein sequence (Fig. 5a, b). Therefore, a key component of the molecular machinery which regulates the dosage compensation on the X-linked fly element F is present in the German cockroach genome.
The proteins encoded by eggless (egg) and windei (wde) interact with POF to create an environment around genes on element F that resembles pericentromeric heterochromatin in Drosophila. Egg is a SETDB1 homolog that is responsible for di-and/or tri-methylation of lysine 9 in histone H3 in the gene-dense region of D. melanogaster element F [53][54][55][56][57]. There are two predicted homologs of egg in the cockroach genome (BGER011023 and BGER011024). BGER011023 has a predicted SET lysine methyltransferase domain and a methyl-CpG-binding domain commonly found in histone methyltransferases. BGER011024, on the other hand, has a tudor domain, which is found proximal to the SET domain in D. melanogaster Egg [58]. These predicted functional domains overlap with the portions of the cockroach proteins that are most conserved relative to D. melanogaster Egg (Fig. 5c, d). BGER011023 and BGER011024 are contiguous on a single B. germanica scaffold (Scaffold202; KN196692), suggesting that together they may constitute a single gene that encodes all Egg functional regions.
Wde is an essential co-factor of Egg [59]. There is one predicted homolog of wde in the cockroach genome annotation (BGER025676), but an independently sequenced cockroach wde gene (CCX34999) is longer than the wde homolog predicted by the automated annotation [60].
We therefore compared CCX34999 with D. melanogaster Wde. CCX34999 contains a predicted fibronectin type-III domain at the C-terminal end, similar to D. melanogaster Wde [58]. The C-terminal end of CCX34999 is also the most conserved part of the protein relative to D. melanogaster Wde (Fig. 5e, f ). There is a coiled-coil region of D. melanogaster Wde that is required to interact with Egg. That coiled-coil region of Wde, and the corresponding region of Egg that interacts with Wde, is among the most conserved regions of the D. melanogaster proteins when compared to the cockroach homologs (Fig. 5c,  e). Therefore, homologs of Pof and its two key interactors are present in the German cockroach genome, showing it is possible that a similar mechanism may dosage compensate the cockroach and ancestral fly X chromosomes in hemizygous males.

Discussion
We provide two lines of evidence that the X chromosome of the German cockroach, B. germanica, is homologous to Muller element F, which is X-linked in most flies. First, there is a reduced sequencing coverage of nearly half of the Muller element F homologs in male cockroach, consistent with a haploid dose of the X chromosome in males (Fig. 2). Second, there is a decreased heterozygosity of element F homologs in male cockroach, including those with reduced male sequencing coverage (Fig. 3). We therefore hypothesize that element F is an ancient X chromosome that was present in the most recent common ancestor (MRCA) of flies and cockroaches, and it has been conserved as an X chromosome in the German cockroach and many fly species. An alternative explanation for the excess of element F homologs on the cockroach X chromosome is that those genes independently became X-linked in both cockroaches and flies. There are at least four lines of evidence that favor the hypothesis that element F is an ancient X chromosome retained since the MRCA of cockroaches and flies, as opposed to convergent recruitment of the same genes onto the fly and cockroach X. First, an independent analysis concluded that the MRCA of flies and cockroaches had XX females and either XY or XO males [16]. Second, the B. germanica X chromosome stains heavily for heterochromatin [46], similar to the brachyceran fly X-linked element F [61]. X chromosomes tend to be euchromatic in males [35][36][37], making the similarity between the B. germanica and brachyceran X heterochromatin notable. However, most of what we know about insect sex chromosome heterochromatin comes from cytological examination of meiotic cells from the testes [62], where sex chromosome-specific heterochromatization could differ from the normal behavior in somatic cells [63]. Additional work is necessary to investigate the chromatin state of insect sex chromosomes outside of the male germline. Third, the observed number of element F homologs with evidence for X-linkage in cockroach greatly exceeds the expectation if the X chromosomes of flies and cockroaches were independently derived (Fig. 2c). Fourth, the fraction of element F homologs that appear to be X-linked in cockroaches (> 40%) is consistent with two separate estimates of the expected conservation of a shared X chromosome that was present in the MRCA of flies and cockroaches. We explain the two separate estimates of expected X chromosome conservation below.
The first estimate of expected conservation of an Xlinked element F draws upon the rates of gene relocation between Muller elements in Drosophila. If element F was the ancestral X chromosome of the MRCA of flies and cockroaches, we would expect some relocation of genes onto and off of element F as the lineages leading to cockroaches and flies diverged from their MRCA [64]. Based on the frequency of gene relocation between Muller elements in Drosophila [65] and the sizes of the elements in D. melanogaster, we expect 6.4 genes to have relocated off element F in the cockroach lineage and 1.3 genes to have relocated onto element F in the fly lineage (see the "Methods" section for calculations). There are up to 30 (60% of 51) D. melanogaster element F homologs that do not have evidence for X-linkage in cockroach (Fig. 2b). Gene movement alone can thus explain 7-8 of these apparently autosomal element F homologs.
The second estimate of expected conservation of an X-linked element F extrapolates from the conservation of element F between D. melanogaster and the blow fly L. cuprina. In the L. cuprina genome, only 67.1% (49/73) of genes with D. melanogaster element F homologs are X-linked [44]. Assuming a linear relationship between divergence time [38,66] and conservation of element F gene content, we would expect only 11.1% of cockroach genes with element F homologs to be X-linked: 67.1% × 64 My since divergence between Drosophila and blow flies 386.9 My since divergence between flies and cockroaches Our estimate of the fraction of element F homologs that are X-linked in B. germanica (> 40%) is in between the estimates predicted based on the rates of gene relocation and a linear loss of gene content. Therefore, conservation of an X-linked element F from the MRCA of flies and cockroaches is consistent with the expected amount of gene movement in the time since the MRCA. Curiously, there is a long tail of genes with much higher sequencing coverage in females relative to males (log 2 M F − 1), regardless of the Muller element of their D. melanogaster homologs (Fig. 2a). Sexually dimorphic amplification (endoreplication) of a subset of the genome has been documented in insects, such as in the chorion genes that are highly expressed in Drosophila ovary [67,68]. It is therefore possible that a subset of the cockroach genome is disproportionately amplified in females (possibly to meet the gene expression demands of oogenesis), causing the long tail of negative log 2 M F values that we observe. Additional work is necessary to test this hypothesis.
Our analysis of RNA-seq data suggests that the cockroach X chromosome may be dosage compensated in males-we find no evidence for reduced expression of element F homologs in male cockroaches, regardless of whether the genes appear to be haploid in males (Fig. 4). Previous work found evidence that the cockroach tra homolog may regulate dosage compensation because knockdown of tra in cockroach females results in female-specific lethality of their progeny [69]. Here, we found that homologs of genes involved in regulating the expression of element F genes in flies are present in the cockroach genome, with their functional domains conserved (Fig. 5). This is consistent with cockroaches and flies sharing a mechanism of X chromosome dosage compensation that has been conserved since their MRCA. Future work should further investigate if the regulators of sex determination and dosage compensation in flies (e.g., tra and Pof ) have similar roles in cockroach. An important limitation of our analysis is that we did not compare the same tissues between males and females [10,11]. Our inference of dosage compensation may be confounded by, for example, differences in cell types between tissues [70]. Further work is therefore necessary to more rigorously test for dosage compensation of the cockroach X chromosome with appropriate gene expression comparisons between males and females.
Finally, our results provide evidence that X chromosomes can be conserved even though there are changes in the master regulators of sex determination. Sex in B. germanica is likely determined by X chromosome dosage, analogous to Drosophila, but different from the ancestral fly sex determination system, which relies on a dominant male determiner located on the Y chromosome (Fig. 1). It is unlikely that the same X-linked dosage-sensitive factors determine sex in cockroaches and Drosophila because the X chromosome is not homologous between the two taxa (element A is the X chromosome in Drosophila). In addition, the master regulators of Drosophila sex determination almost certainly differ from the sex determiners in the MRCA of brachyceran flies, which likely used a Ylinked male determiner (Fig. 1). Moreover, the sexually dimorphic splicing of the sex determination pathway gene tra differs between German cockroaches and flies [69]. Therefore, we hypothesize that B. germanica has a homologous X chromosome with the MRCA of brachyceran flies, but the sex determination system is not conserved between cockroaches and flies. Our results suggest that conservation of sex chromosomes does not necessarily imply conservation of sex determination. Future work addressing this problem could inform our understanding of how evolutionary transitions in sex determination pathways can be decoupled from sex chromosome turnover [71].

Conclusions
We present evidence that the X chromosome of the German cockroach is homologous to an X chromosome shared by many fly species. We hypothesize that this X chromosome was inherited from the MRCA of cockroaches and flies > 400 Mya. To the best of our knowledge, this would be the longest documented conservation of an X chromosome. This ancient X chromosome may be dosage compensated in male cockroaches and flies by a conserved mechanism. The extremely long-term conservation of the X chromosome is especially remarkable because cockroaches and flies have diverged in their sex determination pathways, suggesting that sex chromosome conservation can be decoupled from the evolution of sex determination.

Assigning German cockroach genes to Muller elements
Drosophila and other fly genomes are organized into six chromosomes (or chromosome arms) known as Muller elements [27,39,72,73]. Muller element F is the ancestral X chromosome of brachyceran flies, and elements A-E are autosomal in flies with this ancestral karyotype [18]. We assigned each B. germanica gene with a single D. melanogaster homolog to the Muller element of its homolog. We retrieved the D. melanogaster homologs of B. germanica genes from the Baylor College of Medicine i5k Maker annotation, version 0.5.3 [47]. This annotation pipeline was performed as part of the B. germanica genome project [47]. We only assigned B. germanica genes to Muller elements if they have a single D. melanogaster homolog in the annotation (i.e., we did not include genes with multiple predicted D. melanogaster homologs or without any predicted homologs).

Differential sequencing coverage in males and females
We tested for genes that were sequenced at different depths in males and females as a way to identify X chromosome genes [18]. First, we aligned paired-end reads from three male cockroach whole genome sequencing libraries (SRX693111, SRX693112, and SRX693113) and one female library (SRX693110) to the reference B. germanica genome assembly (JPZV00000000.1; [47]), using BWA-MEM with default parameters [74]. We then assigned mapped read pairs to genes (from the v. 0.5.3 i5k annotation) if the first (forward) read aligned to any portion of a gene sequence. We only considered the forward read because insert sizes differ across the available sequencing libraries, which could introduce biases in gene coverage if we allowed or required both forward and reverse reads to overlap genes. Considering only the forward read should decrease the effect of these biases because read lengths are the same (101 bp) across all libraries. We summed across libraries to determine the total number of reads mapped to each gene for each sex. We next divided the number of malederived (female-derived) reads aligned to each gene by the total number of male-derived (female-derived) reads aligned to all genes to determine a normalized mapping coverage of male-derived (female-derived) reads for each gene (Additional file 1). We used these normalized counts to calculate the log 2 male:female read mapping coverage (log 2 M F ) for each annotated cockroach gene, and we normalized the data so that the median across all genes assigned to Muller elements is 0.
We used the mclust package to fit a mixture of multiple normal distributions to the log 2 M F values [48]. We did this separately for element F homologs and genes assigned to elements A-E. The Mclust() function uses an expectationmaximization algorithm to obtain maximum likelihood estimators of the mean, variance, and number of genes in each normal distribution. It fits two different models for mixtures of 1 through 9 normal distributes: (1) mixture models where each normal distribution has the same variance (i.e., mixture of univariate normal distributions) and (2) mixture models where the normal distributions have unequal variances. We then compared Bayesian information criteria (BIC) across the nested models to determine the number of normal distributions that fit data the best (Additional file 2). We also compared BIC values to test if the best fitting distributions are univariate or have unequal variances.

Quantitive PCR validation of candidate X-linked genes
We used qPCR to validate two candidate X-linked genes in German cockroach. Briefly, genomic DNA was extracted from the head and legs of five individual male and five individual female cockroaches from the Orlando Normal strain. We designed PCR primers to amplify the genomic region corresponding to each gene, as well as two control genes that we hypothesize are autosomal (sequences provided in Additional file 3). We used a StepOne Plus Real-Time PCR System (Applied Biosystems) to quantify the concentration of DNA from each of the candidate genes and the control genes in each individual cockroach. We then used a mixed effects model to assess the effect of sex on the concentration of the candidate X-linked genes. Details are provided in Additional file 3.

Differential heterozygosity in males and females
We tested for genes with reduced heterozygosity in males (including relative to females) as an additional way to identify X chromosome genes. We used the Genome Analysis Toolkit (GATK) version 3.4-0 to identify heterozygous single nucleotide polymorphisms (SNPs) and small variants in the alignments of male and female sequencing reads described above, following the GATK best practices [75][76][77]. Because there is no reference variant set for cockroaches, we used the following steps to extract high confidence variants [71]. First, we used Picard Tools version 1.133 to identify and remove duplicate reads, and we realigned indels with GATK. Then, we performed naive variant calling using the GATK Haplo-typeCaller with a phred-scaled confidence threshold of 20. We selected the highest confidence SNPs from that first pass (QD < 2.0, MQ < 40, FS > 60, SOR > 4, MQRankSum < − 12.5, ReadPosRankSum < − 8). We also selected the highest confidence insertions and deletions (indels) from the first pass (QD < 2.0, FS > 200, SOR > 10, ReadPosRankSum < − 20). We used those highquality variants to perform base recalibration, we re-input those recalibrated bases into another round of variant calling, and we extracted the highest quality variants. We repeated the process so that we had performed three rounds of recalibration, which was sufficient for convergence of variant calls. We applied GenotypeGVCFs to the variant calls from all of the Illumina libraries for joint genotyping of both males and females, and we selected only the high-quality variants using VariantFiltration (FS > 30 and QD < 2). All three male sequencing libraries were treated as a single sample in this analysis because they came from the same individual male [47]. We used hard cutoff values because we did not have sufficient data to train a probabilistic variant filter. We then extracted variants that mapped to B. germanica genes (from the v. 0.5.3 i5k annotation). Variants were considered to be within a gene if they fell within the beginning and end coordinates of an annotated gene, including within exons or introns.
We identified heterozygous variants as those with two different alleles at that site in either the male or female sample. The two alleles could be either be one reference allele and one alternate, or they could be two alternate alleles. To calculate heterozygous variants per Mb within each gene, we used the differences of the beginning and end coordinates of each annotated gene in the genome assembly as a measure of gene length. To calculate the fraction of heterozygous variants in the male, we counted the number of heterozygous variants in the male (H m ) and female (H f ) samples separately for each gene. We then divided the number of heterozygous variants in the male sample by the sum of the number of heterozygous variants in the male and female samples for each gene (H m /[ H m + H f ]).

Differential gene expression using RNA-seq data
We compared the expression of genes in adult male heads (NCBI SRA accessions SRX3189901 and SRX3189902) with expression in a mixed sex adult head sample (SRX682022) using available RNA-seq data [78,79]. We also compared male head expression with expression in whole adult females (SRX2746607 and SRX2746608) [47]. We aligned the RNA-seq reads from each library to B. germanica transcripts (from the version 0.5.3 i5k annotation) using kallisto [80]. The male head libraries were sequenced using single-end reads, and we specified an average fragment length (-l) of 200 bp and a standard deviation (-s) of 20 bp. There is only a single transcript for each gene in the B. germanica annotation, and so we treated transcript-level read counts as equivalent to genewise counts. We also only included genes with at least 10 mapped reads across all samples. We then used DESeq2 to estimate the log 2 fold change of the expression for each gene between male heads and mixed sex heads, as well as between male heads and whole adult females [81]. All reads from a given accession were treated as belonging to a single replicate (i.e., we summed read counts of different sequencing runs within each accession).

Conservation of element F regulators
We aligned the sequences of three D. melanogaster proteins that regulate element F gene expression (POF, Eggless, and Windei) with their B. germanica homologs using MUSCLE [82]. We then calculated amino acid (aa) sequence conservation in 50 aa sliding windows (with 1 aa increments) in the reference protein sequence. Gaps in the cockroach sequences were counted as mismatches, and gaps in the D. melanogaster sequences were ignored. Functional domains were predicted by the NCBI Conserved Domain Database [58] or retrieved from UniProt [83].

Expected conservation of element F
We performed calculations to estimate the number of genes relocated onto and off of element F in the lineages leading to cockroach and flies. First, the expected number of genes relocated from element F to the other elements in the lineage leading to the German cockroach was estimated from the observed number of X-to-autosome relocations in the lineage leading to D. melanogaster since the divergence with Drosophila pseudoobscura (24) [65], the fraction of genes on element F (86/14237 = 0.006) and element A (the Drosophila X chromosome, 2274/14237 = 0.16) in D. melanogaster [84], the divergence time between D. melanogaster and D. pseudoobscura (54.9 My) [85], and the divergence time between flies and cockroaches (386.9 My) [17]. We assumed that the rate of relocation from the ancestral X chromosome to the autosomes in the lineage leading to cockroach is the same as the rate from the Drosophila X to autosomes. We then calculated the expected number of genes relocated from element F to other elements in the lineage leading to the German cockroach as: Second, to estimate the number of genes relocated onto element F from other elements in the lineage leading to D. melanogaster, we included an estimate of the number of autosome-to-X relocations in the lineage leading to D. melanogaster since the divergence with D. pseudoobscura (5) [65]. We treated element F as an X chromosome in the entire lineage leading from the MRCA of flies and cockroach, which it was for most of that time (332/387 My). We then calculated the expected number of genes relocated onto element F in the lineage leading to D. melanogaster as:

Supplementary information
Supplementary information accompanies this paper at https://doi.org/10.1186/s12915-019-0721-x. Additional file 3: Methods and results of qPCR validation of 2 candidate X-linked cockroach genes. Supporting Data, Table S1, Figure S1. Table S1: PCR primers used to test for X-linkage in qPCR assay. Figure S1: Relative concentrations of BGER000638 and BGER000663 in male and female B. germanica.