Investigating one species falls short of a comprehensive view on gene function
Large-scale screens in the leading insect model organism D. melanogaster have revealed gene sets required for certain biological processes. As a consequence, insect-related GO term annotations are almost exclusively based on work in flies. However, there are several reasons to believe that the picture has remained incomplete. On the one hand, species-specific or technical limitations may have prevented the identification of an involved gene in D. melanogaster. On the other hand, evolution has led to functional changes such as the modification or loss of ancestral gene functions or the co-option of genes into a novel process. Unfortunately, it has remained unclear to what extent the gene sets determined exclusively in flies are representative of insects as a whole, or whether it is even appropriate to assume the existence of representative gene sets.
Our systematic screening in a complementary model organism has revealed that the identified gene sets show an unexpected degree of divergence (see Fig. 4 for numbers, Fig. 5 for examples). Based on our calculations (see details below), we estimate that only about half of the gene functions are detected in both species (52%, column 4 of Fig. 4A), while the remaining gene functions were found either only in D. melanogaster (11%, column 4 of Fig. 4A) or only in T. castaneum (37%, column 4 of Fig. 4A). We found no strong indication that the gene inventory required for a process is more conserved for processes that seem more conserved morphologically. For instance, dorso-ventral patterning, which we assumed to represent an intermediate degree of conservation, showed the largest common gene set, while the supposedly most conserved process, muscle formation, showed the lowest value. We note, however, that we found more Tribolium-specific gene functions for the less conserved processes than for muscle development. Nevertheless, given the uncertainties with these numbers (see the “Discussion” section below) and the fact that morphological conservation of a process is hard to quantify, we hesitate to draw conclusions about the correlation between the divergence of a biological process and that of the involved gene inventory.
While these data were obtained for developmental processes only, they strongly indicate that our current knowledge based on screening in one species is much less comprehensive than previously thought. We believe that the different proportions of genes shown to be required for a specific process in only one species (11% vs. 37%) may reflect both biological and technical differences (see detailed discussion below).
Beyond the fly-beetle comparison, our findings provide a compelling argument that focusing on single model species falls short of comprehensively revealing the genetic basis of biological processes in any clade separated by an evolutionary distance similar to or larger than the one separating flies and beetles (i.e., around 370 million years). Further, they show that T. castaneum is an extremely useful screening system for insect biology, able to reveal novel gene functions even in processes that have been studied intensely in D. melanogaster.
Estimating the portions of gene functions revealed in fly versus beetle
In order to make a comprehensive and quantitative comparison, we included all genes that are currently known to be involved in the respective processes from both beetle and fly. Our beetle data are based on both our systematic screening of 51% of the T. castaneum gene set and previous candidate gene work. With respect to fly data, we rely on information available on FlyBase and our expert knowledge of the processes under scrutiny. Given these different kinds of sources and approaches, and the fact that we focus on developmental processes, the data are, despite the comprehensive approach, prone to various types of uncertainties. In the following, we first discuss how we combined the numbers to calculate our estimate. Subsequently, we discuss some uncertainties and the extent to which they influence the estimate.
Of the genes known from D. melanogaster to be required for the processes investigated here (n = 132; see Additional file 5: Table S4), we could compare 66% to iBeetle data (column 1 in Fig. 4A; based on Additional file 2: Fig. S1; n = 87). Of those genes, 26% (n = 23) were not required for that process in T. castaneum (column 2 in Fig. 4A; based on Fig. 3). By this, we mean that the respective genes did not show any phenotype with respect to the biological process in question: they could have either no phenotype or a phenotype affecting another process. Based on our positive controls, the potential error affecting this statement is less than 7.5% (see Additional file 1: Table S1). For our overall estimate, we extrapolated this share to the total number of genes required in the fly (dotted lines from column 2 to column 4). A number of gene functions detected in the iBeetle screen had not been assigned such functions in D. melanogaster before (column 3 in Fig. 4A; based on Fig. 2). When combining these numbers, we aimed at providing a minimum estimate for the divergence of detected gene functions (column 4 in Fig. 4A). To be conservative, we assumed that all gene functions known from D. melanogaster but not yet tested in the iBeetle screen would fall into the class of genes required in both species (see numbers in green square in Additional file 5: Table S4). Further, we scored each signaling pathway as one case (finding mostly conservation) even if single components of these pathways did not have divergent phenotypes. These conservative assumptions lead to the above-mentioned minimum estimate of divergence in these gene sets (column 4 in Fig. 4A; calculation in Additional file 5: Table S4).
Of all genes currently known to be required for one of the processes we studied, the portion of genes detected exclusively in the fly (11%; n = 23) is much smaller than the one detected only in the beetle (37%; n = 76) while the analogous function of half of the genes (52%; n = 109) is detected in both species.
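The percentages quoted above can be retraced from the counts given in the text. The following Python sketch is only an illustration and not the authors' calculation, which is documented in Additional file 5: Table S4. It assumes that the class "detected in both species" equals the 132 known fly genes minus the 23 genes not required in the beetle, which is consistent with the reported n = 109. All variable and function names are hypothetical.

```python
# Counts quoted in the text (Fig. 4A; Additional file 5: Table S4).
FLY_KNOWN = 132      # fly genes known to be required for the studied processes
FLY_COMPARED = 87    # of those, genes that could be compared to iBeetle data
FLY_ONLY = 23        # compared genes not required for the process in the beetle
BEETLE_ONLY = 76     # gene functions detected only in the beetle


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Assumption (conservative, as described in the text): fly genes not yet
# tested in the beetle are counted as required in both species.
both = FLY_KNOWN - FLY_ONLY
total = FLY_ONLY + BEETLE_ONLY + both

coverage = percent(FLY_COMPARED, FLY_KNOWN)      # share of fly genes compared
not_required = percent(FLY_ONLY, FLY_COMPARED)   # share not required in beetle

shares = {
    "fly_only": percent(FLY_ONLY, total),
    "beetle_only": percent(BEETLE_ONLY, total),
    "both": percent(both, total),
}

print(f"compared: {coverage}%, not required in beetle: {not_required}%")
print(f"total gene functions: {total}, shares: {shares}")
```

Under these assumptions, the sketch reproduces the 66% and 26% of columns 1 and 2 and the 11%, 37% and 52% of column 4.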
With this work, we present the first, and a quite extensive, dataset for estimating numbers of this kind. Still, some confounding issues need to be considered. The first uncertainty stems from the fact that the beetle data are based on testing about 50% of the genes. In the second part of the screen, we had prioritized genes that were moderately to highly expressed, showed sequence conservation, and had GO annotations. The prioritization apparently was successful, as 66% of the gene functions known from D. melanogaster had been covered in the iBeetle screen (Fig. 4A), which is much more than the 40% expected for an unbiased selection [17]. Hence, our figures might be biased towards conserved gene function. As a consequence, the overall portion of beetle-specific genes without conserved functions is likely even higher than reflected in Fig. 4A.
Second, we found quite different numbers for the four processes under scrutiny (Fig. 4B). However, even in the process with the lowest portion of genes detected exclusively in T. castaneum (muscle development), this portion was 21%, which still indicates a significant degree of unexplored biology.
Third, the D. melanogaster numbers could be influenced by false negative data. The data on FlyBase have not been gathered in one or a few standardized screens where all data were published; they are mainly based on published results of single gene analyses. However, not all genetic screens have reached saturation, and not all genes detected in large-scale screens may have been further analyzed and published. Hence, the number of genes in principle detectable in D. melanogaster might actually be larger than the numbers extracted from FlyBase. In the iBeetle screen, in contrast, negative data were systematically documented, such that this type of uncertainty is restricted to technical false negative data, which we found to be around 15% in this first-pass screen (Fig. 1). This uncertainty could potentially increase the portion of D. melanogaster-specific or conserved genes. Fourth, theoretically, there may be false positive data, albeit restricted to the set of genes detected in both species. The reason is that iBeetle was a first-pass screen, where we aimed at reducing false negative data with the tradeoff that false positive data are enriched [17]. Although similar phenotypes found in two different species are unlikely to be false positives in most cases, we tried to minimize this error by manually checking the annotations of the respective genes, excluding those that showed a phenotype with low penetrance or in combination with many other defects indicating a non-specific effect. Of note, the issue of false positives is restricted to the genes detected in both species (column 2; based on Fig. 3). It does not apply to those genes detected only in the beetle but not the fly (column 3; based on Fig. 2) because in this case, all phenotypes were confirmed by independent experiments with non-overlapping dsRNA fragments in different genetic backgrounds, such that false positive results are excluded.
In summary, while there are a number of uncertainties that we could not clarify with available data or methods, most of these uncertainties hint at underestimation rather than overestimation of functional divergence between fly and beetle.
Our work focused on developmental processes with different grades of assumed conservation and different grades of previous knowledge. Morphologically, the muscle pattern and general development appear to be quite conserved between these insects [31, 49] compared to oogenesis, where a number of morphological differences were described [40,41,42]. Given the background of a strongly derived head morphology of first instar larvae but conserved adult heads and brains, both conservation and divergence were found with respect to the genetic control [6, 36, 50, 51]. Likewise, dorso-ventral patterning relies on both conserved and diverged gene regulatory networks [14, 52, 53]. Taken together, our selection appears to cover both conserved and diverged processes such that, at least for the genetic control of development, our data can be generalized with some confidence.
Technical characteristics contribute to the detection of unequal gene sets
Our numbers reveal that functionally comparable gene sets in two quite closely related model systems are far from identical. A question of obvious biological relevance but not easily resolved is: to which degree do these differences reflect the biologically meaningful divergence of gene functions, or alternatively, simply result from technical problems, i.e., reflect different strengths and weaknesses of the respective screening methods and model systems?
As discussed above, some degree of false negative data may be expected in both model systems. In the case of the iBeetle screen, this will be restricted to technical false negative data. In the D. melanogaster field, there may be additional false negative data due to the lack of saturation of screens and/or lack of reporting of genes that were not studied in detail. However, given the extent and comprehensiveness of work in the D. melanogaster field, we feel that this might not be of high relevance. As to the different strengths of screening procedures, it is certainly true that the way screens are performed influences what sets of genes can be detected. For instance, our parental RNAi approach knocked down both maternal and zygotic contributions, while some classic D. melanogaster screens affected only the zygotic contribution. Hence, genes where maternal contribution rescues the embryonic phenotype are easily missed in the fly but not the beetle. For example, parental RNAi knocking down components of the aPKC complex leads to severe early disruption of embryogenesis in T. castaneum, while in the respective D. melanogaster mutants almost no defects are seen on the cuticle level (A. Wodarz, unpublished observation). Conversely, our RNAi screen depended on the accuracy of gene annotations, and our approach of screening for several processes in parallel may have reduced detection sensitivity. One striking example of the different strengths of screening designs is provided by wing blister phenotypes. In the first part of the iBeetle screen, we detected 34 genes showing wing blister phenotypes, of which 14 did not have a related GO term annotation at FlyBase and 5 did not have any GO annotation at all. Seven of these genes were subsequently tested using RNAi lines in D. melanogaster, and four of them indeed showed a related phenotype. Likewise, some wing blister genes from D. melanogaster were not annotated in the iBeetle screen. When we checked more specifically, this was often due to the lethality of the animal before the formation of wings [17]. When we varied the timing of injection, two of those knock-downs elicited wing blister phenotypes also in T. castaneum [17]. These data show that details of the screening procedure influence the subset of genes that are detected.
Evolutionary divergence of gene function and derived Drosophila biology may be larger than appreciated
Most relevant for the field of functional genetics is our conclusion that the degree of divergence of gene functions among holometabolous insects is larger than previously assumed. Therefore, some genes are detected only in one species because the gene’s function is not required for that process in the other. This finding should of course influence our thinking about using any insect as a model for human development and diseases such as muscle formation and congenital heart defects.
Indeed, there is evidence supporting the notion of an unexpected degree of divergence with respect to muscle development. A number of muscle genes identified in the iBeetle screen were more closely investigated in D. melanogaster [31, 49]. Despite considerable effort, the negative data for the fly orthologs appeared to be true negatives. For example, null mutations of one of the genes found in the beetle, nostrin, did not elicit a phenotype in D. melanogaster unless combined with a mutation of a related F-BAR protein, Cip4. Likewise, Rbm24 displays strong RNAi and mutant phenotypes in T. castaneum and vertebrates, respectively, but D. melanogaster is lacking an Rbm24 ortholog, and functional compensation by paralogs was suggested to occur during D. melanogaster muscle development. Other genes, including kahuli and unc-76, are expressed in the D. melanogaster mesoderm but only showed very subtle somatic muscle phenotypes, if any, in Mef2-GAL4 driven RNAi experiments or with CRISPR/Cas9 induced mutations, respectively (see Materials & Methods). By contrast, their beetle counterparts had strong and penetrant phenotypes in single knock-downs (e.g., see Tc-unc-76 in Fig. 5E) [31, 49, 54]. These data suggest that the function of genes or their relative contribution to this biological process has changed significantly. They also indicate that the single gene view may be limited. Phenotypes depend on networks of interacting genes, and this may allow for changes and replacements of individual components while the overall network structure is maintained. There are more striking examples of gene function changes. The gene germ cell-less was detected in the iBeetle screen to govern anterior-posterior axis formation in the beetle, while in D. melanogaster it is required for the formation of the posterior germ cells [51]. Also, the D. melanogaster textbook example of a developmental morphogen, bicoid, does not even exist in T. castaneum [5], and yet other genes were found to act as anterior determinants in other flies [9, 10]. Along the same lines, the genes forkhead and buttonhead do not appear to be required for anterior patterning in T. castaneum but are essential in flies [12, 39, 55, 56].
These findings with respect to specific genes add to a number of observations arguing for a comparatively high degree of divergence due to the overall highly derived nature of fly biology. The number of genes is much smaller in D. melanogaster (approx. 14,000) compared to T. castaneum (approx. 16,500). Further, a number of developmental processes are represented in a more insect-typical way in T. castaneum, for instance segmentation [57], head [50] and leg development, brain development [58], extraembryonic tissue movements [59], and mode of metamorphosis [60]. In most cases, the situation in the fly is simplified and appears to be streamlined for faster development. We think that these biological differences might be the basis for divergence in gene function, which we have just started to uncover. In the absence of similar large-scale comparisons in other species, it remains open whether an insect-typical gene set even exists or whether one would rather have to emphasize a constant change of gene function, such that any ancestral gene set simply “melts” away with evolutionary time.