Large-scale data analysis for robotic yeast one-hybrid platforms and multi-disciplinary studies using GateMultiplex

Background: Yeast one-hybrid (Y1H) is a common technique for identifying DNA-protein interactions, and robotic platforms have been developed for high-throughput analyses to unravel the gene regulatory networks of many organisms. These high-throughput techniques generate increasingly large datasets, and several software packages have been developed to analyze such data. We previously established meiosis-directed Y1H, currently the most efficient Y1H system; however, the available software tools were not designed to process the additional parameters that meiosis-directed Y1H suggests for avoiding false positives, and they required programming skills to operate.

Results: We developed a new tool, GateMultiplex, written in C++ for high computing performance. GateMultiplex incorporates a graphical user interface (GUI), which allows operation without any programming skills. Flexible parameter options were designed for multiple experimental purposes so that GateMultiplex can be applied beyond Y1H platforms. We further demonstrated data analysis with GateMultiplex in three other fields: the identification of lead compounds in preclinical cancer drug discovery, crop line selection in precision agriculture, and ocean pollution detection from deep-sea fishery.

Conclusions: The user-friendly GUI, fast C++ computing speed, flexible parameter settings, and applicability of GateMultiplex make large-scale data analysis more feasible in life science fields.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-021-01140-y.

Result file format. a The identified and extracted input data are shown as a sample name column (green, yellow, grey and blue cells), a treatment column (red) and a signal column (brown). b After GM analysis, the results are integrated and output as a result file. Each signal is processed into a positive or negative result (P/N, in brown) in the result column, and the treatment becomes the title of the result column. The sample names are listed beside the P/N results. The reference (blue) does not obtain a P/N result.
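The mapping from input columns to the result file can be illustrated with a minimal sketch. It is not GateMultiplex's implementation: the function name `build_result_table`, the example values, and the choice of a "lower than 0.6" reference cutoff (borrowed from the single-dose screening example) are assumptions made for illustration only.

```python
from collections import defaultdict

def build_result_table(records, reference, cutoff=0.6):
    """Turn (sample_name, treatment, signal) rows into P/N results.

    Hypothetical helper: each signal is divided by the reference signal
    of the same treatment, and a fold change below `cutoff` counts as "P".
    Returns {sample_name: {treatment: "P" or "N"}}.
    """
    # Reference signal for each treatment, used for normalization.
    ref_signal = {t: s for name, t, s in records if name == reference}
    table = defaultdict(dict)
    for name, treatment, signal in records:
        if name == reference:
            continue  # the reference does not obtain a P/N result
        fold_change = signal / ref_signal[treatment]
        table[name][treatment] = "P" if fold_change < cutoff else "N"
    return dict(table)

# Made-up input rows: SampleName, Treatment, Signal.
records = [
    ("R", "plate-1", 100.0),
    ("A1", "plate-1", 40.0),
    ("A8", "plate-1", 90.0),
    ("R", "plate-2", 80.0),
    ("A1", "plate-2", 60.0),
    ("A8", "plate-2", 30.0),
]
result = build_result_table(records, reference="R")
for name, row in result.items():
    print(name, row)
```

Each treatment becomes one result column keyed by its name, and the reference row is dropped from the output, mirroring the layout described in the legend.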

Figure S8. SampleName, Treatment and Signal of single-dose screening in drug development. a A schematic 96-well plate seeded with cell suspension (grey wells) was defined as a "cells plate". b The cells were incubated with various reagents in a compound plate (compound plate-1), including 1 reference ("R", the solvent control, shown in blue) and 9 different compounds (compound A1 to A9). The different reagents were defined as "SampleName". Compound plate-1 was defined as "Treatment". c The relative viability of the cells in each well was then measured and quantified into corresponding values, and the values circled by the purple dashed frame were defined as "Signal". d In single-dose screening, multiple compound plates can be used, such as compound plate-1 (compound A1 to A9, red to pink), compound plate-2 (compound B1 to B9, dark green to light green) and compound plate-3 (compound C1 to C9, brown to light yellow). Compound plate-1, compound plate-2 and compound plate-3 were defined as "Treatment".

Figure S9. Cutoff setting of single-dose screening in drug development. a In single-dose screening, the reference and each compound were tested with 6 technical replicates (tech-rep#1 to tech-rep#6, circled by the grey dashed frame). The schematic relative cell viability values are highlighted with a yellow background. The values circled by the blue frame are from the reference. Two compounds, compound A1 (red dashed frame) and compound A8 (pink dashed frame), were used as examples to demonstrate the following cutoff settings. b For the reference cutoff, the 6 reference values (blue dashed frame) were ranked from highest to lowest, and the values ranked 2nd to 5th (blue background) were averaged. This averaged reference value was normalized to 1, and the fold-change cutoff was set as lower than 0.6-fold of the reference. If the relative fold-change value of a technical replicate of a compound is lower than the cutoff value 0.6, that technical replicate is regarded as a positive; otherwise, it is regarded as a negative. Compound A1 (red) showed 5 positives and 1 negative (5P1N), while compound A8 (pink) showed 2 positives and 4 negatives (2P4N). c The tech-rep cutoff was set as higher than or equal to 4. In this case, compound A1 had 5 positives (P, positive), which satisfied the tech-rep cutoff, and was regarded as a positive.

The cells were seeded in a 96-well plate and (b) were then incubated with the reference (R, solvent control, blue) and two compounds (compound A1, red; compound A9, pink). The compound-treated experiments were performed at serial dosages from low (Dose-1) to high (Dose-5). The compound dosages and the reference were defined as "SampleName". The names of the compounds, such as compound A1 and compound A9, were defined as "Treatment". c The relative viability of the cells in each well was then measured as schematic corresponding values, and the values circled by the purple dashed frame were defined as "Signal". d The number of compounds increased with the number of plates (plate-1 to plate-3), and the names of the compounds were defined as "Treatment" (compound A1, compound A9, compound B5, compound B9, compound C7, compound B3).

group (in blue) and 2 compound groups. The reference (blue) and the different compound names (brown, yellow, dark green, light green, red and pink) were defined as "SampleName". The different plates, including plate-1, plate-2 and plate-3, were defined as "Treatment". b The corresponding schematic values in the reference group and compound groups were defined as "Signal" (grey background).

averaged. The averaged value was normalized to 1, and the fold-change cutoff was set as lower than 0.6-fold of the reference.
If the relative value of a compound-treated group is lower than the cutoff value 0.6, the compound-treated group is regarded as a positive. In the case of compound A1, both technical replicates in each biological replicate (bio-rep#1, bio-rep#2 and bio-rep#3) were regarded as positives (P, positive). c The tech-rep cutoff was set as higher than or equal to 2. For compound A1, each biological replicate showed 2 positive technical replicates, so each biological replicate was regarded as a positive (P, positive). d The bio-rep cutoff was set as higher than or equal to 3. Compound A1, with 3 positive biological replicates, passed the bio-rep cutoff.

experimental lines (#1 to #9) and 1 reference line (#10), was used to demonstrate the line selection.
The names of the plant lines were defined as "SampleName". b The biomass data of the plant lines were collected each day. The dates of the biomass data (Date-01 to Date-N) were defined as "Treatment". The biomass values of the plant lines were defined as "Signal". c For the reference cutoff, the biomass values of two plant lines (#4 in orange and #7 in red) on Date-N were used as examples. The biomass value of the reference line (R, blue) was normalized to 1 (blue bars), and the fold-change cutoff was set as higher than 1.25-fold of the reference. The relative biomass fold-change value of line #4 was lower than the cutoff value 1.25, so line #4 was defined as a negative. The relative biomass fold-change value of line #7 was higher than the cutoff value 1.25, so line #7 was defined as a positive.

The location of a grid can be represented by a pair of latitude and longitude coordinates. For example, the purple grid is represented by the latitude coordinate -1045 and the longitude coordinate 9058. c, d The vessel hours or fishing hours of vessel activity were collected and recorded each day. Within a grid (the dark blue grid, for example), vessel activities are recorded as different data sets if the ships belong to different countries or are equipped with different gear types. For example, the light blue and dark brown dashed frames mark two ships that both belong to Country-A (pink flags) but are equipped with different gear types, so the vessel activities of these two ships are recorded as two data sets. If the ships in a grid are from the same country and equipped with the same gear type, the recorded vessel activity is the sum over those ships. In the red dashed frame, the 2 ships in the same grid are both from Country-B and equipped with type-1 gear, so the vessel activities of both ships are recorded as one data set.

Figure S19. SampleName, Treatment, Signal and cutoff setting of marine ecosystem data.
a The "SampleName" is composed of the gear type, country, latitude and longitude of the marine ecosystem data. b The dates of the marine ecosystem data were defined as "Treatment". The vessel activity, here the vessel hours, was defined as "Signal". The SampleName combination of orange flag (Country-B), white boat (gear type-1), -1044 (latitude) and 9059 (longitude) was selected as the reference (blue background). c For the reference cutoff, the vessel hour value of the reference was normalized to 1, and the fold-change cutoff was set as greater than 1-fold of the reference. Taking the schematic data on 2011/01/01 as an example, the relative vessel hour value of orange flag (Country-B) with white boat (gear type-1) on the grid (-1045, 9060) was higher than the reference cutoff value 1 and was therefore defined as a positive.

Here the DNA-bait and culturing days are defined as different "Treatment". One "Treatment" can contain different conditions. For example, Day0, Day1 and Day2 all belong to "Treatment-2": Day0 is "Condition#1", Day1 is "Condition#2" and Day2 is "Condition#3".

Figure S20, the yeast cells can be examined with three different experimental methods: Haploid (yellow), Diploid (green) and Meiosis (blue). These experimental methods belong to "Treatment-3", and Haploid, Diploid and Meiosis are "Condition#1", "Condition#2" and "Condition#3", respectively.

The output result file contained one result column (brown) with the P/N results, and the treatment (red) was the title of the result column. c The input file included two treatment columns, "Treatment-1" (red) and "Treatment-2" (purple). d In the output result file, the treatments were combined as the title of the result column (half red and half purple). e The input file contained two treatment columns, "Treatment-1" and "Treatment-2". "Treatment-1" contained two conditions, "Condition#1" (red) and "Condition#2" (pink). "Treatment-2" contained one condition, "Condition#1" (purple). f In the output result file, the combinations of conditions from each treatment column are used as the titles of the result columns. For example, "Condition#1" of "Treatment-1" (red) was combined with "Condition#1" of "Treatment-2" (purple), and this combination formed one of the result column titles (half red and half purple) in the result file. g In a more complicated situation, the input file contained multiple treatments, each with multiple conditions. In this input file, "Treatment-1" (red and pink) and "Treatment-2" (dark purple and light purple) both had two conditions. h The conditions from the two treatment columns were combined into four combinations, which served as the titles of four result columns.

including the input file information, the cutoff setting and the output file selection. The blue words represent cutoffs and options that are additional to those in the GUI of GM_Basic. The input file information area is used to identify "SampleName", "Treatment", "Signal" and the internal control (additional). The cutoff setting area contains the background noise cutoff, reference cutoff, internal control cutoff (additional), bio-replicate cutoff, tech-replicate cutoff and positive cutoff (additional).
For the reference cutoff, two options were added besides the manual option: the percentile option and the fixed value option. The output file selection area provides the result file, the fold change file and the PNE file (additional). After all required information has been entered into the GUI, pressing the "GO!" button starts the data analysis.

Figure S24. Internal control cutoff. Two kinds of plates were used for culturing yeast colonies: non-antibiotic plates and antibiotic plates. The non-antibiotic plates were used as internal control plates, on which all yeast colonies should be able to grow. The antibiotic plates were used as selection plates. Each plate contains 24 colonies. Colonies that grew are shown as black dots, and colonies that did not grow are shown as empty positions. Colonies at corresponding positions on the internal control plates and the selection plate contain the same TF-DNA combination. a, b If a colony grows on the internal control plates (green dashed frame and blue dashed frame), then (a) growth of the colony on the selection plate is regarded as a positive (green dashed frame) and (b) absence of growth is regarded as a negative (blue dashed frame). c Colonies that do not grow on the internal control plate (red frames) are regarded as experimental errors during sample preparation. In this situation, the colonies are counted as excluded whether or not they grow on the selection plate (red frames).

area. The lines, from line #1 to line #6, were compared to a reference line to select the lines with the required phenotypes. The plant lines with a purple background were the selected lines for each trait. A plant line with more of the required phenotypes is a better selection for breeding. In this case, the positive cutoff of traits was set as 2.
If a plant line is selected in 2 or more traits (purple background), it becomes the final selected line (blue words and blue dashed frame). For example, line #2 was selected for both height and leaf angle (purple background twice) and was therefore the final selected line (blue words and blue dashed frame).
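The positive cutoff across traits amounts to counting, for each plant line, the number of traits in which it was selected and keeping the lines that reach the cutoff. The sketch below is only an illustration under stated assumptions: the function name `final_selection` is hypothetical, "trait-3" is a placeholder, and the per-trait selections are made-up data, except that line #2 is selected for height and leaf angle as in the legend.

```python
def final_selection(selected_by_trait, positive_cutoff=2):
    """Return the lines selected in at least `positive_cutoff` traits.

    `selected_by_trait` maps a trait name to the set of lines that passed
    the per-trait selection (hypothetical input structure).
    """
    counts = {}
    for lines in selected_by_trait.values():
        for line in lines:
            counts[line] = counts.get(line, 0) + 1
    # Keep only the lines that meet or exceed the positive cutoff.
    return sorted(line for line, n in counts.items() if n >= positive_cutoff)

# Illustrative per-trait selections (made-up apart from line #2).
selected_by_trait = {
    "height": {"line #2", "line #5"},
    "leaf angle": {"line #2"},
    "trait-3": {"line #4"},
}
final = final_selection(selected_by_trait, positive_cutoff=2)
print(final)
```

With a positive cutoff of 2, only line #2 is retained here, because it is the only line selected in two traits.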