Oligos and strains
The transposon-insertion mutants in the condensed collection are listed in Additional file 2: Dataset S1. The genes that are represented only by multi-insertion strains are listed in Additional file 3: Dataset S2. The strains and oligos used in this study are listed in Table S1. Details of the transposon design, transformation, and the original RB-TnSeq mapping experiment were published previously.
We used BHIS as the base medium for this study, to match growth conditions during the initial construction of the transposon pool. Our BHIS formulation is hydrated Brain Heart Infusion powder (Difco, Cat. #237200) supplemented with 0.2% (w/v) sodium bicarbonate (Fisher BioReagents, Cat. #BP328-500), 0.05% (w/v) porcine hemin (Alfa Aesar, Cat. #A11165), and, in some cases, 0.1% (w/v) L-cysteine (Thermo Scientific, Cat. #A10435.18). In a previous study, we found that exogenous cysteine led to the production of H2S by B. theta VPI-5482. With the large volumes required for this approach, the H2S levels were enough to saturate our hydrogen sulfide scrubbing column and make working with the collection dangerous. Therefore, for all steps that required growth of the collection in substantial volumes, we did not include cysteine in the BHIS formulation. Cultures were incubated in a custom anaerobic chamber (Coy Laboratories) using an 85% nitrogen–10% carbon dioxide–5% hydrogen anaerobic gas mix (Praxair, Cat. #BI NICDHYC4-K).
Preparation of the pooled library for sorting of the progenitor collection using flow cytometry
A barcoded transposon mutant library of B. theta VPI-5482 was sorted using a fluorescence-activated cell sorting (FACS) machine, as described previously, with a few modifications. Cells were sorted into 262 96-well plates in three batches of 60, 100, and 102 plates; the same protocol was used for each batch. These 262 plates were combined with 40 96-well plates that were generated in a pilot experiment, leading to the final progenitor collection of 302 96-well plates. All media and plasticware were pre-reduced in an anaerobic chamber (Coy Laboratories) for 3 days before use to eliminate any residual oxygen.
Addition of medium to the 96-well plates before sorting
Three hundred thirty-three microliters of pre-reduced and filter-sterilized BHIS (with no added cysteine) were added to 2-mL 96-deep well plates (Greiner Bio-One, Cat. #780280) using a semi-automated BenchSmart pipettor (Mettler-Toledo) installed in an anaerobic chamber. Each 96-deep well plate with added medium was sealed with a foil seal (Nunc™ Sealing Tapes, Fisher Scientific, Cat. #232698) and stored anaerobically for 16–24 h at 37 °C before use in the next step of the procedure.
Outgrowth of the randomly barcoded transposon mutant library
A cryostock of the pooled library (1 mL of OD600=1) was thawed in an anaerobic chamber and added to 100 mL of BHIS in a 250-mL Erlenmeyer flask that had been pre-warmed to 37 °C, and the culture was incubated overnight (12–18 h) at 37 °C without agitation. The next day, ~3 h before cultures were transported to the FACS machine, this overnight culture was diluted to an OD600~0.1 in 4 mL of BHIS pre-warmed to 37 °C in 4 independent samples. At the same time, BHIS was pre-aliquoted into an additional set of tubes (2 mL per tube) and kept at 37 °C.
Sorting the pooled library to create the progenitor collection
Immediately before being transported to the FACS machine, log-phase cultures of the transposon library were diluted to an OD600 of 0.01–0.05 in pre-warmed BHIS with 10 mM cysteine in a FACS tube (Falcon® high-clarity polypropylene round bottom test tubes, Corning, Cat. #352063) and mixed thoroughly in the anaerobic chamber. An initial culture in a FACS tube was brought out of the chamber and loaded onto the FACS machine to calibrate the instrument and define a gate based on forward and side scatter. After calibrating the machine, a fresh FACS tube of culture and a set of pre-reduced 96-deep well plates were brought to the FACS machine and the fresh culture was used to sort single cells. To preserve the anaerobic environment of the culture, the FACS tube was sealed in an airtight container inside the anaerobic chamber for transportation to the FACS machine, and the FACS tube was disturbed as little as possible after being exposed to oxygen. After single cells were sorted into individual wells of a 96-deep well plate, the deep well plate was lightly resealed with a gas-permeable seal (Excel Scientific Inc., Cat. #BS-25). Batches of 15 96-deep well plates with sorted cells were returned to the anaerobic chamber and new sets of deep well plates with fresh media were transported to the FACS facility as needed to keep the FACS machine in continual operation. Once sorted plates were returned to the anaerobic chamber, the gas-permeable seal was fully sealed using a rubber brayer roller, taking care to seal the edges of the plate to prevent evaporation. The sorted plates were then returned to the 37 °C incubator in the anaerobic chamber. The FACS tube of culture was replaced with a fresh culture every 30 plates. Cultures of the transposon library were kept in log-phase via dilution in the 37 °C incubator throughout the course of the sorting experiment to maintain a supply of log-phase cells for sorting.
Aliquoting copies of cryostocks of the progenitor collection
The sorted cells were allowed to grow into monocultures over 2 days. Glycerol was added to the cultures, and the glycerol stock was aliquoted into two copies of the library using a semi-automated 96-well BenchSmart pipetting robot inside the anaerobic chamber. One hundred forty-three microliters of 50% glycerol were added to each 333-μL culture (final concentration 15% glycerol) and mixed by pipetting up and down. Eighty microliters of the glycerol stock were then aliquoted into each of two V-bottom 96-well plates (Greiner Bio-One, Cat. #651161). The cryostock plates were sealed with a foil seal and stored at −80 °C. The remainder of the glycerol stock inside the 96-deep well plate was also stored for subsequent pooling steps of the protocol.
Selection criteria for re-arraying the collection
Only insertions in the middle 85% of genes (positioned after the first 5% and before the last 10%) were considered eligible for re-arraying into the final collection. Multi-insertion strains were not eligible for re-arraying. When more than one single-insertion strain covered the same gene in the progenitor collection, the insertion closest to the middle of the open reading frame was prioritized.
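The selection rules above can be illustrated with a short Python sketch. This is not the code used in this study: the record fields (`well`, `gene`, `frac`, `n_sites`) are hypothetical, `frac` is assumed to be the insertion position expressed as a fraction of the open reading frame in the direction of transcription, and the treatment of insertions that fall exactly on the 5% or 90% boundary is an assumption.

```python
def select_insertions(strains, lo=0.05, hi=0.90):
    """Pick at most one single-insertion strain per gene.

    strains: iterable of dicts with hypothetical keys
      'well'    - location in the progenitor collection
      'gene'    - identifier of the disrupted gene
      'frac'    - insertion position as a fraction of the ORF (0 = start, 1 = stop)
      'n_sites' - number of insertion sites mapped to the strain
    Returns a dict mapping gene -> chosen strain record.
    """
    best = {}
    for s in strains:
        if s["n_sites"] != 1:
            continue  # multi-insertion strains are not eligible
        if not (lo < s["frac"] < hi):
            continue  # outside the middle 85% of the gene
        current = best.get(s["gene"])
        # Prefer the insertion closest to the middle of the ORF.
        if current is None or abs(s["frac"] - 0.5) < abs(current["frac"] - 0.5):
            best[s["gene"]] = s
    return best
```

For example, given two eligible strains for the same gene at fractional positions 0.30 and 0.55, the sketch keeps the strain at 0.55.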
Re-arraying the progenitor collection
At the beginning of the experiment, erythromycin (Sigma, Cat. #E5389-5G) was added at a final concentration of 10 μg/mL to a bottle of filter-sterilized BHIS (Becton Dickinson) without cysteine that had been left in the anaerobic chamber for 48–72 h to reduce. This selective medium was aliquoted into 8 96-deep well plates (Celltreat, Cat. #229574) using a BenchSmart with a 1000-μL head (Rainin, Cat. #BST-96-1000), sterile filter tips (Rainin, Cat. #30296782), and a 300-mL reservoir (Integra, Cat. #6328). The 96-deep well plates were sealed with a plastic film (Excel Scientific, Cat. #STR-SEAL-PLT), transferred to a second anaerobic chamber using airtight plastic boxes, and stored in a 37 °C incubator inside the anaerobic chamber until inoculation.
Over the course of a day, sections of the progenitor collection were re-arrayed anaerobically using an Eppendorf EpMotion 5073 (Eppendorf, Germany) running EpBlue v. 22.214.171.124 and housed inside an anaerobic chamber. Table S2 contains the settings used for the EpMotion.
Before transfer to the anaerobic chamber, the 96-well V-bottom plates containing glycerol stocks of the progenitor collection were removed from the −80 °C freezer in batches and stored on a bed of dry ice. The lid of each plate was removed, cleaned of ice, and sterilized with 70% (v/v) ethanol while the sealed plates were centrifuged in a microplate centrifuge (Fisherbrand, Cat. #14-955-300) for 30 s. After centrifugation, the foil seal was removed from each plate, taking care not to jostle the plate. The clean and sterilized lid was then returned to each plate, and the lidded plates were returned to the bed of dry ice. When the entire batch of plates had been processed in this manner, the plates were transferred into the anaerobic chamber and stored in a safe location on the benchtop.
Next, we inoculated the 96-deep well plates of selective medium with 40 μL of glycerol stock from selected wells of the progenitor collection. Three-hundred-microliter PCR-clean filter tips (Eppendorf, Cat. #0030 014.472) were used in combination with a single-channel 300-μL adaptor (Eppendorf) to transfer glycerol stocks. The pipetting pattern (the set of instructions connecting position in the progenitor collection to position in the condensed collection) was generated using a MATLAB script and imported into EpBlue as a .csv file.
After inoculation, 96-well plates from the progenitor collection were resealed with a foil seal (Thermo Scientific, Cat. #232699) and transferred back to the −80 °C freezer. The 96-deep well plates with freshly inoculated cultures comprising the condensed collection were sealed with a gas-permeable seal (Excel Scientific Inc., Cat. #BS-25) and stored in a 37 °C incubator in the anaerobic chamber to recover for 36–48 h. These cultures were then used to inoculate growth curves and to aliquot copies for cryo-storage.
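To make the idea of a pipetting pattern concrete, the following Python sketch builds a source-to-destination table and writes it as CSV. It is an illustration only and not the script used in this study: the column names, the column-wise fill order, and the default blank wells are assumptions, and the layout is not claimed to match the import format expected by the EpMotion software.

```python
import csv
import io

ROWS = "ABCDEFGH"
COLS = range(1, 13)

def build_pattern(sources, blanks=("A1", "B1"), first_plate=1, volume_ul=40):
    """Assign each source well to the next free destination well.

    sources: list of (source_plate, source_well) tuples in the desired order.
    blanks:  destination wells left empty on every plate (assumed fixed here).
    """
    # Column-wise order (A1, B1, ..., H1, A2, ...), skipping the blank wells.
    wells = [f"{r}{c}" for c in COLS for r in ROWS if f"{r}{c}" not in blanks]
    rows = []
    for i, (src_plate, src_well) in enumerate(sources):
        plate_offset, idx = divmod(i, len(wells))
        rows.append((src_plate, src_well, first_plate + plate_offset,
                     wells[idx], volume_ul))
    return rows

def to_csv(rows):
    """Serialize the pattern with a hypothetical header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source_plate", "source_well",
                     "dest_plate", "dest_well", "volume_uL"])
    writer.writerows(rows)
    return buf.getvalue()
```

With two blank wells per plate, each destination plate holds 94 strains in this sketch; a plate-specific third blank well would need to be passed in per plate.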
Growth curve inoculation
Approximately 48 h post-inoculation, deep well cultures were used to inoculate fresh medium for growth curve measurements and then aliquoted into glycerol stocks as copies of the final condensed collection (glycerol stock storage described below).
First, 198 μL of BHIS without cysteine and without erythromycin was aliquoted across 16 96-well flat-bottom plates (Greiner Bio-One, Cat. #655180) using a BenchSmart with a P1000 head and sterile filter tips. The plates were transferred to the anaerobic chamber along with the previously generated cultures using an airtight sealed container.
The cultures were then used to inoculate fresh medium for growth curves using an EpMotion 5073. Fifty-microliter PCR-clean filter tips (Eppendorf, Cat. #0030 014.430) were used in combination with an 8-channel 50-μL volume adaptor (Eppendorf). Each deep-well culture was used to inoculate 2 96-well flat-bottom plates as replicates for the growth curve measurements. Two microliters of culture were transferred without mixing at the source and with 1 mixing step of 40 μL at the target. The same tips were used, and the source was revisited once, to inoculate a replicate target plate. Table S3 contains the machine settings used for this protocol. To avoid transferring liquid from the intentionally blank wells on each plate, we removed the tips from positions A1, B1, and the other blank well on the plate (Additional file 2: Dataset S1).
For some of the flat-bottom growth curve plates, 2 μL of a culture of wild-type B. theta VPI-5482 grown in BHIS without cysteine for 36–48 h were used to inoculate position A1 as a positive control. All flat-bottom 96-well plates were sealed with modified sterile plastic seals, cut so as not to extend over the edges of the plates, and assembled in a plate stacker (BIOSTACK3WR, Biotek Instruments Inc.) associated with a Synergy H1 microplate reader (Biotek Instruments Inc.) running Gen 5 v. 3.08.01. The plate stacker and the front of the microplate reader were enclosed in a custom-fabricated box along with a thermal control unit (AirTherm SMT, World Precision Instruments) to ensure a constant temperature of 37 °C during growth curve measurements. The plate stacker cycled plates through the reader continuously, and one complete pass through all plates required 30–42 min, depending on the number of plates. The plate reader settings were as follows: 37 °C, 10 s of shaking at 282 cycles per min with a double orbital pattern before reading optical density at 600 nm. After approximately 48 h of growth, the plates were removed from the plate stacker and used for single-cell imaging.
Storing glycerol stocks
After being used to inoculate flat-bottom plates for growth curves, the deep well cultures were sealed with a plastic film (Excel Scientific, Cat. #STR-SEAL-PLT) and transferred back to an anaerobic chamber using an airtight box. The BenchSmart 96-well pipetting robot was used along with a P1000 head and a 300-mL reservoir to transfer 353 μL of a sterile solution of 50% glycerol (Fisher, Cat. #G33-4) and 50 mM cysteine (Millipore, Cat. #243005-100GM) that had been pre-reduced inside the chamber for >48 h. After mixing the cultures with glycerol by pipetting up and down twice, the glycerol stocks were distributed in 80-μL aliquots into 96-well V-bottom plates covered temporarily with a sterile lid (Greiner, Cat. #656161) as copies of the final condensed library. Aliquoted library copies were sealed with a foil seal, the sterile lid was placed over the seal, and plates were transferred to a −80 °C freezer for long-term storage.
Wells in the progenitor collection (1st round) and ordered collection (2nd round, quality check) were pooled according to the same plate-well strategy. A plate-well pooling strategy requires N+96 pools, where N is the number of 96-well plates. Pooling essentially followed the procedure described previously. Individual wells were first pooled, either pooling the same well from all plates (e.g., A1 from progenitor collection plates 41–302) or pooling all wells from a single plate (e.g., A1–H12 from plate 41). In previous efforts, the first pool set (same well, different plates) was pooled further to create 8 row pools and 12 column pools. In this work, the set of pools from the 96 well positions was sequenced directly, along with the 262 plate pools. As described previously, a single pool was made from every well of the progenitor collection extension (plates 41–302) for use as input to RB-TnSeq.
With the plate-well pooling strategy used here, the location of a barcode isolated n times in the collection will be narrowed down to n² possible wells. For example, a barcode isolated at position G1 of plate 1 (P1-G1) and position H2 of plate 2 (P2-H2) of the progenitor collection will be sequenced in pools P1, P2, G1, and H2. The four possible locations that are consistent with these results are P1-G1, P1-H2, P2-G1, and P2-H2. Two of the possible locations in this example are the true locations, while the remaining two are artifacts of the pooling and decoding process.
When a barcode did not have a definite location, we used a probabilistic strategy to predict the likelihood of a particular configuration of wells, as described previously [30, 53]. Critically, this algorithm depends on systematic differences in the contribution of each well in the pool to the total number of sequencing reads. While BarSeq is particularly effective compared to other methods at providing a quantitative and accurate estimate of the relative abundance of a barcode in the pool, plate-well pools carrying similar relative abundance of the barcode in question are poor in information and hence the resulting predictions are low in confidence. Here, we used a heuristic cutoff of 0.85 for considering a predicted configuration of locations to be high confidence. If the probability was <0.85, all possible wells from one plate were transferred to the condensed collection, and the true mutant was identified in the subsequent quality check sequencing run. With the plate-well pooling strategy used here, barcodes isolated n times with ambiguous locations resulted in n wells being transferred to the condensed collection, only one of which contained the correct mutant.
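The counting argument can be sketched in Python as follows. The function names and the `P<plate>-<well>` label format are illustrative choices, not part of the published decoding code. The sketch enumerates every plate and well combination consistent with the pools in which a barcode was detected; the count equals n² when the n isolates lie on n distinct plates and at n distinct well positions.

```python
from itertools import product

def n_pools(n_plates, wells_per_plate=96):
    """Number of pools required by a plate-well strategy (N + 96 for 96-well plates)."""
    return n_plates + wells_per_plate

def candidate_locations(plate_pools, well_pools):
    """All locations consistent with detection in the given plate and well pools.

    plate_pools: plate numbers whose plate pool contained the barcode
    well_pools:  well positions whose well pool contained the barcode
    """
    return [f"P{p}-{w}" for p, w in product(sorted(plate_pools), sorted(well_pools))]

# Example from the text: the barcode is detected in pools P1, P2, G1, and H2.
print(candidate_locations({1, 2}, {"G1", "H2"}))
```

Only two of the four candidates returned for this example are true locations; the other two cannot be excluded from presence or absence alone, which is why read abundance is needed to resolve them.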
Sequencing the progenitor collection
DNA from plate-well pools of the progenitor collection was isolated with a DNeasy 96 Blood & Tissue Kit (Qiagen, Cat. #69582) according to the manufacturer’s instructions. BarSeq was performed on individual pools with indexed primers (Table S1), as described previously, and sequenced on a MiSeq (Illumina, SY-410-1003) using MiSeq Reagent Kit v3 (150-cycle) (Illumina, MS-102-3001). DNA was isolated from the complete pool at the same time and used as input for an RB-TnSeq protocol, as detailed previously (Table S1). The RB-TnSeq library was sequenced on a MiSeq using MiSeq Reagent Kit v3 (150-cycle).
Decoding the progenitor collection
The process of locating barcodes within the progenitor collection was performed essentially as previously described for the initial 40-plate collection, with small modifications to account for the change in pooling strategy from row-column-plate to plate-well. Briefly, we used the BarSeq results from individual pools to locate barcodes within the collection. Barcodes with definite solutions (isolated once in the collection) were identified first, then statistics on the distribution of abundance of these barcodes were used to inform the likelihood of solutions for the location of barcodes without definite solutions (isolated more than once in the collection).
Simultaneously, we incorporated the RB-TnSeq data of the progenitor collection into the larger RB-TnSeq dataset from the initial pooled library. The higher depth of sequencing from RB-TnSeq on the progenitor collection allowed us to map more barcodes to the genome and provided higher sensitivity for the detection of multiple insertion sites associated with the same barcode. Once a barcode was located in the collection, a lookup table connecting barcodes to insertion sites was used to determine its utility as a mutant strain for the condensed collection.
The detailed algorithm for determining the mapping status of a barcode (e.g., single-insertion versus multiple/ambiguous insertion) can be found in previously published code. Importantly, we only considered insertion locations in the genome for which the number of reads was >25% that of the most abundant insertion location for the same barcode. While the same barcode mapping to multiple locations in the genome in a pooled library could arise from multiple causes (such as the chance occurrence of the same barcode in multiple strains), an RB-TnSeq dataset from the progenitor collection alone indicated that the majority of barcodes that mapped to multiple sites were isolated only once, consistent with previous results. If the insertions of these ambiguous barcodes were found in separate strains, this scenario would require the repeated sorting of two or more cells with the same barcode into the same well. Therefore, in this study we treated barcodes associated with more than one insertion site as multiple-insertion strains.
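The read-count threshold described above can be illustrated with a minimal Python sketch. It covers only the >25% filter and the resulting single versus multiple call; the published code contains additional logic that is not reproduced here, and the function name, input format, and site labels are hypothetical.

```python
def classify_barcode(site_reads, min_fraction=0.25):
    """Classify one barcode from its per-site read counts.

    site_reads: dict mapping an insertion-site label to its read count.
    A site is kept only if its reads exceed min_fraction of the reads at
    the most abundant site for the same barcode.
    Returns (status, kept_sites).
    """
    if not site_reads or max(site_reads.values()) <= 0:
        return "unmapped", []
    top = max(site_reads.values())
    kept = sorted(site for site, n in site_reads.items() if n > min_fraction * top)
    status = "single-insertion" if len(kept) == 1 else "multiple-insertion"
    return status, kept
```

For instance, a barcode with 100 reads at one site and 20 at another would be called single-insertion in this sketch, whereas 100 and 30 reads would be called multiple-insertion.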
Modeling assembly of the progenitor collection
To quantify barcode abundance in the initial pool, a 1-mL aliquot of the same initial pool as the one used for sorting the additional 262 plates was inoculated into BHIS and recovered overnight at 37 °C. Six 1-mL aliquots of this overnight culture were pelleted, DNA was extracted, and BarSeq was performed as above (see “Sequencing the progenitor collection”). This protocol is the same as the one used to generate the t0 samples that serve as controls for pooled fitness assays, and we expect that any t0 sequencing data will be useful for modeling collection assembly as long as it comes from the same initial pool as the culture used for sorting and was recovered for a similar period of time. The relative barcode abundances were averaged across these t0 samples and used as a probabilistic weight for random sampling during simulations. Before drawing from the pool, we filtered out barcodes in the t0 samples that were not associated with any insertion location (unmapped).
We started by using BarSeq to quantify strain abundance in the initial pool used to sort the library. In our simulation, we used a Monte Carlo approach (repeated random sampling, weighted by relative abundance) to simulate the isolation of barcodes from the initial pool and calculated the genome coverage (number of genes represented by ≥1 insertion) across a range of progenitor collection sizes. Barcodes were randomly drawn with replacement from an initial set defined by the initial pool, insertion sites were located, and genome coverage was determined. We required that an insertion be found between the first 5% and last 10% of a gene to consider that gene disrupted. We simulated a range of collection sizes (b total barcodes) and performed 250 simulations for each collection size. To account for assembly efficiency (K), we scaled the collection size in the simulated coverage curves by K⁻¹. To assess the impact of strain abundance bias, we simulated collection assembly from an initial pool in which the weights were set to be equal. To assess the impact of assembly efficiency, we set K=1.
The value of K was estimated by quantifying two parameters from the statistics of the 262-plate library: b_pw, the number of barcode bins per well, and f_single, the fraction of barcode bins associated with a single site. These parameters were chosen because they represent the filtering steps used in this work to determine whether a barcode was useful for inclusion in the final condensed collection. We expect that K can be estimated for any protocol, as long as the fraction of wells with a useful barcode is accurately quantified.
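A minimal Python sketch of this kind of simulation is given below, using only the standard library. It is not the code used in this study: the input dictionaries are hypothetical, `barcode_gene` is assumed to already encode the 5% and 10% positional filter (a barcode maps to `None` if its insertion does not disrupt a gene), and the scaling function reflects one reading of the efficiency correction, namely that obtaining b useful barcodes requires b/K sorted wells.

```python
import random

def simulate_coverage(weights, barcode_gene, n_barcodes, n_sims=250, seed=0):
    """Mean number of genes covered when n_barcodes are drawn from the pool.

    weights:      dict barcode -> relative abundance (e.g., averaged t0 counts)
    barcode_gene: dict barcode -> disrupted gene, or None if no gene is disrupted
    """
    rng = random.Random(seed)
    barcodes = list(weights)
    w = [weights[b] for b in barcodes]
    total = 0
    for _ in range(n_sims):
        # Weighted sampling with replacement: the same barcode can be isolated twice.
        drawn = rng.choices(barcodes, weights=w, k=n_barcodes)
        genes = {barcode_gene.get(b) for b in drawn} - {None}
        total += len(genes)
    return total / n_sims

def scale_collection_size(n_barcodes, K):
    """Collection size after scaling by 1/K (assumed: wells needed per useful barcode)."""
    return n_barcodes / K
```

Setting all weights equal reproduces the unbiased-pool comparison, and passing K=1 to the scaling function leaves the collection size unchanged.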
Measurement of population growth metrics
Maximum growth rate was calculated as the largest slope of ln(OD) with respect to time (calculated from a linear regression of a sliding window of 5 time points) using custom MATLAB (MathWorks, Natick, MA, USA) code.
Stationary-phase cells or cells from cryostocks were diluted 1:10 into 0.85X PBS and then taken from 96-well plates and placed on 1% agarose pads with 0.85X PBS to control for osmolality. Phase-contrast images were acquired with a Ti-E inverted microscope (Nikon Instruments) using a 100X (NA 1.40) oil immersion objective and a Neo 5.5 sCMOS camera (Andor Technology). Images were acquired using μManager v.1.4. High-throughput imaging was accomplished using SLIP, as described previously. Including sample preparation and calibration, SLIP enables acquisition of 49 images per well of a 96-well plate in ~30 min. Since replicate growth curves appeared similar across the entire library (Additional file 4: Fig. S2), we imaged one replicate culture for each strain.
The MATLAB image processing code Morphometrics was used to segment cells from phase-contrast or fluorescence microscopy images. A local coordinate system was generated for each cell outline using a method adapted from MicrobeTracker. Cell widths were calculated by averaging the distances between contour points perpendicular to the cell midline, excluding contour points within the poles and sites of septation. Cell length was calculated as the length of the midline from pole to pole. Cell surface area was estimated from the local meshing.
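As an illustration of this calculation, the following Python sketch fits an ordinary least-squares line to ln(OD) in each window of 5 consecutive time points and returns the largest slope. It is a stand-in written for clarity, not the custom code used in this study, and it assumes that all OD values are positive (for example, after blank subtraction) so that the logarithm is defined.

```python
import math

def max_growth_rate(times, ods, window=5):
    """Largest sliding-window slope of ln(OD) versus time.

    times: measurement times (any unit; the rate is per that unit)
    ods:   optical densities, assumed > 0
    """
    if len(times) != len(ods) or len(times) < window:
        raise ValueError("need at least `window` paired measurements")
    logs = [math.log(od) for od in ods]
    best = float("-inf")
    for i in range(len(times) - window + 1):
        t = times[i:i + window]
        y = logs[i:i + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        best = max(best, sxy / sxx)  # least-squares slope for this window
    return best
```

For a culture that doubles once per hour throughout the measurement, the sketch returns ln 2 ≈ 0.693 per hour.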
Growth curve measurements from single colonies
To isolate single colonies, we streaked glycerol stocks aerobically onto BHIS+1.5% agar plates and then transferred the plates to an anaerobic chamber and incubated them at 37 °C for 48 h. We performed all further steps in an anaerobic chamber. We inoculated single colonies into a pre-reduced and pre-blanked flat-bottom plate (Greiner Bio-One, Cat. #655180) with 200 μL of pre-reduced BHIS and into a 96-deep well plate (Greiner Bio-One, Cat. #780280) with 500 μL of pre-reduced BHIS, and incubated the cultures for 48 h. The flat-bottom plate was used to measure outgrowth from the colony and the deep well plate was incubated without shaking in a 37 °C incubator. Since BT0870 colonies were visible but much smaller than wild type, we combined 5–6 colonies of the BT0870 mutant into one culture to approximately normalize the inoculum density. Two microliters of each deep well culture were used to inoculate a pre-reduced and pre-blanked flat-bottom plate with 200 μL of pre-reduced BHIS. For measurements of outgrowth from both a colony and the 48 h cultures, we applied an optical seal (Excel Scientific, Cat. #STR-SEAL-PLT) and measured OD600 with a Biotek Epoch plate reader with the following settings: temperature 37 °C, reading of OD600 every 5 min with continual orbital shaking (3 mm, 282 cycles per min) between reads. We subtracted well-specific blank values before plotting the growth curves and calculating maximum growth rate.
PCR confirmation of double-insertion strains
We chose representative strains from the progenitor and ordered collections for confirmation of sequencing-based classification of either single-barcode double-insertion or double-barcode double-insertion strains. A common forward primer within the transposon was paired with an insertion-specific reverse primer, and a PCR check was performed to confirm the transposon-insertion site by amplifying across the transposon-insertion junction (Table S1). A sample from targeted wells in the collection was streaked out on BHIS plates, eight individual colonies per strain were picked and grown overnight, and the overnight cultures were used as input for colony PCR. If both insertion sites could be detected in most single colonies, the strain was considered a confirmed multi-insertion strain.