Preparation and fixation of cell lines for Drop-seq
Human Flp-In T-Rex 293 HEK cells were a gift from M. Landthaler (Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin) originally obtained from Invitrogen (catalog no. R78007); murine NIH/3T3 cells were from DSMZ (ACC 59, DSMZ, Braunschweig, Germany). Cells were grown in Dulbecco’s modified Eagle’s medium (DMEM, 61965-026, Invitrogen, Waltham, MA, USA) without antibiotics containing 10% fetal bovine serum and confirmed to be mycoplasma-free (LookOut Mycoplasma PCR detection kit, Sigma-Aldrich, St. Louis, MO, USA). Cells were grown to 30–60% confluence, dissociated for 5 min at 37 °C with 0.05% bovine trypsin-EDTA (Invitrogen 25300062), quenched with growth medium and further processed as described previously (Macosko et al. [7, 13]). Briefly, between ~1 and 10 × 106 cells were handled always in the cold and kept on ice, pelleted at 300 × g for 5 min at 4 °C, washed with 1× phosphate-buffered saline (PBS) + 0.01% bovine serum albumin fraction V (BSA) (100 μg/ml; 01400, Biomol, Hamburg, Germany), resuspended in PBS, filtered through a 40- or 35-μm cell strainer and counted. For Drop-seq, a [1 + 1] mixture of [HEK + 3T3 cells] was prepared at a combined input concentration of 100 cells/μl in 1× PBS + 0.01% BSA (corresponding to a final concentration of 50 cells/μl after mixing with lysis buffer in the co-flow device).
Methanol fixation was adapted from Stoeckius et al. [12]. Cells were trypsinized, and between 1 and 4 × 106 cells were processed as described above for Drop-seq [7]. Cells were handled in regular (not ’low-binding’) microcentrifuge tubes to minimize cell loss and kept cold at all times. After straining and counting, cells were pelleted at 300 × g for 5 min at 4 °C, the supernatant was removed manually and the cell pellet resuspended in 2 volumes (200 μl) of ice-cold PBS. To avoid cell clumping, 8 volumes (800 μl) of methanol (grade p.a.; pre-chilled to –20 °C) were added dropwise, while gently mixing or vortexing the cell suspension (final concentration: 80% methanol in PBS). The methanol-fixed cells were kept on ice for a minimum of 15 min and then stored at –80 °C for up to several months, as indicated. For rehydration, cells were either kept on ice after fixation (Fixed) or moved from –80 °C to 4 °C (Fixed 1 or 3 weeks) and kept in the cold throughout the procedure. Cells were pelleted at 1000 to 3000 × g, resuspended in PBS + 0.01% BSA, centrifuged again, resuspended in PBS + 0.01% BSA, passed through a 40- or 35-μm cell strainer, counted and diluted for Drop-seq in PBS + 0.01% BSA as described above. For control of RNA quality after fixation, cells were resuspended in PBS, kept on ice for 5–10 min and repelleted; RNA was then extracted with TRIZOL.
Preparation of Drosophila cells for Drop-seq
The D. melanogaster strain used was y
1
w
1118
; P{st.2::Gal4}; P{vnd::dsRED} [14]. Eggs were collected on apple juice-agar plates for 2 h and aged for ~6 h at 25 °C. Embryos were dechorionated for 1 min in ~4% NaOCl (diluted commercial bleach) and extensively washed with deionized water. Excess liquid was removed, and embryos were transferred to 1 ml ice-cold dissociation buffer (cell culture grade PBS, 0.01% molecular biology grade BSA) 6 h after embryo collection. Approximately 500–1000 embryos were collected prior to dissociation; a small subsample was stored in methanol for later staging by microscopy. Embryos were dissociated in a Dounce homogenizer (Wheaton 357544) with gentle, short strokes of the loose pestle on ice until all embryos were disrupted. The suspension was transferred into a 1.5-ml microcentrifuge tube, and the cells were pelleted for 3 min at 1000 × g at 4 °C. The supernatant was exchanged for 1 ml fresh dissociation buffer. Cells were further dissociated using 20 gentle passes through a 22G x 2” needle mounted on a 5-ml syringe. The cell suspension was then gently passed through a 20-μm cell strainer (NY2002500, Merck, Darmstadt, Germany) into a fresh 1.5-ml microcentrifuge tube, and residual cells were washed from the strainer using a small amount of dissociation buffer. Cells were pelleted for 3 min at 1000 × g at 4 °C and resuspended in 100 μl fresh dissociation buffer and counted. Samples were fixed by adding 4 volumes of ice-cold 100% methanol (final concentration of 80% methanol in PBS) and thoroughly mixed with a micropipette. Cells were stored at –20 °C until use (for up to 2 weeks).
For Drop-seq runs, cells were moved to 4 °C and kept in the cold throughout the procedure. Cells were pelleted at 3000 × g for 5 min, rehydrated in PBS + 0.01% BSA in the presence of RNAse inhibitor (RiboLock 1U/μl), pelleted and resuspended again in the presence of RNAse inhibitor, passed through a 35-μm cell strainer, counted and finally diluted for Drop-seq into PBS + 0.01% BSA (final concentration of 50 cells/μl).
Preparation of mouse hindbrain cells for Drop-seq
Brains of newborn mouse pups (C57BL/6; postnatal day 5) were dissected in ice-cold buffer (120 mM NaCl, 8 mM KCl, 1.26 mM CaCl2, 1.5 mM MgCl2, 21 mM NaHCO3, 0.58 mM Na2HPO4 and 30 mM glucose, pH 7.4) that was saturated with 95% O2 and 5% CO2. Transsections at the level of the pons and C2 motor roots were performed using a razor blade to isolate the rhombencephalon. Hindbrain and cerebellar tissues were dissociated using the Papain Dissociation System (Worthington Biochemical, Lakewood, NJ, USA) according to the manufacturer's instructions. To facilitate dissociation and prevent aggregation, DNAse I (5U/ml, Roche, Basel, Switzerland) was added to the protease solution. After inactivation, cells were resuspended in Mg2+- and Ca2+-free Hank's Balanced Salt Solution. Live (propidium iodide-negative) cells were sorted directly into ice-cold methanol (final concentration 80% methanol), and the fixed cells were stored for more than 4 weeks at –80 °C. The sort was carried out under low pressure flow settings (23 psi; 100-μm nozzle), previously optimized to maximize recovery of viable cells for subcultures. For Drop-seq, aliquots with 106 (replicate 1) or 3 × 105 (replicate 2) sorted, methanol-fixed cells from three or two hindbrains, respectively, were pelleted and processed as described above. RNAse inhibitor (RiboLock 1 U/μl) was added during the rehydration, wash and cell straining steps. Cell recovery was 19% and 12% from the two cell preparations, respectively. Cells were diluted 1:3 into PBS-BSA 0.01% and then used for Drop-seq.
Drop-seq procedure, single-cell library generation and sequencing
Monodisperse droplets of about 1 nl in size were generated using microfluidic polydimethylsiloxane (PDMS) co-flow devices (Drop-seq chips, FlowJEM, Toronto, Canada; rendered hydrophobic by pre-treatment with Aquapel). Barcoded microparticles (Barcoded Beads SeqB, ChemGenes Corp., Wilmington, MA, USA) were prepared and, using a self-built Drop-seq setup, flowed in closely following the previously described instrument setup and procedures by Macosko et al. [7, 13]. For most microfluidic co-flow devices, emulsions were checked by microscopic inspection; typically less than 5% of bead-occupied droplets contained more than a single barcoded bead. Droplets were collected in one 50-ml Falcon tube for a run time of 12.5 min. Droplets were broken promptly after collection, and barcoded beads with captured transcriptomes were reverse transcribed without delay, then exonuclease-treated and further processed as described [7]. The first-strand cDNA was amplified (after assuming loss of about 50% of input beads) by equally distributing beads from one run to 24 or 48 PCR reactions (between 10 and 30 anticipated STAMPS per tube); 50 μl per PCR reaction; 4 + 9 cycles (except for mouse hindbrain replicate 2: 4 + 12 cycles). Then 20- or 10-μl fractions of each PCR reaction were pooled and double-purified with a 0.6× volume of Agencourt AMPure XP beads (catalog no. A63881, Beckman Coulter, Pasadena, CA, USA) and eluted in 12 μl H2O. We evaluated and quantified 1 μl of the amplified cDNA libraries on a BioAnalyzer High Sensitivity Chip (Agilent, Santa Clara, CA, USA). Then 600 pg of each cDNA library was fragmented and amplified (12 cycles) for sequencing with the Nextera XT v2 DNA Library Preparation kit (Illumina, San Diego, CA, USA) using custom primers that amplified only the 3' ends [7]. Libraries were purified with a 0.6× volume of AMPure XP beads followed by a 0.6–1× volume of AMPure beads to completely remove primer dimers and achieve an average length of ~500–700 bp, quantified and sequenced (paired end) on Illumina NextSeq 500 sequencers (library concentration: 1.8 pM; 1% PhiX spike-in for run quality control; NextSeq 500/550 High Output v2 kit (75 cycles); read 1: 20 bp (bases 1–12 cell barcode, bases 13–20 UMI; custom primer 1 Read1CustSeqB), index read: 8 bp, read 2 (paired end): 64 bp).
The unique identifiers are as follows: Live: GSM2359902; Fixed: GSM2359903; Fixed 1 week: GSM2359904; Fixed 3 weeks: GSM2359905; Drosophila: mel_rep1: GSM2518777; mel_rep2: GSM2518778; mel_rep3: GSM2518779; mel_rep4: GSM2518780; mel_rep5: GSM2518781; mel_rep6: GSM2518782; mel_rep7: GSM2518783; Mouse: mm_rep1: GSM2518784; mm_rep2: GSM2518785.
Single-cell RNA-seq: data processing, alignment and gene quantification
We chose read 1 to be 20 bp long to avoid reading into the poly(A) tail, leaving 64 bp for read 2. The sequencing quality was assessed by FastQC (v.0.11.2). The base qualities of read 1 were particularly inspected, since they contain the cell and molecular barcodes and their accuracy is critical for the subsequent analysis. The last base of read 1 consistently showed an increase in T content, possibly indicating errors during bead synthesis. We observed a similar trend when re-analyzing the original data from Macosko et al. [7]. Part of these errors were handled and corrected as described later. For read 2, we used the Drop-seq tools v.1.12 [7] to tag the sequences with their corresponding cell and molecular barcodes, to trim poly(A) stretches and potential SMART adapter contaminants and to filter out barcodes with low-quality bases.
For mixed species experiments with human and mouse cells, the reads were then aligned to a combined FASTA file of the hg38 and mm10 reference genomes, using STAR [15] v.2.4.0j with default parameters. Typically, around 65% of the reads were found to uniquely map to either of the species genomes. The Drosophila melanogaster sequences were mapped to the BDGP6 reference genome; typically around 85% of the reads mapped uniquely. For the mouse hindbrain samples, 75% of all sequence reads mapped uniquely. Non-uniquely mapped reads were discarded.
The Drop-seq toolkit [7] was further exploited to add gene annotation tags to the aligned reads (the annotation used was Ensembl release 84) and to identify and correct some of the bead synthesis errors described above. The number of cells (cell barcodes associated with single-cell transcriptomes) was determined by extracting the number of reads per cell, then plotting the cumulative distribution of reads against the cell barcodes ordered by descending number of reads and selecting the inflection point (‘knee’) of the distribution. It was similar to the number of single-cell transcriptomes expected during the Drop-seq run (see Additional file 1: Figure S1 for details). Finally, the DigitalExpression tool [7] was used to obtain the digital gene expression (DGE) matrix for each sample.
Exploratory analysis, visualization and filtering of Drop-seq data
We developed a freely available R software package ('dropbead'; including a tutorial), which offers an easy and systematic framework for exploratory data analysis and visualization. ’dropbead’ provides a function for computationally determining the inflection point and hence the number of cells in a sample. The starting point for subsequent analysis is the sample's DGE. ’dropbead’ provides functions for creating species separation plots and violin plots of genes and transcripts per cell. ’dropbead’ can be used to easily filter and remove genes with low counts and cells with few UMIs, or to keep the best cells according to a certain criterion. ’dropbead’ was used to generate Figs. 2a, b, c, 3a and 4a as well as Additional files 1, 2, 3 and 6: Figure S1a, b, f, Figure S2a, b, c, Figure S3a, b, Figure S5a, b.
We discarded cells from subsequent analysis which had fewer than 3500 UMIs (HEK and 3T3 cells), 1000 UMIs (Drosophila samples) or 300 UMIs (mouse samples). In the human-mouse mixed species experiments, a threshold of 90% (90 out of 100 UMIs for one species) was selected to confidently declare a cell as being either of the species and not a human/mouse doublet. In order to assess whether fixation generates ’low-quality cells’ [16], we determined the proportion of non-mitochondrial reads: for every cell we computed the sum of UMIs corresponding to RNA encoded by the mitochondrial genome and then subtracted this number from the sum of all UMIs in that cell. We divided this number by the total number of UMIs in the cell to obtain the non-mitochondrial content as a percentage for every sample.
Single-cell RNA-seq: normalization and correlations of gene expression levels
The raw counts in the DGE matrix were normalized to average transcripts per million (ATPM) as follows: the UMI counts for every gene in a given cell were divided by the sum of all UMIs in that cell. These counts were then multiplied by the sum of all UMIs of the cell with the highest number of UMIs in that library. Correlations of gene expression levels between single-cell samples were computed by first subsetting the DGE matrices of the two samples to the intersection of the genes captured in both libraries and then computing the sum of gene counts across all cells in each library. Plotting of correlations is shown in log space. For the correlation of Drop-seq data against mRNA-seq, we converted gene counts into reads per kilobase per million (RPKMs) and used the mean value of all isoform lengths for a given gene. For all correlations, the intersection (common set) of genes was high, around 17,000 genes for human and mouse samples (cell lines and primary cells) and 10,000 genes for D. melanogaster.
Single-cell RNA-seq: clustering and identification of cell populations and marker genes
For the Drosophila embryos and mouse hindbrain samples, after filtering our samples with ‘dropbead’ we used Seurat [17] for cluster analysis. We first identified a set of highly variable genes, which we used to perform principal component (PC) analysis. Judged by their statistical significance and the robustness of the results, the first few (about 20–50) PCs were subsequently used as inputs for clustering and subsequent t-distributed stochastic neighbour embedding (tSNE) representation. The clustering was performed with the function 'FindClusters' of Seurat using default parameters and 50 PCs (Drosophila) or 21 PCs (mouse) as input. The same number of PCs was used as input for the tSNE representation. We used Seurat’s function 'FindAllMarkers' to identify the marker genes for each of the clusters in the tSNE representation.
The initial clusterings for both the Drosophila embryos and the mouse hindbrain samples contained cell clusters which were difficult to characterize (three and one cluster, respectively). After closer inspection, we flagged two cell clusters in the Drosophila data as being nuclei, as cells were lacking substantial expression of mitochondrially encoded genes compared to the rest of the cells. We classified these cells as nuclei (probably generated by mechanic disruption in the cell isolation procedure) and excluded them from further analysis (2975 cells).
Furthermore, extrapolating from our mixed species experiments with human and mouse cell lines, we anticipated around 10–15% of same-species doublets for the Drosophila embryos and mouse hindbrain data sets. For Drosophila, we flagged a cluster of cells whose marker genes, as identified by Seurat, lacked specificity. For mouse, we flagged a similar cluster of cells which additionally contained higher portions of ribosomal protein coding mRNAs than the rest of the cells. We reasoned that both sets might be cell doublets and excluded them in order to perform the final cluster analysis shown in Figs. 3b and 4b (excluded cells: Drosophila, 2186 cells; mouse, 1127 cells).
Bulk mRNA-seq libraries
Live, cultured cells (Flp-In T-Rex 293 HEK cells, NIH/3T3 cells), intact, live Drosophila melanogaster embryos and sorted, methanol-fixed cells from dissected newborn mouse hindbrain and cerebellum were used for total RNA extraction with TRIZOL. Strand-specific cDNA libraries were generated according to the Illumina TruSeq protocol (TruSeq Stranded mRNA LT Sample Prep Kit, Illumina) using between 24 to 260 ng of total RNA input. The 1.8-pM libraries were sequenced on an Illumina NextSeq 500 System, using the High Output v2 Kit (150 cycles), single read: 150 bp, index read: 6 bp.
The unique identifiers are as follows: bulk_hek: GSM2518786; bulk_3t3: GSM2518787; bulk_mel1: GSM2518788; bulk_mel2: GSM2518789; bulk_mm: GSM2518790.