Zebrafish strain maintenance and larval sample preparation
The transgenic line Tg[HuC-H2B-GCaMP6s] with nacre or casper background was used for breeding. Embryos were kept in blue egg water (2.4 g of CaSO4, 4g of instant ocean salts, and 600μl of 1% methylene blue in 20 l of milliQ water) and incubated at 28°C. On 6 days post-fertilization (dpf), healthy larvae with high GCaMP6s expression were selected for imaging. Fish samples are held in custom-designed polydimethylsiloxane (PDMS) sample holders with each holder carrying up to 5 larvae. Each larva was half-embedded in a slot on the sample holder, paralyzed with 1mg/mL mivacurium chloride, and covered with 2% agarose gel/E3 medium solution. After loading the whole group of larvae to be imaged, the sample holders were immersed in E3 medium for 1 h to wash off traces of mivacurium chloride. All animal experiments were approved by the Institutional Animal Care and Use Committee (IACUC) at the University of California, San Francisco, USA.
In vivo calcium imaging using iSPIM
An inverted SPIM (iSPIM) that is similar to a previously reported design [41] was used for imaging. The microscope framework was adapted from a di-SPIM (Applied Scientific Imaging, Inc.) [64]. The excitation laser (Coherent OBIS LS 488 nm) was fiber-coupled into the microscope, collimated, then focused by a 0.3 NA water-dipping objective (Nikon). A virtual light-sheet was created by deflecting the scanning mirror, illuminating a layer of specimen with ~7-μm thickness. Fluorescence from the illuminated layer was collected in an orthogonal direction by a 20x 1.0 NA water-dipping objective (Olympus XLUMPLFLN-W), and the final image is captured by a scientific CMOS (sCMOS) camera (Hamamatsu Orca Flash 4.0, C11440). Due to the strong scattering of the blue-green light in deep tissues, we confined the illumination within dorsal forebrain of the larvae. The volume of interest was approximately 300 μm x 250 μm x 100μm in x,y,z directions, respectively. This volume covered the entire dorsal telencephalon and habenula regions. To resolve the dynamics of GCaMP6s, image stacks were acquired at 2Hz, which allowed us to resolve frequency components up to 1Hz based on Nyquist sampling theorem. Each 100-μm stack consisted of 26 slices with 4 μm between two adjacent slices. The resting state of the selected volume in each larva was imaged for 15 min; in addition to this time lapse recording, a Z-stack with 1-μm step (101 slices) was acquired across the same volume as a reference. The raw data were submitted to the data preprocessing pipeline for cleaning and feature extraction.
Pre-processing of calcium imaging data
Drift correction and ROI (neuron) extraction
Raw images were organized as hyper stacks in the order of x-y-z-t. Each hyper stack was split into 26T-stacks at different z positions and drift corrected with the StackReg plugin [65]. For each drift-corrected T-stack, neuronal nuclei were segmented using the Laplacian of gaussian blob detection algorithm blob_log in the Python library scikit image [66]. Since neuronal nuclei (~4μm in diameter) can be imaged in two or more adjacent planes, a redundancy detection algorithm was developed to find lateral duplications in cell extraction: if a nucleus is detected at the same (x; y) position in the kth and the (k + 1)th planes, this detection would be considered as redundant and the Z-position was considered as an average between zk and zk+1. In each extracted neuronal nuclei, its raw fluorescence signals were calculated through the entire T-stack, and the relative signal intensity of calcium transients, ΔF/F, was calculated using the method as previously described [47]. Two additional cleaning steps were applied to remove potential artifacts that were falsely recognized as neuronal nuclei by the blob detection algorithm: first, since GCaMP6s has background signals in the absence of action potentials, blobs of real neurons should have a high fluorescence baseline, and blobs with very low baseline (comparable to dark areas in the image) are excluded; second, since in reality, the value of ΔF/F should fall within a reasonable range, blobs with extraordinarily high ΔF/F values (exceeding a threshold during recording) were excluded.
Since activity levels vary among neurons, over the 15-min imaging session, some neurons may exhibit high calcium signal peaks while others remain “silent”. Since the latter are not likely to contribute to downstream analyses, it would be beneficial to exclude them from the very beginning. This requires us to (1) find a reliable method to identify peaks from background in a noisy timeseries; (2) find a measure for the activity level of each neuron, i.e., whether and how much does it activate during the imaging session; and (3) set a well-defined criterion to accept or reject a neuron based on its two characteristics above. The method we used to identify peaks and baselines from each ΔF/F time trace was based on the Bayesian inference of two-dimensional distribution of adjacent ΔF/F values [33]. For each neuron i, its baseline of ΔF/F, μi, was calculated by averaging all the (ΔF/F)I time points that are identified as background,
$$\left|{a}_i\right.\sum \limits_k\left({S}_{i,k}-{\mu}_i\right)\times \Delta t,$$
where Si,k refers to the kth time point of (ΔF/F)i.
Image registration
Since individuals differ in morphology, placement, and brightness, it is difficult to directly compare their images even though the latter are acquired under the same condition. In order to compare imaging results from different individuals, the images should be anatomically mapped to a common template image, i.e., a “reference”. A whole-brain template, Z-brain, has been provided by Randlett et al. [46] as a standard reference brain atlas for anatomical and functional studies of larval zebrafish brain. Although the fish we experimented on were at the same stage as that in the Z-brain template, the iSPIM stacks acquired in their dorsal forebrain could not be directly registered to the reference due to (1) a significant mismatch between two imaging directions and (2) a significant difference between the volumes of interests. Since the dorsal forebrain region that we imaged only accounts for a sub-volume of the entire brain, a direct registration of the former to the latter is prone to error. To solve this problem, we created an intermediate reference brain from the Z-brain template by resampling the dorsal-forebrain region in the direction and pixel sizes that are comparable to those of the iSPIM stacks. For each fish, its densely sampled Z-stack was registered to the intermediate reference brain using computational morphometry toolkit (CMTK) (http://nitrc.org/projects/cmtk). After registering the iSPIM stacks into the intermediate template, the coordinates of the extracted neurons in original T-stacks need to be reformatted accordingly into the registered frame. This step was carried out using the registration output and the function stream x form in CMTK. By comparing the reformatted coordinates with the 294 anatomical masks in the Z-brain template, we were able to identify the anatomical label of each neuron and the brain regions covered by our imaging volume.
Preprocessing of the resting state human brain fMRI data
We used healthy control subjects (n=74) from the COBRE16 resting-state fMRI datasets available at http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html. The fMRI dataset for each subject includes blood-oxygenation level-dependent (BOLD) volumes of 5 min (TR = 2 s, TE = 29 ms, FA = 75°, 32 slices, voxel size = 3x3x4 mm3, matrix size = 64x64, FOV = 255 x255 mm2). The pre-processing steps included realignment, co-registering, and normalization. We used established preprocessing and analysis pipelines [67] and CONN software package (https://www.nitrc.org/projects/conn) to estimate and remove noise components such as those from cerebrospinal fluid, white matter signals, and subject motion. The temporal band-pass filter (between 0.008 and 0.09 Hz) was applied to remove frequencies that were not of interest in the raw data. In addition to the default denoising strategy, data-driven Independent Component Analyses (ICA) denoising [68] was utilized to detect and remove potential noise-related temporal components.
Re-alignment
In brief, the first-level covariate containing the 6 rigid-body parameters was created based on the MRI data to estimate the subject motion. For each subject, this variable was used to perform regression on the fMRI data to correct for motion-related effects.
To reduce the physiological noise source, a Component-Based Noise Correction Method (CompCor) was used [69].
Co-registering
The functional volumes are co-registered with the ROIs and structural volumes. All the Brodmann areas (ROIs) defined through the Talairach Daemon (TD) system [56] were assigned to all subjects using segmentation of structural image; gray matter, white matter, and cerebrospinal fluid (CSF) masks were generated. Anatomical volumes were co-registered to the functional and ROI volumes for each subject, and the volumes were transformed to the MNI-space.
Calculation and normalization of fMRI measures
Following re-alignment and co-registering, the Principal Component Analysis (PCA) algorithm was used to extract BOLD signal components for each ROI. The fMRI measures were calculated using MATLAB-based software packages, SPM12 (http://www.fil.ion.ucl.ac.uk/spm/). All of the computed measures are normalized to an N (0,1) Gaussian distribution for each subject.
Data analysis
Analysis and classification of ROI activity levels
For a given ROI (i.e., individual neurons in larval zebrafish or individual brain regions in the human fMRI data), signal si represents calcium indicator fluorescence changes (ΔF/F) in the zebrafish data or BOLD signals in the human fMRI data over time, the activity value aci was calculated as follows:
$${ac}_i= mean\ \left({\left({s}_i- mean\left({s}_i\right)\right)}^2\right)$$
After obtaining each ROI activity level in the given time series, we used the K-means algorithm [49] to classify them into activity levels 1–3 (or 1-5). The k-means clustering algorithm minimizes the within-cluster squared Euclidean distances. Here, a one-dimensional activity population was partitioned into 3 sets (levels). The within-class cells in each level have similar activity.
Analysis and classification of ROI functional connectivity levels
We used Pearson correlation to measure functional connectivity between ROIs. Since Pearson correlation assigns a value to all ROI pairs, it is necessary to apply thresholding to eliminate potentially spurious connections. There is no standard method to calculate the optimal threshold value τoptimal, and different values of τ are used to create the adjacency matrices. Arbitrarily chosen thresholding values are often applied to raw matrixes [6]. As different cutoff values can directly influence network properties and bias analysis results, we developed algorithms to calculate the optimal thresholding values and generate the connectivity matrix. This matrix was then used to calculate the functional connections (i.e., degrees) of each ROI, followed by K-means classification into three connectivity levels. The steps of calculating functional connections for each ROI are as follows:
-
1)
Let ρ = ρij be the correlation matrix, where ρij is the Pearson correlation of ROIs i and j and can be calculated as follows:
$${\rho}_{ij}=\frac{\mathit{\operatorname{cov}}\left(i,j\right)}{\delta_i{\delta}_j},$$
-
2)
Setting the threshold values:
$${\displaystyle \begin{array}{c}\tau =\left({\tau}_k\right)\\ {}{\tau}_k\in \left(0,1\right)\end{array}}$$
-
3)
Thresholding the connectivity matrix using τk ∈ τ or each ρij
$${\rho}_{\begin{array}{c}{\tau}_k\\ {} ij\end{array}}=\left\{\begin{array}{c}{\rho}_{ij}\ if\ {\rho}_{ij}>{\tau}_k\\ {}0\ otherwise,\end{array}\right.$$
where \({\rho}_{\tau_k}=\left({\rho}_{\begin{array}{c}{\tau}_k\\ {} ij\end{array}}\right)\) is the thresholded correlation matrix.
-
4)
Binarizing \({\rho}_{\tau_k}\)
$${\rho}_{\begin{array}{c}{\tau}_{k,b}\\ {} ij\end{array}}=\left\{\begin{array}{c}1\kern0.5em if\ {\rho}_{ij}>{\tau}_k\\ {}0\ otherwise,\end{array}\right.$$
-
5)
Calculating the degrees for each ROIi, degreei = ∑jli, j , where li, j is the link between roii and roij.
-
6)
Calculating the degree distribution of the network. The fraction of ROIs with the degree k is defined as follows:
$${P}^{\tau_k}(k)=\frac{n_k}{n},$$
where nk is the number of the ROIs that have degree k.
-
7)
Calculating the fitting value of the degree distribution with the power-law distribution. r2 (coefficient of determination) was used to evaluate the closeness of data at each threshold value to the power-law curve.
-
8)
Derive the optimal threshold value: At the optimal threshold value, the r2 is highest, which indicates the best fit of the data to the power-law curve.
$${\tau}_{optimal}^{pw}\approx \mathit{\arg}\ {\mathit{\max}}_r\ \left({r}^2\right)$$
The detailed steps of calculating the optimal threshold value of the connectivity matrix were provided in the Additional file 4. The average \({\tau}_{optimal}^{pw}\)value of all zebrafish subjects in our data ranged from 0.4 to 0.6. We applied this algorithm to the human fMRI data and obtained 0.7 for \({\tau}_{optimal}^{pw}\). Using these values, we binarized our data: correlations with a value less than \({\tau}_{optimal}^{pw}\) were set to zero, whereas those with a value greater than \({\tau}_{optimal}^{pw}\) were set to one.
To further test whether the observed power-law structure of the functional brain is relevant, we shuffled the data (Additional file 5) to generate random networks with the same numbers of nodes and edges as the original networks and applied similar thresholding and analysis of degree distributions. The random network did not follow a power law structure at any thresholding values tested. Together, these analyses enable us to establish optimal thresholding values that uncover biologically relevant networks. Using such matrixes, we were able to detect known connections between the olfactory epithelia and the olfactory bulb (Additional file 6).
After obtaining each ROI’s numbers of functional connections in the given time series, we used the K-means algorithm as described above to classify them into connectivity levels 1–3 (or 1-5).
Analysis of the relationship between activity and functional connectivity
We plotted the activity and connectivity for each ROI in each individual. The input data are zebrafish calcium imaging data are composed of >12k individual neuronal activity and connectivity per subject (n=9 subjects); human resting-state fMRI data are composed of 136 brain regions’ activity and connectivity per subject (n=74 subjects).
To analyze the population frequency of each class of ROIs (i.e., AL-1&CL-1, AL-1&CL-2, AL-1&CL-3, AL-2&CL-1, AL-2&CL-2, AL-2&CL-3, AL-3&CL-1, AL-3&CL-2, AL-3&CL-3) and its statistical significance, we used the following bootstrapping method:
-
1.
Select the number of the bootstrapping iteration (here, 25 iterations were used).
-
2.
Repeat the steps “3” to “7” 25 times for the input data.
-
3.
Select a sample set with replacement from the set of all subjects.
-
4.
Calculate activity (variances) and connectivity (degrees) data for each ROI from all sampled subjects.
-
5.
Calculate activity (variances) and connectivity (degrees) data for each ROI from all sampled subjects.
-
6.
Calculate the average ROI population size for each activity and connectivity level.
-
7.
Calculate the mean, lower (2.5 percentile), and upper (97.5 percentile) point-wise confidence bands for the populations that are calculated in the step “6”.
Generation of shuffled data, noise-added data, and simulated data
Generation of shuffled data
For the zebrafish data, a given ∆F/F signal matrix DC × T with C rows (C ROIs) and T columns (T time stamps) was used to generate a new matrix \(\hat{D_{C\times T}}\) by random shuffling across rows and columns with the NumPy library of Python programming language [70]. For the human data, a given BOLD signal matrix was used and shuffled the same as for the zebrafish data.
Generation of noise-added data
To generate noise-added data, we first generated a noise matrix \({\mathrm{N}}_{\mathrm{C}\times \mathrm{T}}^{\mathrm{s}}\)with normal (Gaussian) distribution and standard deviation (noise scale) s to produce different levels of noise, using the NumPy library. This noise matrix was then added to a given ∆F/F matrix DC × T (C cells and T time stamps) (in the case of zebrafish data) or a given BOLD signal matrix (in the case of human data). The noise-added data \(\overline{D_{C\times T}}\) were as follows:
$$\overline{D_{C\times T}}={D}_{C\times T}+{N}_{C\times T}^s$$
Interconnected network simulation and approximation of the correspond ROI activity using the Cholesky decomposition technique
This method consists of two steps: First, we generated a random covariance (connectivity) matrix using the Python scikit library [71]. In order to apply the Cholesky method to construct a ROI activity matrix from the connectivity matrix, the starting connectivity matrix must be symmetric positive-definite in linear algebra [53]. Second, we created a ROI activity matrix that matched the above connectivity matrix, using the Cholesky method (53). The connectivity matrix CCell × Cell was decomposed into two parts: L and LT. We then generated a random matrix (cell x time). For any given random matrix XCell × Time, the simulated ROI activity time series matrix SCell × Time was as follows:
$${S}_{Cell\times Time}={L}_{Cell\times Cell}{X}_{Cell\times Time},$$
Data visualization
Different python libraries were used to visualize the results. The plotly libraries (https://plotly.com/python/) were used to visualize the cells’ anatomical and spatial distributions in the calcium imaging data. We used the FSL (https://fsl.fmrib.ox.ac.uk/fsl) to visualize the brain regions in the fMRI data.
Statistical analysis
Sample sizes and statistics are reported in the figure legends and text for each measurement. To determine the relationship between activity and functional connectivity at individual ROI levels (Figs. 3c and 4e), we used bootstrap tests (with 7 iterations) to test whether the negative correlation between neuronal activity and functional connectivity is consistent across subjects. For each subject, the activity (variances) and connectivity (degrees) for each ROI were calculated. Individual neurons in the larval zebrafish calcium imaging data and individual brain regions in the human fMRI data were considered as ROIs. The ROIs were sorted based on their activity for each subject (X-axis). The connectivity of sorted ROIs for each subject (Y-axis) was then graphed. The locally weighted regression algorithm [72] was applied for each subject’s data to approximate the polynomial curve. Finally, the bootstrapping method [52] was used to evaluate the inverse relationship between neuronal activity and functional connectivity. Here, random sampling with replacement was applied for selecting the curves fitting of individual subjects to evaluate the model.