Adult zebrafish (Danio rerio) were reared and maintained in a zebrafish housing system (Tecniplast, Buguggiate, Italy) under standard conditions using the rotifer polyculture method for early feeding 5 to 9 dpf. Embryos were reared in embryo medium (1.37 mM NaCl, 53.65 μM KCl, 2.54 μM Na2HPO4, 4.41 μM KH2PO4, 0.13 mM CaCl2, 0.16 mM MgSO4, and 0.43 mM NaHCO3 at pH ~ 7.2) at 28.5 °C on a 14-h light to 10-h dark cycle. To generate the experimental cohort, zebrafish carrying the fmr1hu2787 allele (kindly provided by Howard Sirotkin)  were bred to zebrafish carrying the elavl3:H2B-GCaMP6 transgene . The fast variant of GCaMP6, elavl3:H2B-GCaMP6f, was used to capture baseline and auditory sensitivity activity. The slow variant elavl3:H2B-GCaMP6s was used for multisensory-evoked experiments. Zebrafish mutant transgenic animals were out-crossed for four generations to the Tupfel long fin nacre WT strain (Zebrafish International Resource Center, Eugene, Oregon).
We generated the experimental cohort by inter-crossing the fourth generation of zebrafish heterozygous for both fmr1hu2787 and elavl3:H2B-GCaMP6 to produce clutches with a Mendelian ratio of 1:2:1 (WT: fmr1+/−: fmr1−/−). For microscopy experiments, larvae were pre-screened for GCaMP6 expression under a fluorescence microscope at 3 dpf. Genomic DNA was extracted from 6 dpf larvae at the end of imaging sessions and then preserved for long-term storage. Following quality control and the segmentation or larvae tracking stages of data analyses, genotyping for fmr1 was performed as previously described . We remained blinded to genotype until the later stages of data analysis to minimize systemic biases introduced during experimentation and quality control.
Larvae at 6 dpf were embedded upright in 1.5% low melting temperature agarose inside of a custom-built imaging chamber . Imaging chambers were composed of a 3D-printed base (24 × 24 mm) with four posts (3 × 3 × 20 mm) raised along the four corners of the platform. The four outward faces of the chambers were fixed with a glass coverslip (20 × 20 mm, 0.13–0.16 mm thick). Individual larvae were mounted onto a raised platform (11 × 11 × 6 mm) within each chamber. The platform was no closer than 3 mm to any of the glass surfaces. Chambers were filled with embryo medium once the surrounding agarose had set.
Visual stimuli were displayed on a 75 × 55 mm LCD generic PnP monitor (1024 × 768 pixels, 85 Hz, 32-bit true color) positioned 35 mm lateral to the larva [59, 60] (Fig. 1a). The monitor was covered by a colored-glass filter (Newport, Irvine, CA) with a cut-on wavelength of 550 nm. Auditory stimuli were played from a 9-mm 1-watt haptic feedback and audio exciter (Dayton, Springboro, OH) fixed to the glass surface of the imaging chamber posterior to the larva. The audio exciter was wired to a 2 × 15-watt miniature amplifier (Dayton, Springboro, OH).
In vivo GCaMP6 imaging was performed using a custom-built light-sheet microscope. As the experimentalists were blinded to genotype, all imaging parameters were the same across genotypes. The microscope has previously been described , including the line diffuser used to reduce striping artifacts . For multisensory experiments, to avoid the laser paths streaming directly into the eyes (which would interfere with the perception of visual stimuli), we blocked the side laser plane and restricted the front laser plane to the area between the eyes using a custom-made 5-mm wide adjustable slit. For baseline and auditory sensitivity experiments, we used both the side and front laser planes. The exposure time for all experiments was 10 ms. The captured images were binned 4x, yielding a final image resolution was 640 × 540 pixels at 16-bit in a tagged image file format. For baseline and auditory sensitivity experiments, 25 transverse sections at 10-μm increments were recorded at four brain volumes per second for 10 min. For multisensory experiments, 50 transverse sections at 5-μm increments were sampled at two brain volumes per second for 4 min and 7 s.
Stimulus trains for calcium imaging
For multisensory experiments, we presented three sensory stimuli to each larva three times in a semi-randomized order. The stimuli had a minimum inter-stimulus interval of 3 s. Following 15 s of acclimatization to laser scanning, the brain was imaged for 40 s at rest and then for 3 min and 12 s over the course of the multisensory stimulus train. We presented visual stimuli at a resolution of 1024 × 768 pixels at 20 frames per second. The two visual stimuli were moving vertical bars (visual flow) and an expanding disk (visual loom). The visual flow stimulus comprised eight bars, 128-pixels in width, moving in a caudal to rostral direction at a speed of 21.3° per second for 4 s. Prior to each visual flow, a 12-s linear dimming from white to black-and-white bars occurred, and similarly, a black-and-white to white 12-s linear brightening occurred after each visual flow stimulus (total stimulus duration 28 s). The visual loom consisted of a black 4-pixel diameter circular disk that exponentially expanded (Simple, Fastest; easing of − 80 designed on Adobe Animate v18.0) to an 812-pixel diameter circular disk over 6 s at a minimum angle of ~ 11° and a maximum angle of ~ 90°. A linear brightening from black to white over a 12-s duration followed each visual loom (total stimulus duration 18 s). The auditory stimulus was created using the professional audio software, Live 10 (Ableton, Berlin, Germany), and was normalized to 0 dB relative to full scale. It comprised 1 s of white noise with a 2-ms rise/fall time. The sound level of white noise at full scale was 84 dB (background noise 40–45 dB) and was measured using a 30–120-dB range digital sound level meter (DOSS, Melbourne, Australia) positioned just above the imaging platform, where the larvae would be. The total length of the multisensory-evoked stimulus train was 4 min.
For auditory sensitivity experiments, we presented two 30-s white noise amplitude ramps (to 0 dB) and twelve discrete amplitudes of white noise with a duration of 1 s at 3-dB intervals. We played each discrete amplitude once in three blocks, with an inter-stimulus interval of 14 s within blocks and an inter-block interval of 30 s. Following 15 s of acclimatization to laser scanning, the brain was imaged for 90 s during which the first amplitude ramp was presented. The first block comprised the twelve volumes with increasing amplitudes. The second block was quasi-randomized (− 21, − 27, − 12, − 33, − 9, − 18, − 6, − 24, 0, − 15, − 30, − 3 dB), and the third block had decreasing amplitudes. The second amplitude ramp was presented at the end of the stimulus train. The total length of the auditory stimulus train was 11 min and 45 s.
Analysis of calcium activity
We analyzed larvae that met the following four criteria: (1) showed robust responses to the first visual loom and auditory stimuli (6 of 77 excluded); (2) survived to the end of the imaging session (1 of 77 excluded); and (3) contained WT or fmr1−/− larvae (that is, not fmr1+/− exclusively) during the imaging session (imaging sessions were between 4 to 12 min; 7 of 77 excluded). We used the same animals for the baseline and auditory sensitivity experiments (n = 23) and a separate animal cohort for the multisensory experiments (n = 29). We measured baseline activity first, followed by auditory sensitivity.
For larvae passing the inclusion criteria, we separated their four-dimensional imaging stacks (time, x, y, z) into individual time series for every x-y plane using ImageJ v1.52c (Rueden et al., 2017). Each plane was then motion corrected using the Non-Rigid Motion Correction (NoRMCorre) algorithm . ROIs, and their corresponding calcium traces, were extracted, de-mixed, and denoised using the CaImAn package  as previously described [36, 38]. Following segmentation, we ensured that the number of ROIs detected within any of the brain regions of interest was within an order of magnitude of the median (11 of 77 excluded). Once genotypes were provided, we found similar numbers of ROIs corresponding roughly to single neurons were detected per larvae (WT = 16,593 ± 1165 across 5 animals; fmr1+/− = 17,077 ± 1316 across 10 animals; fmr1−/− = 17,321 ± 670 across 7 animals (mean ± s.e.m.)) (Additional file 9). For multisensory experiments, similar numbers of ROIs were detected per larvae (WT = 45,015 ± 4177 across 7 animals; fmr1+/− = 49,768 ± 2141 across 17 animals; fmr1−/− = 40,123 ± 3137 across 5 (mean ± s.e.m.)) (Additional file 10).
Calcium traces of all animals and all planes were subsequently pooled per genotype and z-transformed for further analysis using MATLAB v9.5 (MathWorks, Natick, MA). For baseline activity, we computed the correlation between all pairs of ROIs and used their spatial localization to calculate Euclidean distances. We estimated calcium event rates by detecting the peaks in each trace that were separated by at least 2 s and that increased locally by at least 1 SD above the baseline. For evoked activity, we modeled calcium traces using a multivariate linear regression against regressors (using a typical GCaMP6 response) that corresponded to the timing of the relevant stimulus types . For multisensory experiments, we built three regressors for each of the three presentations of visual flow, visual loom, and auditory stimuli. Likewise, for the auditory sensitivity experiments, we built twelve regressors for each of the discrete sound amplitudes.
Thresholding and calculating responsiveness
For the multisensory dataset, we defined an ROI as responsive to a particular stimulus if it had a regression coefficient 2 SD above the mean of all regression coefficients and had an r2 value greater than 0.1 (26th percentile) in the WT group. WT thresholds were applied to fmr1+/− and fmr1−/− groups. Specifically, the regression coefficient thresholds 2 SD above the mean for visual flow, visual loom, and auditory stimuli were 2.0462, 2.0920, and 2.2894, respectively.
For the auditory sensitivity dataset, we defined an ROI as responsive to a particular stimulus if it had a regression coefficient greater than 0 and had an r2 value greater than 0.05 (80th percentile). Lower thresholds were used for the auditory sensitivity dataset compared to the multisensory dataset because we sought to detect all responses to all auditory stimuli in the sensitivity dataset (including low amplitudes) and to accommodate for a longer stimulus train. Given the complexity of the auditory stimulus train, and the low stringency of the thresholds, non-auditory signals were pooled with the signal. To improve the signal-to-noise ratio without increasing the stringency of the thresholds for the auditory sensitivity dataset (considering that we wanted all auditory categories including weak responses), we excluded time points in individual animals that startled at a particular amplitude, as these generally represent motor responses rather than auditory responses. In the WT group, we removed 50 frames flanking the first − 6 dB amplitude in larva 1, the first − 15 dB amplitude in larva 2, the second 0 dB amplitude in larva 3, and the third − 27 dB amplitude in larva 5. In the fmr1−/− group, we removed 50 frames flanking the first − 6-dB amplitude in larva 4 and the second − 33-dB amplitude in larva 7 (removing a total of 6 trials among 468 in the experiment).
We subsequently applied k-means clustering on the time series to produce five components per genotype for each of the ten brain regions of interest. Five was empirically chosen because it produced clusters that were well represented across the fish population and genotypes. All non-auditory clusters were excluded, and then, the remaining auditory clusters of all animals and all planes were pooled together for each region per genotype, unless a striking difference was observed in the average response traces or spatial localizations. Such striking differences were observed three times, in the tegmentum of both genotypes and in the torus semicircularis of fmr1−/− larvae. As k-means forces all ROIs to belong to a cluster, we removed ROIs with a low correlation to the mean of each cluster (correlation > 0.2) to remove additional noise. The resulting ROIs were classified as auditory responsive in the auditory sensitivity dataset.
To calculate the proportion of ROIs responsive to a particular stimulus, ROI numbers above the regression coefficient and r2 thresholds were normalized to the total number of ROIs detected in the whole brain or relevant brain region. To detect changes in the distribution of regression coefficients, the mean regression coefficients were calculated for ROIs with an r2 above threshold (0.1 or 0.05 depending on the dataset). Similarly, to detect changes in the distribution of r2 values, the mean r2 was calculated for ROIs per larvae with a regression coefficient above threshold (+ 2 SD of the WT or 0 depending on the dataset).
Registration and visualization of calcium activity
Once the motion corrected individual x-y planes were complete, we used the three dimensional stacks of all animals included in a particular dataset to build a common template with Advanced Normalization Tools . We then registered the template to the elavl3-H2BRFP zebrafish line on the Z-brain atlas . The resulting warps were applied to the centroids of all ROIs for each larva, which were then placed into the 294 brain regions defined by the Z-brain atlas as previously described . We visualized the spatial information and classified activity of each ROI using the Unity Editor (Unity Technologies, San Francisco, CA). Specifically, for the multisensory dataset, we represented ROIs as spheres with their diameter representing their r2 value (1 + r2 × 5 in μm), and color based on their regression coefficient value. For the auditory sensitivity dataset, spot size and color were uniform. The template brain, upon which we overlaid this information, was generated by creating an isosurface mesh over the combined masks of the diencephalon, mesencephalon, rhombencephalon, eyes, and telencephalon from the Z-brain Atlas using ImageVis3D (Scientific Computing and Imaging Institute, Salt Lake City, UT).
Motion detection during calcium imaging
To identify motion cues during brain-wide calcium imaging, Y-axis displacement (pixels) was extracted over time (seconds) from maximum pixel projected hyperstacks of the raw calcium imaging time series. The cv_align stacks plugin from ImageJ was used. The Y-displacement values were subsequently filtered using a linear moving average filter (window size of 5) to remove non-linearity caused by any larvae not returning to their original positions following movement. Motion was estimated by calculating the area below the moving-average filtered displacement data using trapezoidal numerical integration.
We defined the number of nodes in our graph by performing k-means clustering on the 3-dimensonal coordinates of all ROIs within each of the ten brain regions per genotype into k number of clusters. k was chosen as the largest number (starting at 20 + number of ROIs/1000) at which at least 10 ROIs from at least 3 different larvae were retained. This was repeated for each brain region and produced a total of 132 nodes in the WT and 134 in the fmr1−/− cohort. To generate nodes that were matched across genotypes, k was defined as aforementioned, but required at least ten ROIs in at least three different fish from each genotype. We repeated this for each of our ten brain regions to produce 49 nodes in the WT and fmr1−/− cohort. This approach ensured all ROIs that contributed to each node (for both the unmatched and matched-node graphs) exclusively belonged to one of the ten broadly defined brain regions. We generated the mean z-scored fluorescent response from all the ROIs belonging to these nodes across all animals. To determine whether the network effects observed were greater than those expected by chance, we used an Amplitude Adjusted Fourier Transform  to generate a temporally shuffled time series which conserves the statistical properties of the original. To avoid shuffling the responses out of the time windows we used to study the different stimuli, we performed shuffling within each time window corresponding to each stimulus for all datasets. The mean responses, of both the shuffled and unshuffled datasets, were used to build correlation matrices for each genotype and binarized with a threshold of 0.85 correlation, from which we built undirected graphs. The Brain Connectivity Toolbox  was used to calculate network measurements of the graph from each genotype, such as the density or the participation coefficient between brain regions, as previously described . It was also used to measure topological Rentian scaling using the rentian_scaling_3d function with 5000 boxes on the same binarized correlation matrices as the graphs, with the 3d z-scores coordinates of each nodes as inputs. The robust linear regression was computed in Matlab with the defaults settings. Circle plots were produced using the circularGraph toolbox.
Neural Decoder analysis
We performed decoder analysis using the Neural Decoding toolbox (https://github.com/KordingLab/Neural_Decoding)  on the auditory sensitivity dataset. The discrete amplitudes of sound were the experimental features used to decode patterns of activity using the Xtreme Gradient Boosting (XGB) model. For each brain region per genotype, auditory-responsive neurons were randomly divided into 10 subsets. For each of the 10 subsets, the first trial (with the ascending auditory stimulus block) was used to train the XGB model. The XGB model was tested on the second trial (with the quasi-random auditory stimulus block) and then validated on the third presentation (with the descending auditory stimulus block). The r2 was calculated between the predicted features and the actual sound amplitudes.
Significance between two genotypes was tested using unpaired two-tailed t tests with the Holm-Sidak (or Sidak for Rentian scaling) method for multiple comparisons. Genotypes showed different variations; thus, P values were computed individually (did not assume consistent SD). Significance between three genotypes were tested using an unpaired Kruskal-Wallis test with the Dunn’s multiple comparisons method. Given that data were based on counts (which is positively skewed because they are truncated at zero), the Kruskal-Wallis test was selected because it does not assume that data is sampled from Gaussian distributions. All statistical analyses were performed in Prism v8.0.1 (GraphPad, San Diego, CA).