Animals
Our subjects were nine female Wistar rats, which were 8 weeks old and weighed 216–242 g at the beginning of the study. Of these, four were first used for behavioral training and psychoacoustic determination of ITD TWFs (see “Behavioral study” section). All nine (4 trained and 5 naive animals) were then used in terminal electrophysiological experiments to elucidate the cortical encoding of the binaural stimuli (see “Electrophysiology” section). The rats were housed in standard cages with 2 or 3 rats in each.
Preyer’s reflexes were tested, and the outer ears and tympanic membranes were visually examined to ensure that the rats had healthy, sensitive hearing. In addition, prior to the behavioral and electrophysiological experiments, auditory brainstem responses (ABRs) were recorded to confirm normal hearing sensitivity (data not shown). For this examination, the rats were anesthetized by intraperitoneal injection of ketamine (80 mg/kg, 10%, Alfasan International B.V., Holland) and xylazine (12 mg/kg, 2%, Alfasan International B.V., Holland). Eye gel (Lubrithal, Dechra Veterinary Products A/S, Mekuvej 9, DK-7171 Uldum, Denmark) was applied to prevent the eyes from drying. The outer ear canals and tympanic membranes were inspected under a microscope (RWD Life Sciences, China). The rats were then fixed in a stereotactic instrument with a pair of hollow ear bars in a sound attenuating chamber, and ABRs to pulses were recorded to ascertain low hearing thresholds in both ears. ABR thresholds below 30 dB SPL were considered indicative of normal auditory sensitivity. All rats used here had ABR thresholds < 30 dB SPL. A more detailed description of the ABR recordings can be found in [15].
Behavioral study
Behavioral training setup
The behavioral setup and the training methods were identical to those described in sections “Channel-wise regression shows only sporadic precedence effects in electrocorticographic (ECoG) signals from the auditory cortex” and “Multivariate decoding shows strong precedence effect in ECoG signals from the auditory cortex” of [14]. In brief, a training box was situated in a sound attenuating box and the front wall of the training box was fitted with three brass water spouts. Two hollow tubes were connected to a pair of mini headphone drivers (GQ-30783-000, Knowles, Itasca, Illinois, USA) to deliver the sound stimuli into the behavioral training box as close to the rats’ ears as possible. The stimuli were generated by a USB sound card (StarTech.com, Ontario Canada, part No. ICUSBAUDIOMH) and amplified by an audio amplifier (Adafruit stereo 3.7 W class D audio amplifier, part No. 987). Stimulus delivery and monitoring and control of the behavioral task were performed by a Raspberry Pi computer running custom written Python software.
Behavioral training task
During behavioral training and testing, the four animals were tested five days a week, with two rest days. Our training used drinking water as a positive reinforcer. Therefore, a day prior to the first testing day, the home cage water bottles were removed, and for the following 5 days, the rats only had access to drinking water during their twice daily testing sessions. They then had easy access to ad lib water from the evening of the fifth training day until the morning of the second rest day. Food was available ad lib in the animals’ home cages throughout.
Behavioral training and testing were essentially identical to [14], except that a different set of stimuli was designed and used to enable the quantification of TWFs. In the behavioral experiments, rats performed a 2-AFC near-field lateralization task. Rats initiated each trial by licking a centrally positioned “start spout.” Initiating a trial was rewarded with a small drop of water on a random subset of trials (1 in every 7). Initiating a trial triggered the delivery of a binaural stimulus, to which the animals responded by licking one of two “response spouts” positioned on either side of the start spout. If the animal’s choice corresponded to the side indicated by the binaural cues, it was rewarded by three small drops of water delivered through the response spout. If the response was incorrect, it triggered a 15-s “timeout” during which a 90-dB negative feedback sound was played and no new trials could be initiated. If the rat made a wrong response, the following trial would be a “correction trial,” in which the last stimulus was repeated. “Correction trials” help reduce the tendency of animals to develop response biases toward one side, but are excluded from the calculation of the correct response scores. Each rat performed two sessions per testing day, one in the morning and one in the afternoon, each session lasting ~ 20 min. The animals would typically perform between 100 and 200 trials per session.
The rats were initially trained with 200-ms-long, 300 Hz binaural pulse train stimuli which contained both ILD (± 6 dB) and ITD cues (± 0.136 ms). The motivation for the initial combined ITD and ILD training was to start the animals off with stimuli that should be very easy to lateralize, as they contain “natural” combinations of both ITD and ILD binaural cues, allowing the rats to adapt quickly to the training environment. We required that the rats lateralized these initial training stimuli with at least 80% accuracy in at least 2 sessions to advance to the next “ITD-only” training stage, during which ILDs were set to 0 dB. The rats reached the 80% correct criterion in this first stage of training after 8–10 days of training. After the initial training phase, all stimuli presented throughout the rest of the study had 0 dB ILDs and varied in ITD only. Once the rats reached 80% correct on two sessions with the 0.136-ms ITD-only stimuli, we increased the range of ITD values tested in each session. During this stage, the ITDs presented at each trial were drawn at random from the set ±[0.1587, 0.136, 0.0907, 0.068, 0.0454, 0.0227] ms. This set purposefully includes some ITD values that are below previously determined perceptual ITD thresholds for rats (~ 0.05 ms, see [14]), in order to accustom the animals to the possibility that sessions may include trials that are difficult to lateralize. In this potentially more difficult “wide ITD range” training stage, the rats had to reach 75% correct in at least two sessions before advancing to the TWF testing stage. Animals which did not quickly advance to that final stage were given additional training sessions with easier stimuli, during which timeouts and reward quantities were individually adjusted as necessary to achieve reliably high levels of performance.
Acoustic stimuli for behavioral TWF measurement
The stimuli used in the TWF measurement phases of our experiments were modeled on the stimuli developed by [6]. Our stimuli consisted of trains of 8 binaural pulses (Fig. 1) delivered at rates of either 20, 50, 300, or 900 Hz. Importantly, the ITD of each pulse in the train was varied by introducing a small, random “temporal jitter” in the timing of each pulse which was independent in each ear. An analysis of lateralization judgments for a large number of stimuli with different ITD values at each pulse in the series would then make it possible to determine the “weight” of the nth pulse in the train, that is, the contribution it makes to the perceived lateral position of the pulse train as a whole.
A potential difficulty with the use of these stimuli is that one cannot always determine a priori whether a subject’s lateralization response is “objectively correct.” This is because the ITD cue values of different pulses in the train can, by design, point in opposite directions, and how such ambiguous stimuli with conflicting ITD values are “supposed to be perceived” depends on the subject’s own TWF, which is unknown at the start of the experiment. Brown and Stecker [7] simply trusted their human participants to understand the objective of the experiment and to report their lateralization judgments faithfully, presumably with low error or bias, and without the need for trial-by-trial reinforcement. Our rats, in contrast, need to be kept motivated and honest throughout the experiment by regular rewards for “correct” responses. Therefore, we constituted each block of trials as a randomly interleaved set of “honesty trials” and “probe trials.” In honesty trials, the ITDs of all pulses in the 8-pulse sequence pointed in the same direction, that is, they were either all positive (right ear leading) or all negative (left ear leading), so that the response to these honesty trials could be judged objectively as correct or incorrect irrespective of the details of each animal’s TWF. Pulses in honesty trials had a fixed ITD offset of ± 0.083 ms, plus an additional jitter drawn uniformly at random from a range of ± 0.042 ms, in steps of 10.4 μs afforded by the 96 kHz sample rate of the Hi-Fi USB Audio sound card. Since most ITD values in an honesty trial should be above typical rat ITD thresholds reported in [14], we expected them to be relatively easy to lateralize correctly, and responses to honesty trials were only rewarded if the animal responded on the appropriate side.
We required that the rats lateralized at least 80% of honesty trials correctly in at least two sessions before they would also receive “probe trials,” in which the ITD for each pulse was drawn independently and uniformly from the range of ± 0.125 ms and ITDs of subsequent pulses in the train were allowed to point in opposite directions. Responses to probe trials were always rewarded regardless of the side on which the animal responded. In each TWF testing session, honesty trials and probe trials were randomly interleaved at a ratio of 2:1. The large proportion of honesty trials ensured that random guessing without attending to the sounds was not an effective strategy for the animals and allowed us to monitor that the animals continued to report their lateralization percepts with good accuracy throughout. During informal testing, the authors were unable to distinguish honesty trials from probe trials just by listening to them, and there is no indication that the rats could distinguish these either. We therefore consider it safe to assume that the rats’ responses to the probe trials accurately reflected their lateralization judgments for these stimuli. After reaching the “ITD-only” lateralization training criteria described above, all four rats were also able to meet the 80% correct TWF honesty trial criterion after minimal training, as might be expected given that to casual human observers, TWF stimuli with jittered ITDs and stimuli with fixed ITD sound indistinguishable.
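The construction of honesty and probe trials described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' stimulus code: the ITD of each pulse is drawn directly in whole samples at 96 kHz (the text describes the jitter as applied independently in each ear), probe ITDs are allowed to be exactly zero, and the function names `honesty_itds`, `probe_itds`, and `make_block` are hypothetical.

```python
import numpy as np

FS = 96_000          # sound card sample rate in Hz (from the text)
N_PULSES = 8         # pulses per train (from the text)
STEP_MS = 1000 / FS  # one sample, about 0.0104 ms

def honesty_itds(rng, side):
    """ITDs (ms) for one honesty trial; side is +1 (right leads) or -1 (left leads)."""
    offset = 8                                   # 8 samples, about 0.083 ms
    jitter = rng.integers(-4, 5, size=N_PULSES)  # -4..+4 samples, about +/-0.042 ms
    return side * (offset + jitter) * STEP_MS    # all pulses share the same sign

def probe_itds(rng):
    """ITDs (ms) for one probe trial, drawn independently for each pulse."""
    # +/-12 samples corresponds to +/-0.125 ms; signs may conflict across pulses.
    return rng.integers(-12, 13, size=N_PULSES) * STEP_MS

def make_block(rng, n_trials):
    """Randomly interleave honesty and probe trials at an expected ratio of 2:1."""
    trials = []
    for _ in range(n_trials):
        if rng.random() < 2 / 3:
            side = rng.choice([-1, 1])
            trials.append(("honesty", honesty_itds(rng, side)))
        else:
            trials.append(("probe", probe_itds(rng)))
    return trials

rng = np.random.default_rng(0)
block = make_block(rng, 300)
```

With these settings every honesty-trial ITD has a magnitude between 4 and 12 samples (about 0.042 to 0.125 ms), so the sign of the cue is never ambiguous, whereas probe trials may contain pulses pointing in opposite directions.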
Behavioral data analysis
TWFs were computed from the responses to the probe trials only, and separately for each of the four pulse rates (20, 50, 300, and 900 Hz), by fitting a Probit regression of the probability of a “right spout” response against the ITD values for each of the 8 pulse pairs in the train, using the open source Python function statsmodels.discrete.discrete_model.Probit [47]. The Probit regression model takes the form:
$$ y=\Phi \left({x}^T\;\beta \right) \tag{1} $$
Here, y is the probability that the animal will respond on the right, x is the vector of the eight ITD values of the pulses in the train plus an added 1 for the intercept, T is the transpose operator, and β is the vector of coefficients, or “weights” attributed to each of the pulses, which are estimated by maximum likelihood. Φ is the cumulative distribution function of the standard normal distribution. The fitted model thus assumes additive effects of the weighted ITD of each pulse in the train, and the set of coefficients β represents the animal’s TWF. Here we use the terms Probit coefficient and temporal weight interchangeably.
Electrophysiology
ECoG recording apparatus
Acoustic stimuli were generated by an RZ6 multi-I/O processor (Tucker-Davis Technologies, USA) and presented via a pair of custom-made speakers (AS02204MR-N50-R, PUI Audio, Inc.) fitted to the openings of the hollow stainless steel ear bars, which fixed the rat into a stereotactic instrument (RWD Life Sciences, China). The speakers were calibrated with a GRAS 46DP-1 microphone (GRAS Sound & Vibration A/S), and their transfer functions were compensated with an inverse filter to be flat to within ~ ± 3 dB over the range of 600 Hz to 20 kHz.
Neural activity was recorded using a 61-channel ECoG array [48]. The flexible (~ 30 μm thin) ECoG array comprised 203-μm diameter circular electrodes arranged on an 8 × 8 square grid, with three of the four corner positions unoccupied, and a 406-μm spacing between neighboring electrodes. The array covered an area of 10.6 mm².
The neural signal was captured through two Intan C3314 32 channel headstage amplifiers (Intan Technologies, USA) connected to a PZ5 neurodigitizer (Tucker-Davis Technologies, USA) and processed with an RZ2 bioamp processor (Tucker-Davis Technologies, USA). Python programs written by the authors were used to generate stimuli and save the recorded signals.
ECoG recording procedure
ECoG was recorded from the auditory cortex (AC). First, rats were anesthetized as described for the ABR recordings in the “Animals” section, and their scalp was shaved. Prior to ECoG recording, ABRs were tested again to make sure the ear bars were still in good position, followed by intraperitoneal injection of urethane (20%, 1 mL). If a toe pinch reaction was observed during the ECoG recording, an additional 1 mL of urethane was injected. The total amount of injected urethane was less than 7.5 mL/kg. Additionally, butorphanol (10 mg/mL, 0.2 mL/kg every 1–2 h, Richter Pharma AG, 4600 Wels, Austria) was subcutaneously injected for analgesia. A deep cut in the midline of the scalp was made, and the surgical field was exposed. The local anesthetic lignocaine (0.3 mL, 20 mg/mL, Troy Laboratories Pty Ltd, Australia) was applied on top of the surgical area. A craniotomy was performed over the right, or, in most cases, both temporal cortices. From a point 2.5 mm posterior to Bregma, a line was drawn perpendicular to the sagittal suture to the temporal ridge, and the intersection of this line and the ridge was marked. The craniotomy area extended 5.0 mm posterior and 4.0 mm ventral from this intersection point, to allow the placement of an ECoG electrode array on the auditory cortex (primary auditory cortex, secondary auditory cortex dorsal area, secondary auditory cortex ventral area). A hole was drilled through the skull anterior to Bregma to fix a screw which served as a reference electrode to connect to the ground wire of the recording headstage amplifier.
After placing the ECoG electrode array on the AC, acoustic stimuli were presented to the rat. ECoG neural signals were recorded at a sample rate of 6 kHz. At the end of the recording experiments, the rats were euthanized with an overdose of pentobarbital (1–2 mL, 20%, Alfasan International B.V., Holland).
ECoG data analysis
We attempted two approaches to analyze the electrophysiological responses, one “univariate” approach which used a regression model to try to account for neural response amplitudes observed at each individual recording site, and one “multivariate” approach which attempted to use recently developed population decoding analyses to reconstruct stimulus ITDs from single-trial population responses observed across the ECoG array. The multivariate approach turned out to be much more successful, which has interesting implications for the nature of the representation of perceived ITDs as we shall see below.
Univariate analysis: channel-wise regression
Our analysis of the responses recorded with these stimuli was based on the assumption that most ITD sensitive neurons in the central auditory pathway would be tuned so as to have a “preference” for ITDs pointing to the contralateral side, while a minority might have an ipsilateral preference, but very few should have tuning curves that are symmetric at either end of the ecological range of ITDs [49,50,51]. We further assumed that neural response amplitudes of contralaterally tuned units should consistently increase when contralateral leading ITDs are presented, irrespective of whether these contralateral ITDs occur at the first, second, or nth click. Similarly, for ipsilaterally tuned units, response amplitudes should consistently decrease when contralateral ITDs are presented. Under these simplifying assumptions, we can attempt to fit TWFs to the neural data using a simple multiple linear regression, which regresses response amplitude against the signs of the four ITDs in each stimulus.
This univariate analysis of ECoG voltage data was further based on standard methods for quantifying evoked response amplitudes from LFPs as follows. First, per channel, the signal was band-pass filtered from 30 to 300 Hz using a 4th order Butterworth filter (scipy.signal.butter(), scipy.signal.filtfilt()). This chosen frequency region covers the gamma to very-high-gamma frequency ranges which have previously been shown to correlate particularly highly with multi-unit responses of auditory cortical neurons [52]. The band-passed signal was downsampled by a factor of 4 to a sample rate of 1500 Hz (scipy.signal.decimate()) and the decimated multichannel data were denoised using the “denoising by spatial filtering” methods developed by [53]. The cleaned data were “re-referenced” by subtracting the median across all channels. Re-referencing to the median has been shown to make the re-referenced signal less susceptible to outliers [54]. Neural responses were then quantified by epoching the cleaned, re-referenced signals into data segments ranging from 1 to 30 ms post stimulus onset. This epoch was chosen by visual inspection to cover the onset response peak, and it is consistent with reports that the initial responses to acoustic stimuli in the rat auditory cortex peak approximately 20 ms after stimulus onset [55]. The epoched data were baseline-corrected by subtracting their mean [56], and the RMS amplitude was calculated for each response epoch. Outlier epochs with RMS amplitudes greater than three standard deviations above the median RMS amplitude were excluded from further analysis [57].
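The preprocessing chain described above can be sketched as follows, using the scipy functions named in the text. Several details are assumptions made for illustration: `butter()` is called with order argument 2, which for a band-pass design yields an overall filter order of 4 (the text does not state which argument was used); the spatial-filter denoising step of [53] is omitted; outliers are rejected per channel and a trial is dropped if any channel exceeds the limit; and the input is random noise rather than recorded data. The function names are hypothetical.

```python
import numpy as np
from scipy import signal

FS_RAW = 6000          # recording sample rate in Hz (from the text)
DECIM = 4              # decimation factor, giving 1500 Hz
FS = FS_RAW // DECIM

def preprocess(raw):
    """Band-pass, decimate, and median-reference data of shape (n_channels, n_samples)."""
    # Assumption: N=2 gives a band-pass of overall order 4 in scipy's convention.
    b, a = signal.butter(2, [30, 300], btype="bandpass", fs=FS_RAW)
    filtered = signal.filtfilt(b, a, raw, axis=1)         # zero-phase filtering
    decimated = signal.decimate(filtered, DECIM, axis=1)  # 6000 Hz -> 1500 Hz
    # The denoising step of [53] would be applied here; it is omitted in this sketch.
    return decimated - np.median(decimated, axis=0, keepdims=True)

def epoch_rms(data, onsets_s, fs, t0=0.001, t1=0.030):
    """RMS amplitude per (trial, channel) in the t0..t1 window after each onset."""
    i0, i1 = int(round(t0 * fs)), int(round(t1 * fs))
    rms = np.empty((len(onsets_s), data.shape[0]))
    for k, onset in enumerate(onsets_s):
        start = int(round(onset * fs))
        seg = data[:, start + i0:start + i1]
        seg = seg - seg.mean(axis=1, keepdims=True)  # baseline-correct by the epoch mean
        rms[k] = np.sqrt((seg ** 2).mean(axis=1))
    return rms

def exclude_outliers(rms):
    """Mask of trials whose RMS stays within median + 3 SD on every channel."""
    limit = np.median(rms, axis=0) + 3 * rms.std(axis=0)
    return np.all(rms <= limit, axis=1)

# Synthetic demonstration: 4 channels, 5 s of noise, one stimulus onset every 0.5 s.
rng = np.random.default_rng(0)
raw = rng.normal(size=(4, FS_RAW * 5))
clean = preprocess(raw)
onsets = np.arange(0.5, 4.5, 0.5)
rms = epoch_rms(clean, onsets, FS)
keep = exclude_outliers(rms)
```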
The distribution of RMS response amplitudes obtained in this manner was highly positively skewed. We log-transformed the RMS values to make them more suitable for linear regression analysis to obtain TWF values. Furthermore, we wanted to compute temporal weighting coefficients in normalized units which were insensitive to site-to-site or animal-to-animal variability in the range of observed voltage values that can result from variable electrode impedances or electrode placements. We therefore z-scored the log(RMS) values prior to regression analysis.
The transformed data were then subjected to an ordinary least squares (OLS) regression (statsmodels.api.OLS, [47]) with constant added (Eq. 2). The form of the regression model is
$$ y={x}^T\;\beta +\varepsilon \tag{2} $$
where y is the z-scored, log-transformed LFP amplitude observed in each trial, x is a vector of the regressors (the ITDs of the 4 clicks in ms, and the added constant to provide the intercept), β is the vector of regression coefficients (the neural TWF weights in units of standard deviations of log RMS LFP amplitudes per ms of ITD), and ε is an error term which, as usual for normal linear regression, is assumed to follow a Gaussian distribution. In addition to computing the regression weights β, the software returned p values for the null hypothesis that the corresponding β is equal to zero.
Multivariate analysis: population-based decoding
In addition to the mass-univariate (i.e., channel-by-channel) analyses described above, data were also subject to a multivariate analysis based on the response of the population of recorded neurons (i.e., pooling information from multiple channels). Rather than computing the “weight” of a given ITD in the train as a scaling factor that maps ITD values onto changes in response amplitude, the rationale of this analysis was to try to decode the ITD value of each click in the train on a trial-by-trial basis. This decoding analyzed the pattern of neural activity measured by multiple ECoG channels, and quantified the “weight” of each ITD in the stimulus train by how well the ITD value can be decoded from single-trial neural population responses. The decoding methods used were originally developed by [58, 59] as a means to analyze human EEG response data, and were recently adapted for the analysis of rat auditory cortex ECoG data [60]. To construct the decoder, we first selected channels that showed a robust evoked response to the click train. The criterion that we used for channel selection was based on the signal-to-noise ratio (SNR), defined for each channel as the ratio between the RMS of the signal in the first 30 ms after click train onset and the RMS of the signal in the last 30 ms prior to click train onset. Only channels with SNR > 3 dB were taken into the analysis. On average, 73.36% of channels (SEM 8.22% across penetrations) met this criterion.
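A minimal sketch of the SNR-based channel selection is given below. Two points are assumptions rather than statements from the text: the RMS ratio is converted to decibels as 20·log10 (an amplitude ratio), and the RMS is pooled over all trials of a channel. The function name `select_channels` and the synthetic data are hypothetical.

```python
import numpy as np

def select_channels(epochs, fs, onset_idx, win_s=0.030, thresh_db=3.0):
    """Return (keep_mask, snr_db) for epochs of shape (n_trials, n_channels, n_samples).

    onset_idx is the sample index of click train onset within each epoch.
    """
    n = int(round(win_s * fs))
    post = epochs[:, :, onset_idx:onset_idx + n]  # first 30 ms after onset
    pre = epochs[:, :, onset_idx - n:onset_idx]   # last 30 ms before onset
    rms_post = np.sqrt((post ** 2).mean(axis=(0, 2)))
    rms_pre = np.sqrt((pre ** 2).mean(axis=(0, 2)))
    snr_db = 20 * np.log10(rms_post / rms_pre)    # assumption: amplitude-ratio dB
    return snr_db > thresh_db, snr_db

# Synthetic demonstration: channel 0 carries an evoked response, channels 1 and 2 do not.
rng = np.random.default_rng(3)
fs, onset_idx = 1500, 100
epochs = rng.normal(size=(50, 3, 200))
epochs[:, 0, onset_idx:onset_idx + 45] *= 3.0     # triple the post-onset amplitude
keep, snr_db = select_channels(epochs, fs, onset_idx)
```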
Following channel selection, for each ECoG array position, data from multiple channels were used to decode the ITD of each click in the train, one at a time, based on the RMS in the 0–30-ms time window following click train onset. To this end, we split the data into three sets: (1) the test trial itself, (2) the remaining trials with the same ITD value as the test trial, and (3) the remaining trials with a different ITD than the test trial. From these three sets, we obtained three vectors of average response RMS amplitude values concatenated across channels. We then calculated the multivariate Mahalanobis distance values between (1) the test trial vector and the average vector of trials with the same ITD, as well as (2) the test trial vector and the average vector of trials with a different ITD. The Mahalanobis distance values were scaled by the noise covariance matrix of all channels, i.e., the covariance based on single-trial residual RMS after removing the mean RMS from each trial [61]. The scaled Mahalanobis distance values, obtained for a given trial k relative to other “same” or “different” trials, were used to calculate the overall decoding distance metric according to the following equation:
$$ \mathrm{decoding}(k)=\frac{\mathrm{distance}\left(k,\mathrm{different}\right)-\mathrm{distance}\left(k,\mathrm{same}\right)}{\mathrm{distance}\left(k,\mathrm{different}\right)+\mathrm{distance}\left(k,\mathrm{same}\right)} \tag{3} $$
This procedure was carried out for each trial in turn in a leave-one-out cross-validation approach, and the resulting decoding values were averaged across trials to obtain ITD decoding estimates for each of the four clicks in the train. The decoding estimates were tested for statistical significance using a signed rank test. These signed rank tests were done for all electrode placements pooled together, as well as separately for 300 and 900 Hz click rates. The tests were corrected for multiple comparisons using Bonferroni correction. Furthermore, to plot representative examples of individual electrode placements, we calculated 99% confidence intervals of the observed individual decoding estimates by repeating the analysis over 1000 iterations, whereby in each iteration the ITD labels were randomly reshuffled, to obtain a surrogate distribution of decoding estimates.
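The leave-one-out decoding of Eq. 3 and the label-shuffling procedure can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the noise covariance is estimated here from training-trial residuals after subtracting each group's mean, labels are binary, the simulated features are random numbers with an added label-dependent offset, and only 200 shuffles are run to keep the example fast (the text uses 1000). The function names are hypothetical.

```python
import numpy as np

def mahalanobis(u, v, cov_inv):
    """Mahalanobis distance between vectors u and v given an inverse covariance."""
    d = u - v
    return float(np.sqrt(max(d @ cov_inv @ d, 0.0)))

def decode_loo(features, labels):
    """Leave-one-out decoding values (Eq. 3) for features of shape (n_trials, n_channels)."""
    n_trials = len(labels)
    idx = np.arange(n_trials)
    out = np.empty(n_trials)
    for k in range(n_trials):
        train = idx != k
        same = train & (labels == labels[k])
        diff = train & (labels != labels[k])
        mu_same = features[same].mean(axis=0)
        mu_diff = features[diff].mean(axis=0)
        # Assumption: noise covariance from residuals around each group's mean.
        resid = np.vstack([features[same] - mu_same, features[diff] - mu_diff])
        cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(resid, rowvar=False)))
        d_same = mahalanobis(features[k], mu_same, cov_inv)
        d_diff = mahalanobis(features[k], mu_diff, cov_inv)
        out[k] = (d_diff - d_same) / (d_diff + d_same)
    return out

def shuffle_interval(features, labels, rng, n_perm=200):
    """99% interval of mean decoding values obtained with randomly reshuffled labels."""
    null = np.array([decode_loo(features, rng.permutation(labels)).mean()
                     for _ in range(n_perm)])
    return np.percentile(null, [0.5, 99.5])

# Synthetic demonstration: 80 trials, 5 channels, label-dependent response offset.
rng = np.random.default_rng(4)
labels = np.repeat([-1, 1], 40)
features = rng.normal(size=(80, 5)) + labels[:, None] * 1.0
dec = decode_loo(features, labels)
lo, hi = shuffle_interval(features, labels, rng)
```

A mean decoding value above the upper bound `hi` indicates that the label can be recovered from the population response better than expected by chance; values range from -1 to 1 by construction.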
We also explored whether neural activity later than the first 30 ms following click train onset can be used to decode click ITDs. To this end, we repeated the decoding analysis in a sliding time window approach, using a window length of 30 ms (with a time step of 5 ms). Specifically, for each time window, we extracted the RMS envelope (downsampled to 200 Hz to yield 7 RMS values per time window), de-meaned it by removing the average across the time window (separately for each channel), and concatenated the de-meaned values across channels [19]. The resulting vectors of RMS fluctuations in multiple channels were used to calculate the Mahalanobis distance metrics and the corresponding decoding estimates, as described above. Decoding time series were tested for statistical significance for each click pair and time point using a signed rank test, correcting for multiple comparisons using a false discovery rate of 0.01 [62]. Again, these statistical tests were applied for all electrode placements pooled together, as well as for 300 and 900 Hz click rates separately.
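The feature construction for the sliding-window analysis can be sketched as below. The sketch assumes that an RMS envelope has already been computed and downsampled to 200 Hz, so that a 30-ms window spans 7 samples and a 5-ms step equals one sample; how the envelope itself was obtained is not specified in the text. The function name `sliding_window_features` is hypothetical.

```python
import numpy as np

def sliding_window_features(env, win_len=7, step=1):
    """Build decoding features from env of shape (n_trials, n_channels, n_times).

    env is assumed to be an RMS envelope sampled at 200 Hz. Returns an array of
    shape (n_windows, n_trials, n_channels * win_len).
    """
    n_trials, n_channels, n_times = env.shape
    feats = []
    for start in range(0, n_times - win_len + 1, step):
        w = env[:, :, start:start + win_len]
        w = w - w.mean(axis=2, keepdims=True)  # de-mean within window, per channel
        feats.append(w.reshape(n_trials, n_channels * win_len))  # concatenate channels
    return np.stack(feats)

# Synthetic demonstration: 10 trials, 3 channels, 41 envelope samples (200 ms at 200 Hz).
rng = np.random.default_rng(5)
env = np.abs(rng.normal(size=(10, 3, 41)))
feats = sliding_window_features(env)
```

Each slice `feats[i]` is a trials-by-features matrix for one time window and could be passed to a Mahalanobis-distance decoder of the kind described above.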