Animals and housing
Heterozygous Nlgn1+/− mice were obtained from Prof. Nils Brose, generated by homologous recombination in embryonic stem cells deleting exon sequences covering the translational start site and 546 bp of 5′ coding sequence of the murine Nlgn1 gene [80], and backcrossed for more than 10 generations onto a C57BL/6 background. Nlgn1−/− mice and littermate-matched WT controls were generated at The Florey by mating heterozygous females and males. Mice were weaned at 3–4 weeks of age and housed in mixed-genotype groups of 2–4 per cage with food and water available ad libitum. Bedding consisted of sawdust chips 2 cm deep, with tissue paper provided as nesting material. At ~ 10 weeks of age, mice were moved from individually ventilated cages to open-top cages in a humidity- and temperature-controlled holding room maintained on a 12:12-h reversed light/dark cycle (lights off at 07:00). Mice were acclimatized to these conditions for a minimum of 1 week prior to handling. Pre-training began at ~ 12 weeks of age. All behavioral testing was conducted during the dark (active) phase of the cycle, with the experimenter blinded to genotype during testing. All procedures were approved by The Florey Institute of Neuroscience and Mental Health Animal Ethics Committee.
Cohorts of mice used for behavioral testing
A total of 6 cohorts of mice were used in the present study (see Additional file 1: Fig. S1 for a schematic of the sequence of tasks for each cohort). Cohort 1 (WT: n = 12 female/n = 15 male; Nlgn1−/−: n = 13 female/n = 13 male) was tested in the pairwise visual discrimination, reversal learning, object-location paired associate learning, and extinction learning tasks. When a single cohort of animals was tested on multiple touchscreen-based tasks, mice were returned to free feeding for ~ 2 weeks and baseline weights were updated prior to commencing food restriction for the next task. Cohort 2 (WT: n = 14 female/n = 14 male; Nlgn1−/−: n = 14 female/n = 17 male) was tested in the fixed ratio task (FR1–40) with strawberry milk rewards and the fixed ratio 20 (FR20) task with water rewards. Cohort 3 (WT: n = 11 female/n = 12 male; Nlgn1−/−: n = 7 female/n = 12 male) and cohort 4 (WT: n = 6 female/n = 4 male; Nlgn1−/−: n = 6 female/n = 4 male) were tested in the progressive ratio task, spontaneous locomotor activity, and accelerating rotarod tests. Cohort 5 (WT: n = 13 female/n = 10 male; Nlgn1−/−: n = 10 female/n = 12 male) was tested in the Porsolt forced swim test following ~ 2 weeks of simple operant training for a different study not included in this paper. Cohort 6 (WT: n = 13 female/n = 16 male; Nlgn1−/−: n = 11 female/n = 16 male) was experimentally naive and tested for spontaneous locomotor activity. For all non-touchscreen-based tasks, mice were not food restricted when tested.
Rodent touchscreen operant tasks
Apparatus
Touchscreen testing was conducted in the Bussey-Saksida mouse touchscreen operant system (Campden Instruments Ltd., UK). Stimulus presentation, task parameters, and data recording were controlled through Whisker Server and ABET II Touch software (Campden Instruments Ltd., UK). The two-hole mask was used for the pairwise visual discrimination and reversal learning tasks, and the three-hole mask was used for the object-location paired associate learning, extinction learning, fixed ratio, and progressive ratio tasks.
Touchscreen pre-training
Pre-training and food restriction were conducted as previously described [36, 37, 42]. Before testing, mice were first food restricted to 85–90% of free-feeding body weight. Mice were then trained through five phases of instrumental conditioning to learn to selectively nose-poke stimuli displayed on the touchscreen in order to obtain a liquid reward (strawberry milk, Devondale, Australia; 20 μl rewards for all touchscreen tests). All animals received one session per day throughout touchscreen testing. Mice were required to reach a set performance criterion for each phase before advancing to the next phase. Briefly, mice were habituated (phase 1, Habituation) to the touchscreen chamber and to consuming liquid rewards from the reward magazine or receptacle over two 30-min sessions (criterion = consumption of 200 μl of liquid reward freely available in the reward receptacle in each session). For phases 2–5, a trial did not advance until the reward was consumed. In phase 2 (Initial Touch), the Pavlovian stage, a single visual stimulus was displayed on the screen for 30 s, after which the disappearance of the stimulus coincided with delivery of a reward (20 μl), presentation of a tone, and illumination of the reward receptacle (criterion = 30 trials within 60 min). A nose-poke response to the stimulus during the 30-s window was rewarded with three times the standard reward volume to encourage responding. In phase 3 (Must Touch), mice had to nose-poke visual stimuli displayed on the screen to obtain a reward (criterion = 30 trials within 60 min). Mice then learned to initiate a new trial with a head entry into the reward receptacle (phase 4, Must Initiate; criterion = 30 trials within 60 min). In phase 5, responses to a blank part of the screen during stimulus presentation produced a 5-s timeout (signaled by illumination of the house light and no reward delivery) to discourage indiscriminate responding (criterion = 21/30 correct responses within 60 min on 2 consecutive days). After the timeout and a 5-s inter-trial interval (ITI), the same trial was repeated (the same stimulus presented in the same screen location, termed a “correction trial”) until the mouse made a correct response. Therefore, phases 2–5 consisted of 30 trials (pseudorandom first presentations), and phase 5 also included an unlimited number of correction trials.
Pairwise visual discrimination and reversal learning
The pairwise visual discrimination (PD) and reversal learning (RL) tasks were conducted as previously described [36, 37, 42]. Briefly, mice were trained to discriminate between two novel, equiluminescent visual stimuli (left and right diagonal stripes) displayed pseudorandomly across two locations with an equal number of appearances at each location. Stimuli were 5 cm × 5 cm in size, separated by 3 cm, and displayed 2 cm from the bottom and ~ 5.5 cm from the sides of the touchscreen. A response to one stimulus (S+, correct response) resulted in reward delivery, followed by a new pseudorandom trial (maximum 30 per session); a response to the other stimulus (S−) resulted in a 5-s timeout with illumination of the house light, followed by a correction trial. The same stimulus configuration was presented on correction trials until a correct response was made and a reward was delivered. Correction trials did not count towards the trial limit or the percentage of correct responses in a session. The designation of S+ and S− was counterbalanced within genotype and sex groups. Mice were trained to an acquisition criterion of ≥ 80% correct responses on two consecutive sessions. Following acquisition of the visual discrimination task, mice immediately moved on to the reversal learning task, in which the previously acquired reward contingencies were reversed. Reversal learning was assessed across 20 sessions.
Object-location paired associate learning
The object-location paired associate learning (PAL) task was conducted as previously described [36, 37]. Briefly, mice were trained to acquire reward associations jointly defined by visual stimuli (flower, plane, and spider) and their assigned correct spatial locations on the touchscreen (left, center, and right, respectively). Stimuli were 5 cm × 5 cm in size, separated by 2 cm, and displayed 2 cm from the bottom and ~ 2.5 cm from the sides of the touchscreen. On each trial, only two objects were presented: one object in its correct location (S+) and the other object in one of its two incorrect locations (S−); there were therefore six possible trial types. A nose-poke to the S+ resulted in delivery of a reward followed by a new pseudorandom trial (maximum 36 per session), and incorrect responses resulted in a 5-s timeout followed by a correction trial. Visuospatial learning in the PAL task was assessed across 40 sessions.
Instrumental extinction learning
The instrumental extinction learning task was conducted similarly to that previously described [37, 42]. Mice were first trained to make a nose-poke response to a single white square displayed on the touchscreen (stimulus was 3 cm × 3 cm in size, displayed 3 cm from the bottom and ~ 10.5 cm from the sides of the touchscreen) for a reward until reaching a set acquisition criterion (30 trials in < 12.5 min on five consecutive sessions). Following acquisition, instrumental extinction was assessed: responses were no longer rewarded (30 trials per session, tested across 6 sessions). During extinction, the visual stimulus was displayed for 10 s on each trial, and animals could either make a response or an omission.
Progressive ratio
Details on testing the touchscreen-based progressive ratio task have been described previously [81]. Briefly, mice had to make nose-poke responses to a single white square displayed on the touchscreen (stimulus was 4 cm × 4 cm in size, displayed 1.5 cm from the bottom and ~ 10 cm from the sides of the touchscreen) for a reward. Naive mice first underwent phases 1 and 2 of touchscreen pre-training, followed by one session each of fixed ratio (FR) schedules FR1, FR2, and FR3 and three sessions of FR5 training, where a fixed number of nose-pokes (1, 2, 3, and 5, respectively) was required for a reward. Mice were required to complete 30 trials in 60 min in each of the FR sessions (criterion). Once the training criterion was reached, mice advanced to the progressive ratio stage, where the number of nose-poke responses required to obtain a reward incremented by 4 after every trial (1, 5, 9, 13, etc.) until the animal reached a breakpoint. If no responses to the touchscreen or entries into the reward receptacle were detected for 5 min, the session ended and the animal was removed from the chamber. Mice were tested on 6 progressive ratio sessions.
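As a worked example of this schedule, the short Python sketch below (the function name is ours, for illustration only) reproduces the stated response requirements:

```python
# Response requirement under the progressive ratio schedule described above:
# it starts at 1 and increments by 4 on each successive trial.
def pr_requirement(trial):
    """Nose-pokes required on a given (1-indexed) trial."""
    return 1 + 4 * (trial - 1)

print([pr_requirement(t) for t in range(1, 6)])  # [1, 5, 9, 13, 17]
```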
Fixed ratios
Touchscreen-based fixed ratio testing was similar to that described for the progressive ratio task (mice had to make nose-poke responses to a single white square displayed on the touchscreen for a reward; stimulus was 4 cm × 4 cm in size, displayed 1.5 cm from the bottom and ~ 10 cm from the sides of the touchscreen). Naive mice first underwent phases 1 and 2 of touchscreen pre-training followed by three sessions of FR1 and had to complete 30 trials within a 60-min session before advancing. During the subsequent serial FR test stage, mice were given 60 min per session to make as many responses as they were willing to make, and sessions did not terminate due to inactivity. Mice were tested on three sessions each of FR1, FR5, FR20, and FR40, sequentially.
Fixed ratio with water rewards
Following the serial FR testing, mice were water restricted, with access to water limited to 1 h per day. Water-restricted body weights were maintained between 85 and 90% of free-feeding body weight. Mice were tested on an FR20 schedule, in which 20 nose-poke responses were required to deliver a water reward (20 μl), for three sessions. After each session, mice were returned to their home cage and given 1 h of free access to water.
Touchscreen latency measures
Across all our touchscreen tests, we assessed 4 latency measures (see Fig. 3a, Additional file 1: Fig. S4B). Initiation latency measures the time from the end of the inter-trial interval to trial initiation, i.e., head entry into the reward receptacle to commence a trial; head entry triggers the presentation of stimuli. Stimulus-approach latency measures the time from exiting the reward receptacle to arriving in front of the touchscreen (breaking the front IR beam). Stimulus-selection latency measures the time from arriving in front of the touchscreen to nose-poking one of the stimuli on the touchscreen. Lastly, reward collection latency measures the time from the reward delivery tone to head entry into the reward receptacle.
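For illustration, a minimal sketch of how these four measures could be derived from per-trial event timestamps; the event names below are hypothetical placeholders, not the ABET II Touch export schema:

```python
# Hypothetical sketch: the four latency measures as differences between
# per-trial event timestamps (in seconds). Event names are illustrative
# placeholders, not actual ABET II Touch output fields.
def trial_latencies(ev: dict) -> dict:
    return {
        "initiation": ev["receptacle_entry"] - ev["iti_end"],
        "stimulus_approach": ev["front_beam_break"] - ev["receptacle_exit"],
        "stimulus_selection": ev["stimulus_touch"] - ev["front_beam_break"],
        "reward_collection": ev["reward_entry"] - ev["reward_tone"],
    }
```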
Non-operant behavioral tests
Spontaneous locomotor activity
Mice were assessed for spontaneous locomotor activity in a novel open-field arena (27.31 cm (L) × 27.31 cm (W) × 20.32 cm (H), Med Associates, St. Albans, VT, USA) using the Activity Monitor system and software (Med Associates, St. Albans, VT, USA). Animals were tested in darkness (to promote exploration) for 60 min to provide an adequate time window to capture the habituation of locomotor activity to a plateau level.
Accelerating rotarod
For motor coordination and learning on the accelerating rotarod, mice were given three 5-min trials per day across 3 consecutive days (9 trials in total). Mice were placed on a rotating rod (Ugo Basile, Gemonio, VA, Italy) facing forward (against the direction of rotation) before acceleration started. The speed of the rotating rod then accelerated from 4 to 40 rpm, and the latency to fall off was manually recorded. Falls before acceleration started were not recorded as failures. Passively rotating by clinging onto the rod was recorded as a fall. Testing was conducted under low-light conditions (20 lx red light).
Porsolt forced swim test
Mice were individually placed into a beaker (13 cm diameter) containing 1.6 L of water (23–25 °C) for a single 5-min session under ambient lighting (20–25 lx white light). Each session was video-recorded, and total mobility time across the 5-min session was measured (no time bins excluded). Scoring was obtained using the automated ForcedSwimScan software (CleverSys Inc., VA, USA) under previously optimized settings [82], eliminating the need for manual observer scoring.
Data analysis
Multi-session touchscreen choice data were analyzed with generalized (logistic) linear mixed models. This choice was motivated by (1) the trial-by-trial binary nature of the data, (2) the need to estimate learning rates per unit of time (session/trial), and (3) the non-linearity of learning curves. Touchscreen latency data across sessions were analyzed with quantile regressions to assess distribution-wide differences.
Effect sizes of task variables (stimulus location, session, correction trial, etc.), biological variables (genotype and sex), and interactions between subsets of variables (genotype × sex, genotype × session, etc.) on behavioral measures (accuracy, latencies, etc.) were estimated together with 95% confidence intervals (CI) and statistical significance using two-level general or generalized linear mixed-effects models or quantile regression (StataCorp, TX, USA). Mice were treated as level-2 clusters with random intercepts. Binary performance measures (correct/incorrect response, response/omission) were analyzed trial-by-trial using the generalized linear latent and mixed models (GLLAMM) program [38] with a logit link function, whereby the effects of task variables were expressed as odds ratios, with an odds ratio of 1 indicating no effect (e.g., an effect of session > 1 indicates that response accuracy improves over sessions). Latency data were analyzed using quantile regressions with robust and clustered standard errors [83] from the 0.05 to the 0.95 quantile in 0.05 steps to allow distribution-wide comparisons (see Additional file 1: Fig. S6), whereby effects of task variables were expressed as latency differences, with 0 indicating no effect (e.g., an effect of genotype > 0 for a given quantile indicates that Nlgn1−/− mice have longer latencies).
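The analyses above were run in Stata; as a loose Python analogue only, the sketch below uses statsmodels and assumes a long-format DataFrame df with columns correct, latency, genotype, session, and mouse_id. Note that the logistic model uses cluster-robust standard errors as a population-averaged approximation, not the two-level random-intercept GLLAMM fit itself.

```python
# Hedged sketch of analogous analyses in Python (the paper used Stata).
# Assumes a long-format DataFrame `df` with columns: correct (0/1),
# latency, genotype, session, mouse_id. Not the authors' code.
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Trial-level accuracy: logistic regression with cluster-robust SEs
# (approximates, but is not identical to, the random-intercept GLLAMM model).
logit = smf.glm("correct ~ genotype * session", data=df,
                family=sm.families.Binomial()).fit(
    cov_type="cluster", cov_kwds={"groups": df["mouse_id"]})
print(np.exp(logit.params))  # odds ratios; 1 = no effect

# Latencies: quantile regressions from the 0.05 to the 0.95 quantile.
fits = {q: smf.quantreg("latency ~ genotype + session", df).fit(q=q)
        for q in np.round(np.arange(0.05, 1.0, 0.05), 2)}
```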
For spontaneous locomotor activity, ambulatory distance was analyzed with GLLAMM with a log link. Other performance measures were analyzed using mixed-effects linear models when normally distributed, or median regressions otherwise [83].
To analyze the effect of correction trials and recurring pseudorandom trials on accuracy, two additional binary variables were included in the models, indicating whether a trial was a correction trial or a recurring trial (correction trials were excluded when estimating the effect of recurring pseudorandom trials). Heteroskedasticity-robust standard errors adjusted for clustering within animals were used for all analyses.
Behavior simulation
Agent
A simple reinforcement learning agent learned the utility of an action following the classic Rescorla-Wagner rule [84]:
$$ {Q}_{t+1}(A)={Q}_t(A)+\alpha \bullet \left[{r}_t-{Q}_t(A)\right] $$
(1)
Here, Qt(A) is the learned utility of a given action A on trial t, α is the learning rate, and rt is the reinforcement received on trial t. Actions can have both positive and negative utilities (e.g., responding may result in rewards but also incurs effort). The net utility of a given action is given by the linear combination of its positive and negative utilities, the relative importance of which is controlled independently by βP and βN respectively:
$$ U(A)={\sum}_i\left[I\bullet {\beta}_P+\left(1-I\right)\bullet {\beta}_N\right]\bullet {Q}_i(A) $$
(2)
Here, U(A) is the net utility of action A, Qi(A) denotes the individual positive or negative utility components of A, and I is the indicator function:
$$ I={\mathbbm{1}}_{{Q}_i(A)\ge 0}\left({Q}_i(A)\right) $$
(3)
Such that,
$$ I=\left\{\begin{array}{c}1\ if\ {Q}_i(A)\ge 0\\ {}0\ if\ {Q}_i(A)<0\end{array}\right. $$
Action selection is given by a softmax function over the net utilities of the potential actions:
$$ P(A)=\frac{e^{U(A)}}{\sum \limits_i{e}^{U\left({A}_i\right)}} $$
(4)
Here, P(A) is the probability of choosing action A, which depends on the net utility of A compared to that of alternative actions. For all our simulations, a choice is made between only two actions.
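For concreteness, below is a minimal Python sketch of the agent defined by Eqs. (1)–(4). The class structure, parameter names, and data layout are our own illustrative choices, not the authors' implementation; Q is stored as an actions × utility-components matrix so that Eq. (2) can weight each component by βP or βN according to its sign.

```python
import numpy as np

class RWAgent:
    """Rescorla-Wagner agent with signed utility weighting (Eqs. 1-4).
    Illustrative sketch; structure and defaults are assumptions."""

    def __init__(self, n_actions, n_utilities, alpha, beta_p, beta_n, q0=None):
        self.alpha, self.beta_p, self.beta_n = alpha, beta_p, beta_n
        # Q[a, i] holds the i-th (positive or negative) utility of action a
        self.Q = (np.zeros((n_actions, n_utilities)) if q0 is None
                  else np.array(q0, dtype=float))

    def net_utility(self):
        # Eqs. (2)-(3): weight each component by beta_p if it is >= 0,
        # by beta_n if it is negative, then sum over components
        w = np.where(self.Q >= 0, self.beta_p, self.beta_n)
        return (w * self.Q).sum(axis=1)

    def choose(self, rng):
        # Eq. (4): softmax over net utilities (shifted for numerical stability)
        u = self.net_utility()
        p = np.exp(u - u.max())
        p /= p.sum()
        return rng.choice(len(p), p=p)

    def update(self, action, component, r):
        # Eq. (1): delta-rule update of one utility component
        self.Q[action, component] += self.alpha * (r - self.Q[action, component])
```

Storing effort as a separate, non-updated component lets the same delta rule (Eq. 1) drive only the learned reward utilities in the simulations below.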
Simulations
Binary choice (two-armed bandit) task
The reinforcement learning agent learned to choose between a correct and an incorrect response for 30 trials per session across 20 sessions. The correct response was always rewarded, and the incorrect response was never rewarded. Both correct and incorrect responding incurred a negative utility of − 1 representing the physical effort of responding.
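Under the assumptions of the sketch above, this simulation might be run as follows; the learning rate and β weights are placeholder values, not the fitted parameters.

```python
# Sketch of the binary choice (two-armed bandit) simulation: action 0 is the
# correct response (always rewarded), action 1 the incorrect response (never
# rewarded); both carry a fixed effort utility of -1. alpha/beta values are
# illustrative placeholders, not the parameters used in the paper.
rng = np.random.default_rng(0)
agent = RWAgent(n_actions=2, n_utilities=2, alpha=0.05,
                beta_p=1.0, beta_n=1.0,
                q0=[[0.0, -1.0],    # correct: [reward utility, effort]
                    [0.0, -1.0]])   # incorrect
for session in range(20):
    for trial in range(30):
        a = agent.choose(rng)
        agent.update(a, 0, 1.0 if a == 0 else 0.0)  # update reward component only
```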
Serial fixed ratio task
The agent was trained sequentially through FR1, 5, 20, and 40, with three sessions at each ratio requirement, where it chose between responding and resting. Responding resulted in a reward if the ratio requirement was met (positive utility) and incurred a negative utility of − 1 representing the physical effort. Resting resulted in no reward but incurred a much smaller effort-related negative utility of − 0.2. Note that the designation of the alternative action as resting is arbitrary; the general idea is that an animal chooses between responding and other low-reward, low-effort actions. Time elapsed as the agent chose either to respond or to rest, and the session ended after 2700 timesteps, roughly corresponding to a 2700-s (45-min) session.
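A sketch of this simulation under the same assumptions; the reward magnitude (here 1.0) is not specified in the text and is a placeholder.

```python
# Sketch of the serial fixed ratio simulation (reusing RWAgent and rng from
# above). Responding (action 0) yields a reward once the ratio requirement is
# met, at effort -1; resting (action 1) yields nothing at effort -0.2.
def run_fr_session(agent, ratio, rng, reward_value=1.0, steps=2700):
    count = 0
    for _ in range(steps):               # ~2700 timesteps ~ one 45-min session
        if agent.choose(rng) == 0:       # respond
            count += 1
            r = reward_value if count == ratio else 0.0
            if count == ratio:
                count = 0                # ratio met; start a new trial
            agent.update(0, 0, r)        # learn the reward utility of responding
        # resting's utilities and both effort components stay fixed

fr_agent = RWAgent(n_actions=2, n_utilities=2, alpha=0.05,
                   beta_p=1.0, beta_n=1.0,
                   q0=[[0.0, -1.0], [0.0, -0.2]])  # [respond, rest]
for ratio in (1, 5, 20, 40):
    for _ in range(3):                   # three sessions per ratio
        run_fr_session(fr_agent, ratio, rng)
```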
Porsolt swim test
The forced swim test was simulated as a choice between swimming and resting. Swimming was initialized with a utility of 0, representing the agent's initial belief that swimming would lead to a neutral outcome, plus a utility of − 1 representing the effort of swimming. Resting had a large negative utility of − 10, representing the possibility of drowning, but incurred no effort. Every time the agent chose to swim, it received a reinforcement of − 9; via Eq. (1), it thereby gradually learned that swimming did not markedly improve the situation and, in turn, reduced its mobility over time. For simplicity, the agent made 300 decisions over 300 timesteps, roughly corresponding to a 5-min session.
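A sketch of the swim test simulation under the same assumptions (the learning rate is again a placeholder):

```python
# Sketch of the forced swim simulation (reusing RWAgent and rng). Swimming
# (action 0) starts at utility 0 with effort -1 and is reinforced at -9 on
# every swim; resting (action 1) has a fixed utility of -10 ("drowning") and
# no effort. alpha is an assumed placeholder.
swim_agent = RWAgent(n_actions=2, n_utilities=2, alpha=0.02,
                     beta_p=1.0, beta_n=1.0,
                     q0=[[0.0, -1.0],    # swim: [outcome utility, effort]
                         [-10.0, 0.0]])  # rest: [drowning risk, no effort]
mobility = 0
for t in range(300):                     # 300 decisions ~ one 5-min session
    if swim_agent.choose(rng) == 0:
        mobility += 1
        swim_agent.update(0, 0, -9.0)    # swimming does not improve the situation
# As Q(swim) drifts toward -9, P(swim) falls, i.e., mobility declines over time.
```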
Note that the logic of the proposed model does not depend on the specific values of the task parameters used for the simulations, which were chosen so that the behavioral simulations are quantitatively similar to the experimental data.