Large-scale genomic analysis of antimicrobial resistance in the zoonotic pathogen Streptococcus suis

Background Antimicrobial resistance (AMR) is among the gravest threats to human health and food security worldwide. The use of antimicrobials in livestock production can lead to emergence of AMR, which can have direct effects on humans through spread of zoonotic disease. Pigs pose a particular risk as they are a source of zoonotic diseases and receive more antimicrobials than most other livestock. Here we use a large-scale genomic approach to characterise AMR in Streptococcus suis, a commensal found in most pigs, but which can also cause serious disease in both pigs and humans. Results We obtained replicated measures of Minimum Inhibitory Concentration (MIC) for 16 antibiotics, across a panel of 678 isolates, from the major pig-producing regions of the world. For several drugs, there was no natural separation into ‘resistant’ and ‘susceptible’, highlighting the need to treat MIC as a quantitative trait. We found differences in MICs between countries, consistent with their patterns of antimicrobial usage. AMR levels were high even for drugs not used to treat S. suis, with many multidrug-resistant isolates. Similar levels of resistance were found in pigs and humans from regions associated with zoonotic transmission. We next used whole genome sequences for each isolate to identify 43 candidate resistance determinants, 22 of which were novel in S. suis. The presence of these determinants explained most of the variation in MIC. But there were also interesting complications, including epistatic interactions, where known resistance alleles had no effect in some genetic backgrounds. Beta-lactam resistance involved many core genome variants of small effect, appearing in a characteristic order. Conclusions We present a large dataset allowing the analysis of the multiple contributing factors to AMR in S. suis. The high levels of AMR in S. suis that we observe are reflected by antibiotic usage patterns but our results confirm the potential for genomic data to aid in the fight against AMR. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-021-01094-1.


Background
The ability of bacterial pathogens to evolve resistance to antimicrobials is one of the gravest threats to human health and food security worldwide. Antimicrobial resistance (AMR) to a given drug can be quantified via the minimum inhibitory concentration (MIC), i.e. the minimum concentration of the drug that is sufficient to inhibit growth of a bacterial culture. While, for practical use, bacterial isolates are often categorised as either susceptible or resistant, MIC is a continuously varying trait (measured discontinuously). In this study, we investigate how AMR genes, variants and ecology explain antibiotic phenotype in the bacterium Streptococcus suis, treating MIC phenotype as a quantitative trait.
S. suis primarily exists as a commensal in pigs, colonising the nasopharynx, gut and vagina, but it also causes systemic and respiratory disease, particularly in young pigs [1]. S. suis is also a serious zoonotic disease, being the leading cause of adult bacterial meningitis in Vietnam [2]. While some autogenous vaccines are used in pig production, they are serotype-specific and give inconsistent cross protection against heterogeneous S. suis [3]. Antimicrobials therefore remain the standard treatment for S. suis, and as such, S. suis is a leading driver of antimicrobial usage in pig farms [4].
As well as being a serious problem in itself, S. suis also has unique benefits as a model for studying AMR. By weight, more pork is consumed globally than any other meat [5,6], and S. suis is found in most, if not all, pigs [7]. Furthermore, antibiotic consumption is higher in pigs (172 mg per population corrected unit) than any other livestock (e.g. cattle (45 mg) and chicken (148 mg)) [8]. As a result, most S. suis lineages will experience antibiotics. These include not only antibiotics administered directly against S. suis, whether as a therapeutic, prophylaxis or metaphylaxis, but also, and perhaps more commonly, in response to many other bacterial infections and in limited countries as growth promoters [9,10].
The strong selection pressure caused by widespread use of antimicrobials in pig farming is expected to give rise to AMR in S. suis. Consistent with this, several phenotypic studies show high MICs for each of the major classes of antibiotics in one or more S. suis collections [11][12][13][14][15]. In addition, there have been demonstrations of individual resistance determinants affecting MIC in S. suis [16][17][18], and also some mining of S. suis genome collections for known resistance determinants [19][20][21]. However, to our knowledge, no study has combined complete genomic and phenotypic information in large numbers of isolates, collected from a diverse range of populations.
To this end, we obtained replicated measures of MIC for 16 antibiotics, each widely used in pigs, for six different S. suis collections, comprising 678 isolates. These collections were chosen to include the three main pigproducing regions of the globe-namely the Americas, South East Asia and Europe-and targeted both human and pig hosts, as well as a range of years, serotypes and clinical phenotypes [22][23][24]. We obtained whole genome sequences for each of our isolates, allowing us to compare phenotype and genotype on an unprecedented scale.

Measurement of MIC for 16 antibiotics in S. suis
We scored 678 isolates of Streptococcus suis for replicated measures of Minimum Inhibitory Concentration (MIC) for 16 antibiotics (Additional file 1: Table S1). Four of these are beta-lactams (amoxicillin, cefquinome, ceftiofur and penicillin), which are typically used to treat S. suis infection in pigs. The other antibiotics are all widely used in the pig industry, but found in different drug classes. They comprise macrolide-lincosamidestreptogramin B (MLS B : erythromycin, lincomycin, tilmicosin and tylosin), tetracyclines (doxycycline and tetracycline) and fluoroquinolones (enrofloxacin and marbofloxacin), plus one each of an aminoglycoside (spectinomycin), a pleuromutilin (tiamulin), trimethoprim (TMP) and a phenicol (florfenicol). Results, shown in the left-hand plot of Fig. 1, reveal wide variation among our isolates in MIC for most of the antibiotics. For a few antibiotics, especially the MLS B and tetracyclines, the MIC values are clearly bimodal, suggesting a meaningful division between 'resistant' and 'wild-type' isolates. However, for most antibiotics, the distributions are either roughly lognormal (fluoroquinolones and phenicols) or positively skewed (spectinomycin, pleuromutilin, TMP and beta-lactams). It is striking that the distributions for the beta-lactams are not clearly dissimilar from any of the other antibiotic classes, despite their being the most common treatment against S. suis. Given these distributions, and the general lack of defined clinical breakpoints for these antibiotics in S. suis [25], we treated MIC as a quantitative trait in our subsequent analyses.

Ecological and genomic predictors of MIC
Our S. suis isolates were collected at different times, and from different countries, hosts and body sites, including sites associated with respiratory and systemic disease. Isolates were also genetically heterogeneous, and differed in their serotype; including serotypes with a known disease association (see 'Methods' and Additional file 1: Table S1). We first used linear models, to ask whether MIC levels varied systematically with these ecological and genomic factors.
S. suis is highly recombining, such that no single genealogy describes its diversity [22]. As such, to characterise genomic diversity in our sample, we identified 30 BAPs clusters of similar strains (see 'Methods'). Figure S1 shows a core genome phylogeny with these clusters highlighted (Additional file 2: Figure S1) [26][27][28]. The resulting clusters were a highly significant predictor of MIC (p < 10 −15 ; Table 1 (A)) showing that genomically similar isolates tend to have similar MIC values. To account for this genetic structure, 'cluster' was included as a random effect in subsequent analyses.
A model including all potential predictors showed that country of isolation was also a highly significant predictor of MIC. In addition, there were weaker effects of year of collection (with a slight trend for increasing MIC over time); for serotype (with non-disease-associated serotypes tending to have higher MIC); and for host disease status (with non-clinical isolates having higher MIC) ( Table 1 (B)).
Separate analyses for each antibiotic, showed that the effect of country was driven by consistently higher MIC in the samples from Canadian pigs. As shown in Fig. 2A (left-hand panel), this applied to 15/16 antibiotics, and also applied to various models (Table 1 (C)), and subsets of the data (Additional file 2: Figure S2). For example, the effect is seen consistently in systemic pathogens, respiratory pathogens and non-clinical isolates (Additional file 2: Figure S2). By contrast, the other predictors (year, serotype and clinical status) had effects that were less consistent or robust ( the MIC values between Canada and the UK is consistent with higher antimicrobial usage reported in Canada [29,30]. Predictions for Vietnam are more difficult, as there is a lack of official records of antibiotic usage, although it is likely that usage is both higher than in the EU and focussed on different drugs [31]. However, as shown in Fig. 2B (left-hand panel), we saw no such signal in the MIC data, even when we restricted the comparison to the genetic cluster BAPS4 that includes all of our Vietnamese sample (see 'Methods'). The clearest pattern is for lower MIC for the MLS B class in the Vietnamese isolates (pink points in Fig. 2B left-hand panel). As shown in Fig. 2C (left-hand panel), we also found no robust differences between Vietnamese isolates sampled from humans and pigs. This is consistent with pigs being a reservoir for this zoonotic disease, and the lack of evidence of genomic adaptation to human hosts [22].  Figure S1), and the 112 BAPS4 isolates from the UK and Canada. C the 22 Vietnamese isolates from pigs, and the 28 Vietnamese isolates from people. Left-hand panels compare the mean log MICs for each antibiotic. Consistent deviations from the dotted 1:1 line suggest consistently higher or lower MICs in that subset of the data (so the rightward shift in panel A shows that MICs are consistently higher in Canada). Right-hand panels show the proportion of isolates that carry each of the 43 candidate AMR determinants. Each point or bar is coloured according to its drug class, according to the colour scheme in Fig. 1 Identification of candidate AMR determinants within S. suis genomes Using a candidate locus approach, we next identified known and putative determinants of AMR that were present in our sampled genomes. We considered both the presence/absence of whole genes, and sequence variants within genes. In total, we detected 43 determinants: 10 completely novel variants of known resistance genes, 12 known resistance genes detected in S. suis for the first time, and 21 previously reported in S. suis (Table 2 and Additional file 1: Table S1). Our candidates come from three sources: the antimicrobial resistance database CARD [40], previously published S. suis variants not present in the CARD database [36,38,39] and novel variants detected using a subset of our data (see 'Methods').
Of the 10 novel determinants, six are variants in genes previously associated with resistance in other bacteria (including a promoter variant). In particular, we discovered four novel haplotypes within the loop region of the central transpeptidase domains of the pbp2B and pbp2X genes and within the signal peptide region of the mraY gene (encoding the enzyme, acetylmuramoylpentapeptide-transferase, essential to cell wall biosynthesis) that were associated with variation in beta-lactam MIC ( Table 2). In particular, we noted a PBP2X mutation at the conserved location T551, similar to that found in S. pneumoniae (T550), group B streptococci (T555) and S. pyogenes (T553) PBP2X proteins [41] indicative of a shared overall mechanism mediating betalactam resistance across the genus. Polymorphisms in mraY have also been identified in a genome-wide association study of beta-lactam resistance in S. pneumoniae [42]. In addition to these four beta-lactam haplotypes, two variantsa variant of the chromosomal dihydrofolate reductase gene dhfr and its promoterwere associated with reduced susceptibility against trimethoprim (Table 2). Furthermore, all of these six variants were independently associated with changes in MIC in at least seven different genetic clusters (Additional file 2: Table  S2) implying that the association between these variants and MIC is either directly causal or compensatory to the causal variant [43].
The remaining four novel variants were whole genes with homologies to known resistance genes: three located on mobile genetic elements (MGEs) and one chromosomal ( Table 2). Three novel aminoglycoside resistance genes, carried by isolates with high spectinomycin MIC, were identified based on aminoglycoside resistance determinant protein homologies. Although previously undescribed, we found homologues in other bacteria using a blastn search of the non-redundant database in GenBank (Table 2). We also characterised a chromosomally-encoded vgaC homologue (37% protein homology to vgaC), encoding an ABC-F ATP-binding cassette ribosomal protection protein, that we designate vgaF. This gene has arisen in 13 different BAPS clusters, each time associated with reduced susceptibility to tiamulin (Additional file 2: Table S2).
Overall, most of our candidate determinants (28/43) were found in fewer than 5% of the isolates, but ermB (which confers resistance to the MLS B class) and tetO (which confers resistance to tetracyclines) were present in a majority of the isolates ( Table 2). Only about a quarter of isolates carry a candidate determinant for the beta-lactams highlighting the continuing susceptibility to these first line treatment drugs in S. suis.
Nevertheless, as shown in Fig. 3, many isolates carried multiple determinants, and multidrug resistance was widespread. Around 40% of isolates carried resistance determinants to three or more classes of drug (275/678), and over 10% carried resistance determinants to five classes (81/678); this is more than carried no determinants at all (59/678).

Candidate AMR determinants explain most of the variation in MIC
We next investigated how well our candidate variants explain the variation we observe in MIC (see Additional file 2: Figures S3-S9 for detailed plots). The right-hand panel of Fig. 1 compares the distribution of MIC values for isolates carrying one or more of the candidate AMR determinants for that antibiotic class (denoted '1+'), to the MIC of the remaining isolates, which carry no such determinant (denoted '0').
The proportion of the variation in MIC explained by our variants varied between antibiotics. Explanatory power was strongest (r 2 > 0.75) for antibiotics where the distributions of MIC were most clearly bimodal (MLS B and tetracyclines). By contrast, we found no association between our variants and MIC for the fluoroquinolones and phenicol, where the distribution of MIC values was roughly lognormal, consistent with our isolates representing a 'wild-type' population. For the remaining antibiotics, our determinants had intermediate explanatory power (Fig. 1), but there were very few isolates with high MIC that did not carry a candidate determinant. This suggests that we have detected most of the causal variants in our dataset, removing the need to perform additional genome-wide associations.  We next asked whether the variable presence of our candidate determinants in different ecological settings would explain the significant ecological predictors of MIC in our linear models (Table 1). Consistent with this hypothesis, the typical MIC level in each genetic cluster was highly correlated with the frequency of determinants in that cluster (Additional file 2: Figure S10).
Differences between countries were explained in the same way. The higher MIC in Canada than the UK was due to consistently higher frequencies of the same determinants ( Fig. 2A right-hand panel)-with the exception of trimethoprim (see below). By contrast, tetracycline resistance throughout the BAPS4 cluster, though at similar levels in all three countries, was conferred by different determinants in Vietnam (Fig. 2B).

Ineffective variants and epistasis
While our data contained few isolates with high MIC that did not carry a candidate determinant, there were many isolates with low MIC despite carrying a determinant (Fig. 1).
In some cases, this was due to previously identified candidate genes that had no appreciable effect on MIC in our data. For example, spectinomycin MIC was unchanged by some determinants identified from the CARD database (e.g. ant(6')-Ib in Additional file 2: Figure S7). If we allow for the presence of these non-functional genes, the r 2 value for spectinomycin increases from 0.37 (Fig. 1) to 0.69 (Additional file 2: Figure S11a).
In other cases, as is well known, variants act against only some of the antibiotics in a class. In MLS B , for example, modification of the ribosomal target confers cross resistance to macrolides and lincosamides, while mechanisms such as efflux and enzymatic inactivation do not [44]; so msrD acts against erythromycin, but not against lincomycin, while linB inactivates lincomycin, but not erythromycin (Additional file 2: Figure S4). Again, taking this into account further increases predictive power (Additional file 2: Figure S11a).
As well as these simple cases, there were some clear examples of genetic interactions. For trimethoprim, two candidate variants-in the protein DHFR102 and its promoter (Table 2)-were found at higher frequency in the UK than Canada (Fig. 2A, right-hand-panel), but in Canada, the two variants were more often found together, leading to higher MIC overall ( Fig. 2A; Additional file 2: Figure S9).
Finally, we observed complex patterns for the second most common determinant in our dataset, ermB. For the 334 isolates that carried only ermB (and no other candidate determinant to MLS B class drugs), MIC values were clearly bimodal, with many carriers having very low MIC (Additional file 2: Figures S4 and S12). Further analysis revealed a small number of isolates with frameshift or premature stop codons (4/326 isolates with complete sequences in our assemblies). While the remaining isolates with low MIC carried rare amino acid variants at one of four positions (T75X, N100S, R118H and V226I)-the last three of which differentiate the ermB sequences found in Streptococcus pneumoniae from Clostridium perfringens [45]. Nevertheless, no particular sequence was always associated with low MIC, suggesting some unidentified source of epistasis (see 'Methods' and Additional file 2: Figure S12 for full details).

Resistance to beta-lactams
While beta-lactams are the major treatment class for S. suis, it is notable that the explanatory power of our variants seems to be weaker for this drug class than for, e.g. tetracyclines or MLS B .
In Streptococci, a typical route to beta-lactam resistance involves variants in the pbp genes, and it is well established that the joint action of many pbp variants is necessary to explain substantial changes in MIC [46,47]. Single-point mutations in pbp2x, for example, cause very modest elevations in S. pneumoniae [46], group B Streptococci [48], Streptococcus dysgalactiae subsp. equisimilis [49] and Streptococcus pyogenes [41].
Like S. pneumoniae, S. suis has three key pbp genes, pbp1A, pbp2b and pbp2x, and shares a similarly broad MIC distribution (although with typically lower MIC values) [47,50,51]. Our data also show that mutations in pbp2x alone have small effects (Additional file 2: Figure S3), while genotypes carrying four or more variants have the highest mean MIC (Additional file 2: Figure  S3). Our explanatory power increases greatly when we predict beta-lactam MIC from the total number of candidate variants carried (Additional file 2: Figure S11b).
Although official breakpoints are lacking for many betalactam antibiotics in S. suis, for penicillin, only the isolates with the most variants reach clinical significance (i.e. penicillin resistance ≥ 1 or log(0) MIC).
In S. pneumoniae, it has also been noted that mutations conferring resistance to beta-lactams occur in a set order, with amino acid changes in PBP2B and PBP2X often acting as the first step [46,52,53]. Figure 4 shows that our data also show this characteristic 'nested' pattern (see also Additional file 2: Figure S13 [28];), with PBP2X mutations largely occurring in backgrounds containing PBP2B mutations and MraY mutations occurring in backgrounds containing both PBP2B and PBP2X mutations.
It is notable that mutations in MraY-which are generally last to occur, and never occur alone-might have a compensatory effect. Altered PBPs, while conferring resistance, might be less active transpeptidases than their wild-type counterparts [54]. For example, the amidotransferase enzyme encoded by murT enables crosslinking of cell wall peptidoglycans by preparing lipid II used by pneumococcal PBPs [55,56]. Mutations in MraY might therefore compensate for reduced enzymatic activity of altered PBPs.
Studies have shown that cefotaxime, a third-generation cephalosporin like ceftiofur, selectively inactivates PBP2X but not PBP2B [57]. Our results echo this pattern because our PBP2B haplotype has a smaller change on mean MIC of ceftiofur than penicillin (Additional file Fig. 4. Beta-lactam resistance determinants have additive effects and appear in a consistent order. The left-hand panel shows that the candidate determinants against the beta-lactam drug class often appear in a consistent order. For example, the mutation at site 551 of PBP2X tends to appear in backgrounds that already carry mutations in PBP2B. The plot was generated according to the method and plotting convention of [28], where each row represents an isolate, and those isolates that fit the nested pattern are shown in pink. Results show that 93.5% (634/678) of our isolates fit the nested pattern. The right-hand plots show the log MIC values for these 634 isolates, comparing isolates carrying different numbers of candidate determinants. Figure S3), and the explanatory effects of our PBP haplotypes differ between the drugs (Fig. 1). Previous variants in PBP2X thought to confer cefuroxime resistance [38] and, while validated by our data, were only found at very low frequency in our collection ( Table 2).

Discussion
We have examined antimicrobial susceptibility of Streptococcus suis to 16 antibiotics that are commonly used in pigs. Replicated measures of MIC suggest that, even when some isolates have very high MIC, there is often no natural separation into 'resistant' and 'susceptible' (Fig. 1). This highlights both the need for clinical breakpoints in S. suis, and the need to evaluate AMR with quantitative measurements of MIC. This is particularly important for beta-lactam resistance, given its multi-allelic nature (Fig. 4, Additional file 2: Figure S3), and the phenomenon of 'MIC creep' that may eventually lead to treatment failures (e.g. vancomycin resistance in MRSA in some regions [58,59]).
We have also shown that AMR levels differ systematically between countries. For example, compared to the UK, Canada has consistently higher MICs, and frequencies of the same candidate AMR determinants (Fig. 2, Additional file 2: Figure S1). Differences in antimicrobial usage are the most plausible explanation for this trend, because usage is much higher in Canada than in the UK [29,30]. In particular, the ban on antibiotics used as growth promoters came into effect in Canada only in 2018 versus 2006 in the UK. However, our results are correlational and other factors could be explanatory, e.g. different microbiome compositions in pigs from different areas.
Next, we have identified 43 candidate resistance determinants and shown that they explain the majority of the observed variation in MIC for 11/16 antibiotics, and a substantial fraction of the variation for a further 2/16 (Fig. 1).
While many of these variants were previously described, and most were found at low frequency (Table 2; Fig. 2), we detected three novel candidates that are common and associated with clinically relevant changes in MIC: vgaF, DHFR102 and the dhfr promoter. A new mechanism for tiamulin resistance, such as the vgaF, could have implications for treatment success because tiamulin is a common drug for treating infections in livestock, including pigs [60]. High-level trimethoprim resistance was associated with mutations in the promoter regions of the dhfr gene. This is in contrast to the folA (I100L), folP 1-2 codon insertion combination conferring high-level resistance to trimethoprim/sulfamethoxazole (TMS) in S. pneumoniae [61]. While our MICs were for trimethoprim, our observations suggest the evolution of divergent mechanisms of resistance against TMS between these species. If these resistance mechanisms are functionally verified, they should be included in routine AMR gene testing for S. suis.
While our candidate variants were explanatory, we also found several complexities in the genotypephenotype map. These include epistatic interactions, where the effects of some candidate genes vary with their S. suis genetic background. Phenotypic reversion (i.e. mutations in transcriptional regulators or elsewhere in the genome, counteracting a resistance determinant) can be common if resistance has a fitness cost [62]. In addition, resistance genes have been shown to be transcriptionally silenced [63]. These processes could explain why we see many isolates carrying candidate determinants but with low MIC (Fig. 1). This same pattern is also common in other bacteria and up to 10% of S. aureus isolates [64][65][66][67][68]. These complexities have consequences for diagnostic investigation of AMR of S. suis using whole genome sequences. While the predictive power is quite high for many antibiotics, we should expect many type I errors (false positives). These results also caution against using the presence of AMR determinants as a measure of resistance more generally.
While susceptibility to penicillin in clinical cases of S. suis remains high, we see substantial variation in betalactam MIC (Fig. 1A). Consistent with studies in other Streptococci, our results show incremental and ordered changes in amino acids, first within PBP2A and then PBP2X that lead to clinically relevant elevations in MICs. Accurate predictions of genotype-phenotype have already been developed in S. pneumoniae [47,61,[69][70][71], group B Streptococci [72] and S. pyogenes [41]. With no effective treatment for S. suis other than antibiotics, developing similar models and monitoring of these variants in S. suis populations should be a priority.
Two final aspects of our results highlight the fact that while individual battles against AMR are effective, AMR is not always a problem that can be tackled one disease at a time.
First, we have found high levels of resistance in S. suis even to antibiotics that are not typically used to treat this infection, including high rates of multidrug resistance (Fig. 3). This trend could be partly due to the cooccurrence of resistance genes on MGEs. However, determinants for beta-lactams (found in 190/678 isolates)-which are the primary treatment-were less common than those for tetracyclines (574/678), MLS B (395/ 678), trimethoprim (246/678) and tiamulin (193/678). This pattern probably reflects a 'bystander selection effect', common in opportunistic pathogens that are part of the healthy microbiota and frequently exposed to antibiotics used as therapeutics or growth promoters (although antimicrobial growth promoters are now banned in the UK, Vietnam and Canada). However, we note that beta-lactams are prescribed at similar (if not higher frequencies) than many other drug classes in pigs making the pattern of fewer beta-lactam determinants particularly surprising.
Second, many of the rarer AMR genes in our sample were not previously known in S. suis, but are common in other bacteria, both gram negative and gram positive ( Table 2). Together with the presence of non-typical Streptococcal ermB in some isolates, this raises the possibility of S. suis acting as a reservoir for AMR determinants in other bacteria. Indeed S. suis shares conserved chromosomal insertion sites with many human pathogens, such as S. pyogenes, S. pneumoniae and S. agalactiae [19]. Given that S. suis causes human clinical disease, a direct exchange of AMR genes between S. suis and other human pathogens is plausible. Our results lend further support to a one health approach to tackling AMR.

Conclusions
We present a comprehensive dataset comprising genomes, MIC and metadata allowing the first large-scale analysis of the multiple contributing factors to AMR in S. suis. Overall, the high levels of AMR that we observe are reflected by antibiotic usage patterns in pigs. First, S. suis is resistant to many classes of antibiotics that are not typically used to treat S. suis infection. Second, MIC and AMR determinant prevalence is significantly higher in Canada than the UK where antibiotic usage is lower. This indicates that the ongoing effort to reduce antimicrobial use in livestock worldwide may be effective for reducing AMR in S. suis. Our results also highlight some interesting complications in the genotype to phenotype map in S. suis but overall, the explanatory power we observe confirms the potential for genomic data to aid in the fight against AMR.
In the UK, isolates came from three different collections. The first collection in 2009-2011 sampled nonclinical and clinical isolates from pigs across England and Wales (described in Weinert et al. [22]). The second collection in 2013-2014 sampled non-clinical isolates from five farms (described in Zou et al. [24]). The third collection during 2013-2014 targeted clinical isolates from pigs across England and Wales (described in Wileman et al. [23]). In pigs that showed clinical symptoms consistent with S. suis infections (e.g. meningitis, septicaemia and arthritis), the site of isolation was classified as 'systemic' if recovered from systemic sites. The site of recovery was classified as 'respiratory' if derived from lungs with gross lesions of pneumonia. S. suis isolates from the tonsils or tracheo-bronchus of healthy pigs or dead pigs without any typical signs of S. suis infections were defined as 'non-clinical'. Isolates that could not confidently be assigned to these categories (e.g. a tonsil isolate from a pig with systemic signs) were classified as unknown. Altogether, the UK isolates were classified by clinical status as 'systemic' (n = 94), 'respiratory' (n = 50), 'non-clinical' (n = 197) or 'unknown' (n = 82), respectively.
The Vietnamese isolates were collected to sample related populations from human and pig (described in Weinert et al. [22]). These comprised 'systemic' isolates (n = 28) from human clinical cases of meningitis from provinces in southern and central Vietnam, and 'systemic' (n = 4) or 'non-clinical' isolates (n = 18) from pigs, collected between 2000 and 2010. These isolates were exclusively serotype 2 or serotype 14 and belong to one genetic population (Additional file 1: Table S1).

Antimicrobial susceptibility testing
The minimum inhibitory concentrations (MIC) for a range of antibiotics were determined by the micro-broth dilution method, which was performed and the results interpreted in accordance with CLSI Approved Standards, M100-S25 (2015), Vet01S 3rd Edition (2015) and de Jong et al. [11,73]. MIC measurements for some of our isolates were previously published [11]. For the remaining MIC measurements, the MICs were determined for sixteen different antimicrobial compounds, representing nine antimicrobial classes at LGC, Fordham, UK (formerly Quotient Bioresearch, Fordham, UK), for the UK and the Canadian isolates. MIC testing of antibiotics for the Vietnamese isolates was performed at the Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam, in collaboration with the Department of Veterinary Medicine, University of Cambridge, UK.
Whole genome sequencing, assembly and inference of population structure Genome DNA extractions and whole genome sequencing of newly sequenced S. suis in this study were as previously described by Weinert et al. [22]. Briefly, single colonies of strains were grown up in broth culture, DNA was extracted using DNeasy kits (Qiagen), Illumina library preparations were performed as described by Quail et al. [74] and the whole genomes sequenced on the HiSeq2000 according to the manufacturer's instructions (Illumina, San Diego, CA, USA) at the Welcome Trust Sanger Institute, Cambridge, UK. Sequencing generated 125 bp paired end reads, which were assessed for quality using Sickle [75] after the removal of adapter sequences.
Sequencing reads that passed the quality threshold were put forward for de novo assembly generation using Spades v.3.10.1 [76] utilising a variety of parameters conditions. Our 678 isolates were combined with 401 additional genomes given in Additional file 3: Table S3 [27] to increase robustness of S. suis population structure estimation, although these isolates were unavailable for MIC testing. Draft genomes were annotated using Prokka v1.12-beta (v2.8.2) [77], and bacterial species assignment was performed by a combination of MLST assignment and FastQ Screen v.0.11.1 using a custom database [78]. We mapped the Illumina reads back to the de novo assembly to investigate polymorphic reads in the samples (indicative of mixed cultures) using BWA v.0.7.16a [79].
Genomes that exhibited poor sequencing quality (i.e. poor assembly as indicated by a large number of contigs (> 2.5% of the genome assembly in contigs less than 1 kb) an N50 value of less than 10 kb or a high number of polymorphic reads (> 2000 SNPs)) or that which were inconsistent with an S. suis species assignment were excluded from the analysis.
In order to group our S. suis isolates into different genetic clusters, we used the programme hierBAPs in R [80,81]. First, we inferred core genes from our isolates using Roary (v2.8.2) [82], aligned them using MACSE [83] and stripped regions that could not be aligned unambiguously due to high divergence, indels or missing data. This conserved region of the core genome was used as input in hierBAPS and to produce a consensus neighbour-joining phylogenetic tree using the K80 model in the R package ape (Additional file 2: Figure S1) [81,84].

Known AMR determinant detection
ARIBA [85] identifies AMR determinants (or any sequence of interest) directly from paired sequencing reads using a public or custom reference database and relies on mapping the reads to reference sequence clusters followed by the formation of local assemblies of the mapped reads. The tool is further able to confirm the intactness of resistance genes to identify known or novel SNPs within a gene of interest.
The AMR determinants in S. suis were identified using ARIBA v2. 10.0 based on the public AMR database, CARD [40], which was further supplemented with a custom database. The custom database contained gene sequences not present within the CARD database at the time of testing. These were previously published S. suis AMR genes, single-nucleotide polymorphisms (SNPs) in genes known to confer antibiotic resistance and AMR genes found in other bacteria. In addition, we investigated whether there were additional novel resistance variants in known resistance genes (described below). The sequence identity threshold against a reference was set at 90%. Only paired end sequence reads that passed the quality control thresholds were used as input for ARIBA. To identify genes falling under the 90% identity threshold, we also performed blast searches of the draft assemblies against the non-redundant NCBI protein or nucleotide databases to identify variants or chimeric alleles of known resistance determinants that might not be present in the CARD database. Examples of allelic variants identified this way include tet, aad, ant and cat (Table 2).

Novel AMR candidate determinant detection
We identified novel resistance determinants by a range of methods. Candidate determinants that might encode novel resistance mechanisms were identified by scanning literature describing experimental studies and others found by genome-wide association studies (GWAS) in related bacterial species, for example, variants of pbps, mraY, dhfr and folA [46,[86][87][88] along with their promoter regions. To avoid over-fitting our generalised linear models, gene or promoters of interest in a 'training' subset of the collection (n = 205) were then extracted, aligned and ranked from the highest to lowest MIC values, using MUSCLE [89] in SEAVIEW [90]. Manual sequence analysis was then performed to identify either amino acid or nucleotide variations that associated with high MIC.
Kinetic and structural studies have previously established that beta-lactam resistance is conferred by substitutions within PBPs in Streptococcus pneumoniae [46,91] and other streptococci [49,92,93]. While mutations are present throughout the entire PBPs, we noted statistically significant mutations (Additional file 2: Table S2) in altered PBP2B and PBP2X proteins in strains exhibiting high penicillin (≥ 1 mg/L) and ceftiofur MICs (≥ 2 mg/L), respectively. In PBP2B, the residue variations were present in loop regions, K479T/A, D512E/Q/K/A, K513E/D and T515S, within the S. suis transpeptidase domain (residues 351-681). In PBP2X, the mutations were located at positions M437L, S445T, T467S, Y525F and T551S, also in loop regions within its catalytic domain (residues 265-619). No statistically significant variations were found within PBP1A, although the variant P405T was shared by some of the BAP clusters. As multiple amino acid variants were identified within PBP2X and PBP2B, many of which were in the same genomic region (Table 2), we grouped the variants in to haplotypes. We did this by scoring haplotype presence if the isolate had all of the amino acid variants associated with high MIC in a given gene.
In addition to allelic differences in PBP2B and PBP2X, we also detected mutations in mraY. Found immediately upstream of pbp2X, mraY encodes an acetylmuramoylpentapeptide-transferase enzyme which is essential to lipid cycle reactions in the peptidoglycan cell wall biosynthesis pathway. Residue substitutions were present at A4S/T, M6I/L and G8S within the signal peptide regions of the protein. These substitutions were in strong linkage disequilibrium, such that they almost always occurred in pairs. In particular, both A4S/T and G8S occurred only in isolates also carrying M6I/L, while M6I/L appeared alone in only a single isolate. As such, we scored each isolate as carrying the mraY determinant only if it carried two of these three amino acid substitutions (either A4S/T and M6I/L, or G8S and M6I/L; Table 2). This set of isolates differed significantly in their MICs in both penicillin and ceftiofur across multiple BAP groups (Additional file 2: Table S2).
Not all isolates with high tiamulin MICs possessed a lsaE gene, suggesting an additional mechanism in S. suis conferring resistance to pleuromutilins. Using the CARD RGI (resistance gene identifier) protein homology models, a vgaC homologue was identified as a candidate gene associated with variation in tiamulin MIC. The gene, designated as vgaF in this work, encodes an ABC-F ATP-binding cassette ribosomal protection protein and shares 37.14% homology to the reference. Located within the chromosome, vgaF is found intact (~1386 bp) in statistically significant numbers in isolates with MIC ≥ 8 mg/L (Additional file 2: Table S2) but is truncated (< 750 bp) in others. Plasmid borne vga homologues including vgaC are frequently detected in staphylococci and have been shown to confer cross resistance to pleuromutilins, lincosamides and streptogramin A antibiotics. A chromosome based vgaA gene variant, encoding an ATP-binding cassette protein conferring resistance to streptogramin A and related antibiotics in S. aureus, has also been described [94].
An amino acid substitution (I102-L) was detected in the dihydrofolate reductase (DHFR) gene, dhfr (counterpart of folA in S. pneumoniae), in the majority of isolates with trimethoprim MICs of 0.12 mg/L or greater. Similar substitutions of isoleucine to leucine at position 100 in S. pneumoniae [87,95] and Streptococcus pyogenes [96] is known to cause resistance to TMP. Additionally, we also identified polymorphisms within the promoter region (0-30 bp upstream) of the dhfr gene, an A5G substitution and insertions, in isolates exhibiting MICs ≥ 1 mg/L.
Mutations such as 1-2 codon insertions within folP, another core metabolic gene that is documented to confer resistance to trimethoprim/sulfamethoxazole in S. pneumoniae, were also examined. While insertions and or deletions were absent, amino acid residue variations were observed at position 198 (A198G) either alone (9/ 678) or in combination with dhfr I102L (26/678 isolates) in isolates exhibiting a wide range of trimethoprim MICs; 0.03 mg/L-32 mg/mL. However, this data was not statistically significant in the Binomial sign tests and hence was excluded from further analysis.
All of these variants might be causal, compensatory or linked to high MIC because of population structure. We therefore examined our variants and showed that susceptible and resistance alleles were found repeatedly in different BAP clusters, which were significant using binomial sign tests when defining a cut-off (Additional file 2: Table S2).
The pbp, mraY and vgaF genes were unable to be included into the custom database in ARIBA because of the sequence divergence. Therefore, we manually aligned the genes in all 678 isolates using MUSCLE [89] in SEAVIEW [90].
Similarly, analysis for known variants in gyrA and parC were also performed by manually aligning the genes. The tet(O/W/32/O) gene was scored by using blastn because ARIBA was unable to differentiate mosaic sequence patterns Serotype inference from whole genomes Serotypes were determined in silico using the Athey et al. [97] serotype database, which we implemented in ARIBA. Failed ARIBA runs and sequence non-matches were designated as not available (NA).

Statistical analyses
All general linear models were fit in R v. 3.3 [98] using the built-in function lm for models with solely fixed effects, or via Reduced Maximum Likelihood, using the function lme from the package nlme v. 3.1-141 [99], when BAPs cluster was included as a random effect. The response MIC values were log transformed. Our data for host and country is confounded because all of our human samples came from Vietnam (human S. suis has a higher prevalence in South East Asia [37]). Therefore, we coded 'Country' as separate populations-Canada, UK, Vietnam-pig and Vietnam-human. We classified 'serotypes' into either disease-associated or non-disease associated according to Wileman et al. [23].
Additional file 2: Figures S1-S13 and Table S2.  Fig S3 -The effects of candidate determinants on MIC for beta-lactams Fig S4 -The effects of candidate determinants on MIC for MLSB. Fig S5 -The effects of candidate alleles on MIC for tetracyclines. Fig S6 -The effects of candidate alleles on MIC for fluoroquinolone. Fig S7 -The effects of candidate alleles on MIC for the aminoglycoside, spectinomycin. Fig S8 -The effects of candidate alleles on MIC for the pleuromutilin, tiamulin. Fig S9 -The effects of candidate alleles on MIC for trimethoprim (TMP). Fig S10 -Variation in the presence of candidate AMR determinants explains consistent differences between genetic clusters. Fig S11 -Methods of using candidate determinants to predict MIC. Fig S12 -Allelic variation in ermB and unidentified sources of epistasis. Fig S13 -High levels of 'nestedness' suggest that resistance determinants to beta-lactams are acquired in a particular order.