Table 2. Filtering and QC procedures in Stage 2: calling genotypes in all 725 monkeys at the unequivocal segregating sites identified in Stage 1. Stage 2 started with 4,235,761 sites and ended with 3,369,989 sites.

From: Sequencing strategies and characterization of 721 vervet monkey genomes for future genetic analyses of medically relevant traits

| QC filtering procedure | Number of variants removed |
| --- | --- |
| Not passing SAMtools filters (“mpileup -S -D -q 30 -Q 20”, “vcfutils.pl varFilter -w 10 -d 3 -D 12740 -e 0 -2 0”) | 209,826 |
| Cumulative coverage outside of twofold range of global median coverage | 20,843 |
| MAF in 723 monkeys <10 % | 10,766 |
| Missing >50 % of data | 105 |
| Too few (<3) loci in a 3 Mb region for TrioCaller to work | 1,360 |
| Loci unmapped or not mapped uniquely during LiftOver | 32,419 |
| Filtered out by GATK’s FilterLiftedVariants | 4,094 |
| Whole contig removed for contigs with >1 chromosome switching event per 100 loci | 6,208 |
| LiftOver MapScore <0.5 | 61,721 |
| Loci mapped to the same coordinate in the new reference genome | 4 |
| Alignment: regions of poor alignment (mapping quality <2- or coverage >2-fold range of global median depth) were identified and their genotypes masked as missing; sites with >50 % missing data in monkeys sequenced at 4X and above were removed | 438,423 |
| Sex chromosome SNPs | 65,271 |
| ≥5 Mendel errors in parent–child comparisons | 8,563 |
| >60 % heterozygous calls | 6,201 |
| Total | 865,772 |
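
The bookkeeping in Table 2 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys are shortened labels chosen here, not names from the paper, and the counts are transcribed from the rows above. It compares the stated total against both the start-minus-end difference from the caption and the sum of the listed rows. With the numbers as printed here, the caption difference matches the stated total, while the listed rows sum to 32 more than it; the sketch does not try to say which row accounts for the gap.

```python
# Variants removed per QC step, transcribed from Table 2.
# Keys are shortened, hypothetical labels used only for this illustration.
removed = {
    "samtools_filters": 209_826,
    "cumulative_coverage_out_of_range": 20_843,
    "low_maf": 10_766,
    "missing_over_50pct": 105,
    "too_few_loci_for_triocaller": 1_360,
    "liftover_unmapped_or_nonunique": 32_419,
    "gatk_filter_lifted_variants": 4_094,
    "contig_chromosome_switching": 6_208,
    "liftover_mapscore_below_0_5": 61_721,
    "duplicate_coordinate_after_liftover": 4,
    "poor_alignment_masking": 438_423,
    "sex_chromosome_snps": 65_271,
    "mendel_errors": 8_563,
    "excess_heterozygosity": 6_201,
}

# Figures stated in the table caption and in the "Total" row.
STAGE2_START = 4_235_761
STAGE2_END = 3_369_989
STATED_TOTAL = 865_772


def summarize(removed_counts, start, end, stated_total):
    """Return the quantities needed to cross-check the table's totals."""
    listed_sum = sum(removed_counts.values())
    return {
        "start_minus_end": start - end,
        "listed_sum": listed_sum,
        "stated_total": stated_total,
        "listed_minus_stated": listed_sum - stated_total,
    }


summary = summarize(removed, STAGE2_START, STAGE2_END, STATED_TOTAL)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:,}")
```

Keeping the per-step counts in one mapping makes it easy to rerun the check if a row is corrected against the published table.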