Fig. 1 | BMC Biology

Fig. 1

From: Insertion variants missing in the human reference genome are widespread among human populations

Overview of InserTag. (a) In the discovery step, discordant paired-end reads of the sample genome are clustered according to the location of their anchored reads on the reference genome, and local de novo assembly is performed for each strand. The assembled contigs, called insertion-tags, consist of flanking reference sequences (gray bars), breakpoints, and partial inserted sequences (red and blue bars). If two insertion-tags lie close together on opposite strands and face each other, a putative insertion event is inferred between the paired insertion-tags. (b) In the tracing step, each segment of the paired insertion-tags is aligned to the target genomes, including non-human primate genomes, other human assemblies, and databases of human unmapped contigs, to trace the full insertion sequences. (c) Using the location of the insertion in the reference genome and the traced inserted sequences, both reference and non-reference insertion alleles are generated. The raw sequencing reads of the sample genome are aligned to these alleles, and the best-supported alleles are selected based on the alignment score. The biallelic genotypes of non-reference insertion SVs are then determined from the read-depth ratio of the reads supporting each allele.
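The pairing rule in the discovery step and the read-depth genotyping in the last step can be illustrated with a minimal Python sketch. This is not InserTag's implementation: the `InsertionTag` record, the `pair_tags` and `genotype` functions, the 50 bp pairing window, the minimum depth of 4, and the 0.2/0.8 ratio cut-offs are all hypothetical values chosen for illustration, and the "facing each other" orientation test is simplified to "opposite strands within the window".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InsertionTag:
    """Hypothetical record for one assembled insertion-tag."""
    chrom: str
    breakpoint: int  # reference coordinate of the breakpoint
    strand: str      # "+" or "-"


def pair_tags(tags: List[InsertionTag],
              max_gap: int = 50) -> List[Tuple[InsertionTag, InsertionTag]]:
    """Greedily pair each '+' tag with the nearest unused '-' tag on the
    same chromosome whose breakpoint lies within max_gap bases.
    max_gap is an assumed threshold, not a value from the paper."""
    plus = sorted((t for t in tags if t.strand == "+"),
                  key=lambda t: (t.chrom, t.breakpoint))
    minus = [t for t in tags if t.strand == "-"]
    used = set()
    pairs = []
    for p in plus:
        best = None  # (distance, index into minus)
        for i, m in enumerate(minus):
            if i in used or m.chrom != p.chrom:
                continue
            dist = abs(m.breakpoint - p.breakpoint)
            if dist <= max_gap and (best is None or dist < best[0]):
                best = (dist, i)
        if best is not None:
            used.add(best[1])
            pairs.append((p, minus[best[1]]))
    return pairs


def genotype(ref_reads: int, alt_reads: int, min_depth: int = 4,
             low: float = 0.2, high: float = 0.8) -> str:
    """Call a biallelic genotype from the fraction of reads supporting
    the non-reference (insertion) allele. All cut-offs are assumptions."""
    total = ref_reads + alt_reads
    if total < min_depth:
        return "./."  # too few supporting reads to call
    ratio = alt_reads / total
    if ratio < low:
        return "0/0"  # homozygous reference
    if ratio > high:
        return "1/1"  # homozygous insertion
    return "0/1"      # heterozygous


if __name__ == "__main__":
    tags = [
        InsertionTag("chr1", 1000, "+"),
        InsertionTag("chr1", 1010, "-"),
        InsertionTag("chr1", 5000, "+"),  # no partner nearby
        InsertionTag("chr2", 1005, "-"),  # different chromosome
    ]
    for left, right in pair_tags(tags):
        print(left.chrom, left.breakpoint, right.breakpoint)
    print(genotype(ref_reads=5, alt_reads=5))
```

In this toy input only the two chr1 tags at positions 1000 and 1010 form a putative insertion; the unpaired tags are left out. A real caller would typically replace the fixed ratio cut-offs with a likelihood model, which the caption does not describe.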