Table 2 Challenges of studying human gut virome and possible solutions

From: Studying the gut virome in the metagenomic era: challenges and perspectives

Steps Challenges Possible solutions
Nucleic acid extraction • Existence of active and silent fractions of viromes
• Total nucleic acid isolation protocols (TNAI):
+ Allow characterization of the microbiome along with virome potential, giving a holistic picture of all components of the microbiome
+ High-throughput
– Lead to inflation of false-positive hits from bacteria in the subsequent data analysis
• Viral-like particle (VLP) isolation protocols:
+ Ensure true positives on viruses due to physical removal of bacteria by filtration
– Give a low-concentration output [79] that may complicate the genomic library preparation step
– Usually require multiple time-consuming steps of VLP and nucleic acid precipitation [78, 80]
• Combination of TNAI and VLP isolation protocol approaches [81]
Genomic library preparation • Limited amount of viral genetic material available • Use of more sensitive genomic library preparation kits
• MDA may lead to overrepresentation of circular ssDNA viruses [82] and underrepresentation of viruses with extreme GC content [83] • Restricted use of MDA
• Studying RNA viruses requires additional effort due to the relative instability of RNA genetic material:
- Use of reverse transcriptase to convert RNA to cDNA
- Restricted usage of RNase in protocols handling both DNA and RNA viruses [84]
- May require a separate isolation protocol (arising from the previous point) and, therefore, an increase in the amount of starting material
• Metatranscriptomics approaches
• Use of reverse transcription step
• Studying ssDNA viruses requires additional effort:
- Some of the WGA techniques that precede the genomic library preparation procedure might introduce biases into the representation of ssDNA viruses [77, 82, 85]
- The majority of current genomic library preparation procedures cannot handle ssDNA genomes due to the use of dsDNA adapters
- ssDNA viruses have been shown to have higher mutation rates than dsDNA viruses [86], which increases the microdiversity of the metagenome and limits reference-based approaches
• Use of ssDNA adapters in the adapter-ligation reaction at the genomic library preparation step [77]
• Selection of an appropriate cut-off for coverage is complicated • Studies report discoveries of a huge number of viruses at a depth of 1–15 × 10⁶ reads per sample [60, 78, 79, 80]
Quality control • Removal of bacterial sequences is complicated by the viral signals from prophages (both cryptic and inducible) carried by bacterial genomes • Use of tools for identification of prophages in bacterial genomes [87, 88, 89], though some are limited to known prophages. The combination of multiple methods has been shown to enrich the set of detected prophages [90] and therefore prevent their concurrent removal with bacterial sequences.
Data analysis • Existing databases do not fully represent viral diversity [91] • Use of de novo assembly approaches
• Rapid evolution and diversity of viral genomes limit reference-based approaches • Use of reference databases that include both cultured viruses and computationally identified viral contigs [25, 92]
• Use of a protein-based search
• Use of a profile hidden Markov model based on protein domains allows the identification of remote homologs [93]
• De novo assembly approach is sensitive to biases introduced during genomic library preparation and sequencing:
- Low DNA input for genomic library preparation decreases the percentage of reads that map back to the corresponding assemblies [94, 95]
- Use of a DNA amplification step might affect the distribution of read coverage [94, 96]
- Shifts in GC content during genomic library preparation [97] affect the completeness of genomes and cause assembly fragmentation
• Adjustment of the assembly pipeline according to applied genomic library preparation procedure [96]: use of modes suitable for an uneven distribution of read coverage such as single-cell SPAdes [98, 99] preceded by read de-duplication [96] or Velvet-SC [100]
• Use of genomic library preparation protocols without any amplification procedure (needs high DNA input, probably not applicable for viromics) [101, 102]
• Reproducibility of assembly results when combining different assemblers is complicated by technical challenges [103, 104] and the possible appearance of chimeric assemblies [104]
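
The table mentions read de-duplication as a step that can precede assembly when read coverage is uneven [96]. As a rough illustration only, the Python sketch below removes exact duplicate reads from FASTQ records held in memory. It is not the procedure used in the cited work: the function names `parse_fastq` and `deduplicate_reads` are hypothetical, duplicates are defined here simply as identical sequences (case-insensitive), and paired-end reads and reverse complements are not handled.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one read
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ input (hypothetical helper)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line, ignored
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header, seq, qual


def deduplicate_reads(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Keep the first read seen for each distinct sequence.

    Assumption: a duplicate is a read whose sequence is identical,
    ignoring case, to one already seen.
    """
    seen = set()
    for header, seq, qual in records:
        key = seq.upper()
        if key in seen:
            continue
        seen.add(key)
        yield header, seq, qual


# Small made-up example: r2 duplicates r1 (differs only in case).
example = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "acgt", "+", "IIII",
    "@r3", "TTGA", "+", "IIII",
]
unique = list(deduplicate_reads(parse_fastq(example)))
print([header for header, _, _ in unique])  # ['@r1', '@r3']
```

Holding every distinct sequence in a set is simple but memory-hungry; at the depths noted in the table (millions of reads per sample), a hashed key or a dedicated tool would likely be preferable.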