
DNA Fragmentation in NGS: What Do I Need to Understand and How Does It Affect My Sequencing Project?
Nucleic acid purification methodologies are a foundational pillar of molecular analysis in biomedicine. Starting samples are highly diverse, ranging from isolated cells to formalin-fixed, paraffin-embedded (FFPE) tissues, and downstream techniques range from endpoint PCR to high-throughput sequencing. Given this diversity, the choice of fragmentation technique for the genetic material matters. An appropriate method can increase accuracy, support the development of error-correction algorithms, optimize PCR products for amplicon sequencing, increase sequence specificity, and improve fragment yield and integrity. Overall, the aim is to improve data quality both in routine techniques and in advanced applications such as the detection of gene fusions, single-nucleotide variants (SNVs), and single-nucleotide polymorphisms (SNPs).
Here, we focus on how DNA fragmentation methods affect NGS, where library construction is a key step for downstream data analysis and interpretation.
So, how do DNA fragmentation methods affect variant identification?
Different DNA fragmentation techniques can introduce distinct sequencing errors that affect the accuracy of variant identification.
Sonication-based fragmentation uses ultrasonic waves to shear DNA and is widely used because it produces uniformly sized fragments. However, sonication can induce oxidative damage to DNA, producing lesions that show up as substitution artifacts in the sequencing data. One example is the oxidative lesion 8-oxoG (8-oxo-7,8-dihydroguanine), which can result in C:G>A:T and C:G>G:C substitutions, common sequencing errors in libraries generated via sonication.
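Because a lesion can sit on either strand, substitutions like these are reported in strand-collapsed form: a G>T change on one strand is the same event as a C>A change on the other, and both are written C:G>A:T. The sketch below assumes variant calls are available as simple (ref, alt) base pairs; the function names are hypothetical. It collapses each call to its class and reports the fraction that falls in the two classes the text associates with 8-oxoG.

```python
# Watson-Crick complement of each DNA base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Strand-collapsed classes the text links to 8-oxoG damage from sonication.
OXIDATIVE_CLASSES = {"C:G>A:T", "C:G>G:C"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution into its strand-agnostic class.

    A G>T change on one strand is a C>A change on the other, so both are
    reported as C:G>A:T, always writing the pyrimidine (C or T) reference first.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: describe it from the other strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def oxidative_fraction(calls: list[tuple[str, str]]) -> float:
    """Fraction of (ref, alt) calls falling in the oxidative-damage classes."""
    if not calls:
        return 0.0
    classes = [substitution_class(ref, alt) for ref, alt in calls]
    return sum(c in OXIDATIVE_CLASSES for c in classes) / len(classes)
```

For example, `oxidative_fraction([("G", "T"), ("C", "G"), ("A", "G"), ("T", "C")])` returns 0.5: the first two calls collapse to C:G>A:T and C:G>G:C, the last two to T:A>C:G. Comparing this fraction between libraries from the same sample sheared with different settings is one rough way to look for an oxidative signature; it is a heuristic, not a dedicated damage metric.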
Beyond oxidative damage, sonication can also create chimeric artifact reads that contain inverted repeat sequences (IVS). These arise when single-stranded DNA fragments from the same molecule anneal to each other through complementary IVS copies; end repair then fills in the annealed structure, so the resulting read includes sequence copied from the complementary strand.
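To make inverted repeat sequences concrete, the sketch below scans a sequence for pairs of k-mers in which the downstream copy is the exact reverse complement of the upstream one, separated by a spacer of at most `max_gap` bases. The arm length `k`, the spacer limit, and the function names are illustrative assumptions, and this version ignores mismatches between the arms.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string (other characters pass through)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_inverted_repeats(seq: str, k: int = 6, max_gap: int = 50) -> list[tuple[int, int]]:
    """Return (i, j) pairs of 0-based start positions where seq[j:j+k] is the
    exact reverse complement of seq[i:i+k] and the spacer between the two
    arms (j - (i + k)) is between 0 and max_gap bases."""
    seq = seq.upper()
    # Index every k-mer by the positions where it starts.
    starts: dict[str, list[int]] = {}
    for j in range(len(seq) - k + 1):
        starts.setdefault(seq[j:j + k], []).append(j)
    hits = []
    for i in range(len(seq) - k + 1):
        arm = reverse_complement(seq[i:i + k])
        for j in starts.get(arm, []):
            if i + k <= j <= i + k + max_gap:  # downstream, non-overlapping arm
                hits.append((i, j))
    return hits
```

For example, `find_inverted_repeats("ACGTTGAAAACAACGT", k=6, max_gap=10)` returns `[(0, 10)]`: the arm ACGTTG at position 0 and its reverse complement CAACGT at position 10, separated by a four-base spacer.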
Enzymatic fragmentation, on the other hand, relies on endonucleases to digest DNA. It is a popular alternative because it is easy to use, scalable, and causes minimal DNA loss. However, enzymatic fragmentation often introduces more artifactual SNVs and deletions than sonication, especially in palindromic sequences.
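In this context, a palindromic sequence is one that equals its own reverse complement, such as the EcoRI recognition site GAATTC. A minimal way to find them is to expand outward from every gap between two bases while the flanking bases are complementary; with only A, C, G, and T, such palindromes always have even length. The function name and the `min_len` threshold are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def find_palindromes(seq: str, min_len: int = 6) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals of maximal palindromes, i.e.
    stretches equal to their own reverse complement, at least min_len long."""
    seq = seq.upper()
    hits = []
    # A palindrome is centred between two bases, so try every such gap.
    for center in range(1, len(seq)):
        left, right = center - 1, center
        # Grow outward while the flanking bases pair (A-T or C-G).
        while left >= 0 and right < len(seq) and COMPLEMENT.get(seq[left]) == seq[right]:
            left -= 1
            right += 1
        start, end = left + 1, right
        if end - start >= min_len:
            hits.append((start, end))
    return hits
```

For example, `find_palindromes("TTGAATTCTT")` returns `[(2, 8)]`, the embedded GAATTC site.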
Note that the specific types of artifacts vary with the exact sonication or enzymatic protocol used.
These artifacts can produce false-positive variant calls that complicate downstream analysis and lead to misinterpretation.
What mechanism generates these errors?
The PDSM model (partial single strands derived from a similar molecule) proposes that sonication randomly shears double-stranded DNA, leaving molecules with partially single-stranded regions. These regions can anneal to complementary sequences in other single-stranded fragments derived from the same molecule, producing chimeric reads and introducing mutations during end repair and PCR amplification.
The PDSM model also explains errors in enzymatic fragmentation. Endonucleases introduce nicks that expose single-stranded regions. If these regions contain palindromic sequences, they can anneal to complementary sequences within the same molecule, and subsequent end repair and A-tailing can then introduce mutations and chimeric reads. Unlike with sonication, most of these mutations arise before PCR amplification.
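A toy version of the PDSM idea fits in a few lines. The sketch below assumes an exact inverted repeat: the downstream arm at position `b` is the reverse complement of the upstream arm at position `a`, and the fragment's 3' end stops right after the downstream arm. That arm anneals to the upstream arm on another single-stranded fragment of the same molecule, and the end-repair polymerase extends it by copying the template back toward its 5' end, which appends the reverse complement of the template sequence upstream of the arm. The function name `pdsm_chimera` and its parameters are hypothetical, and the model ignores mismatches, partial annealing, and the PCR stage. A palindrome is the special case where the two arms are its adjacent halves (`b == a + k`), which mirrors the enzymatic scenario.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string (other characters pass through)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pdsm_chimera(strand: str, a: int, b: int, k: int, template_start: int = 0) -> str:
    """Toy model of a chimeric strand under the PDSM idea (hypothetical helper).

    strand: 5'->3' sequence of the single-stranded fragment.
    a: start of the upstream arm; b: start of the downstream arm, which must
       be the exact reverse complement of strand[a:a+k].
    template_start: where the annealed template fragment begins.
    The 3' end is assumed to stop right after the downstream arm; that arm
    pairs with the upstream arm on the template, and polymerase extension
    copies the template upstream of the arm as its reverse complement.
    """
    s = strand.upper()
    if b < a + k or s[b:b + k] != reverse_complement(s[a:a + k]):
        raise ValueError("strand[b:b+k] must be the reverse complement of strand[a:a+k]")
    return s[:b + k] + reverse_complement(s[template_start:a])
```

For example, with the palindrome GAATTC split into the arms GAA and TTC, `pdsm_chimera("TTTGAATTC", a=3, b=6, k=3)` returns `"TTTGAATTCAAA"`: the appended AAA is the reverse complement of the TTT upstream of the palindrome. A read like this no longer matches the reference contiguously, which is how these structures surface as chimeric or mismatched reads.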
How do DNA fragmentation methods compare?
We'll focus on fragmentation for library preparation in hybrid-capture sequencing, a technique widely used for exome sequencing, genotyping, novel gene discovery, indel detection, and rare SNP identification.
Comparative studies have found significantly more artifactual SNVs, insertions, and deletions in enzymatically fragmented libraries than in sonicated ones. This suggests that while enzymatic fragmentation offers convenience and scalability, it introduces more sequencing errors, which directly reduce the precision of variant identification.
As noted above, the main artifacts in sonicated libraries are chimeric reads containing inverted repeat sequences (IVS), in both cis and trans configurations, whereas enzymatically fragmented libraries mainly show chimeric reads containing palindromic sequences and mismatched bases.
Another difference is when mutations are introduced. Mutations from sonication tend to arise during PCR, whereas those from enzymatic fragmentation appear earlier, during end repair and A-tailing, so each method influences the stage at which errors occur.
In conclusion, what should I consider when choosing one method over another?
Bioinformatic tools can mitigate these errors with algorithms that identify and filter likely artifact variants based on the presence of IVS and palindromic sequences (PS) in the reference genome around each variant. Such filtering significantly reduces false positives in both sonicated and enzymatically fragmented datasets, improving the accuracy of variant identification in NGS workflows.
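The filtering idea can be sketched as a local check on the reference: for each candidate variant, take a window of reference sequence around it and flag the call if that window contains an inverted repeat or a palindrome. This is a hypothetical heuristic written for illustration, not the algorithm of any specific published tool. The window half-width (`flank`), arm length (`k`), spacer limit (`max_gap`), and minimum palindrome length (`min_pal`) are assumed defaults, and positions are 0-based offsets into a single reference string.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    # Non-ACGT bases become N so they can never take part in a match.
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def has_inverted_repeat(window: str, k: int, max_gap: int) -> bool:
    """True if some k-mer is followed, after a spacer of at most max_gap
    bases, by its exact reverse complement."""
    for i in range(len(window) - k + 1):
        arm = reverse_complement(window[i:i + k])
        if "N" in arm:
            continue
        # The downstream arm must start in [i + k, i + k + max_gap].
        if arm in window[i + k:i + 2 * k + max_gap]:
            return True
    return False


def has_palindrome(window: str, min_len: int) -> bool:
    """True if the window contains a stretch of at least min_len bases that
    equals its own reverse complement."""
    for center in range(1, len(window)):
        left, right = center - 1, center
        while left >= 0 and right < len(window) and COMPLEMENT.get(window[left]) == window[right]:
            left -= 1
            right += 1
        if right - left - 1 >= min_len:
            return True
    return False


def flag_candidate_artifacts(ref: str, positions: list[int], flank: int = 25,
                             k: int = 6, max_gap: int = 20,
                             min_pal: int = 8) -> dict[int, list[str]]:
    """Map each 0-based variant position to the structures ("IVS", "PS")
    found in the reference window [pos - flank, pos + flank]."""
    ref = ref.upper()
    flags = {}
    for pos in positions:
        window = ref[max(0, pos - flank):pos + flank + 1]
        reasons = []
        if has_inverted_repeat(window, k, max_gap):
            reasons.append("IVS")
        if has_palindrome(window, min_pal):
            reasons.append("PS")
        flags[pos] = reasons
    return flags
```

A long palindrome (at least `2 * k` bases) also passes the IVS test with a zero-length spacer, so the two flags can overlap. In practice, flagged calls would be reviewed or filtered rather than treated as proven artifacts, and the thresholds would need tuning against known true variants.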
In addition to bioinformatic filtering, optimizing fragmentation parameters can reduce errors. For example, fine-tuning sonication settings may minimize oxidative damage to DNA. Similarly, using enzymes with minimal sequence bias can reduce error rates in enzymatic fragmentation.
Thus, choosing the right fragmentation method requires careful consideration of the application and potential error rates in a sequencing project. Understanding the characteristics and mechanisms of these errors is critical for accurate data interpretation and informed decision-making in sequencing analysis.