Why This Matters
DNA sequencing is the foundation of modern genetics. Every major advance from disease diagnosis to evolutionary analysis depends on our ability to read genetic code. You're being tested not just on knowing these techniques exist, but on understanding why each method was developed, what trade-offs it involves, and when you'd choose one over another. The core concepts here include chain termination chemistry, sequencing by synthesis, read length versus throughput trade-offs, and computational assembly strategies.
Don't just memorize technique names and dates. Know what problem each method solves: Why do we need long reads for some projects but short reads work fine for others? Why did scientists move from chemical cleavage to enzymatic methods? When you can explain the underlying mechanisms and compare approaches, you'll handle both multiple-choice questions and FRQs that ask you to design an experiment or troubleshoot a sequencing strategy.
Chain Termination and Chemical Methods
These foundational approaches established the principles of DNA sequencing by generating fragments of different lengths and separating them to read the sequence. The key insight: if you can control where synthesis stops, you can determine sequence position.
Sanger Sequencing (Chain Termination Method)
- Dideoxynucleotides (ddNTPs) lack a 3'-OH group. Normal dNTPs carry a hydroxyl on the 3' carbon, and DNA polymerase uses it to form the bond to the next nucleotide. ddNTPs are missing that 3'-OH, so once one is incorporated, the chain can't extend any further. Because ddNTPs are mixed in at low concentration with normal dNTPs, termination happens randomly at every position of a given base, producing a full set of fragments of different lengths.
- Capillary electrophoresis separates fragments by size. Each of the four ddNTPs (ddATP, ddTTP, ddGTP, ddCTP) carries a different fluorescent label. As fragments migrate through the capillary, a laser excites the dyes and a detector reads the color at each position. This is how the sequence gets called automatically.
- Gold standard for accuracy (up to ~1000 bp read length). Still widely used to validate NGS results, confirm cloned inserts, and sequence individual genes or plasmids.
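The chain termination logic above can be sketched as a small simulation. This is a toy model, not a description of any real instrument: the function names, the `ddntp_fraction` parameter, and the example sequence are all assumptions made for illustration. Each simulated molecule extends base by base and stops with a small probability at each position; sorting the resulting fragments by length and reading the terminal base of each recovers the sequence.

```python
import random

def sanger_fragments(new_strand, ddntp_fraction=0.05, n_molecules=20000, seed=0):
    """Simulate chain termination on many copies of the same template.

    new_strand is the sequence being synthesized (hypothetical example input).
    Returns a set of (fragment_length, terminal_base) pairs.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            # With low probability a dye-labeled ddNTP is incorporated
            # instead of a dNTP, and extension stops at this position.
            if rng.random() < ddntp_fraction:
                fragments.add((position, base))
                break
        # Molecules that never pick up a ddNTP carry no dye and are not read.
    return fragments

def read_ladder(fragments):
    """Mimic electrophoresis: order fragments by size, read each terminal dye."""
    return "".join(base for _, base in sorted(fragments))

if __name__ == "__main__":
    fragments = sanger_fragments("ATGCGTACCTGA")
    print(read_ladder(fragments))
```

The low `ddntp_fraction` matters: if termination were too likely, almost every molecule would stop near the primer and the long fragments needed to read distant positions would be missing.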
Maxam-Gilbert Sequencing
- Chemical cleavage method. Instead of synthesizing new strands, this approach starts with intact DNA that's been end-labeled, then uses specific chemical reagents to break the backbone preferentially at G, A, T, or C residues. Each reaction produces a ladder of fragments ending at positions where that base occurred.
- Requires radioactive labeling. Fragments are visualized by autoradiography after gel electrophoresis, making the method labor-intensive and hazardous compared to automated Sanger sequencing with fluorescent dyes.
- Historically significant but largely obsolete. Replaced by Sanger sequencing due to complexity, safety concerns, and difficulty automating the chemical reactions.
Compare: Sanger vs. Maxam-Gilbert: both produce fragment ladders read by electrophoresis, but Sanger uses enzymatic termination (ddNTPs halt DNA polymerase) while Maxam-Gilbert uses chemical cleavage (reagents break existing DNA at specific bases). Sanger's enzymatic approach won out because it's safer, more scalable, and far easier to automate.
Short-Read Next-Generation Sequencing
NGS platforms achieve massively parallel sequencing, with millions of fragments sequenced simultaneously. The trade-off: shorter individual reads but enormous throughput and dramatically lower cost per base.
Illumina Sequencing
- Sequencing by synthesis with reversible dye terminators. Template fragments are attached to a flow cell and amplified into clusters. During each cycle, a single fluorescently labeled nucleotide is added to every growing strand. A camera captures which base was incorporated at each cluster, and then the fluorescent tag and the 3' blocking group are chemically removed so the next cycle can proceed. This one-base-at-a-time approach is what makes it so accurate.
- Short reads (50–300 bp) but extremely high throughput. A single run can produce billions of reads, making Illumina ideal for whole-genome sequencing, RNA-seq, ChIP-seq, and large population studies.
- Dominant platform in genomics research. Cost-effective and highly accurate (error rates below 1%), though short reads complicate assembly of repetitive regions longer than the read length.
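The cycle structure of sequencing by synthesis can be sketched as follows. This is a simplified illustration, assuming perfect chemistry and ignoring strand orientation (each template string is written in the order the polymerase reads it); `sequence_by_synthesis` and the example templates are hypothetical names and data.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sequence_by_synthesis(templates, n_cycles):
    """Return one read per cluster, built one base per cycle.

    templates: list of template strings, one per cluster on the flow cell.
    n_cycles: number of incorporate/image/cleave cycles (sets the read length).
    """
    reads = ["" for _ in templates]
    for cycle in range(n_cycles):
        for i, template in enumerate(templates):
            if cycle >= len(template):
                continue  # this cluster's template is already fully read
            # The 3' block allows exactly one complementary base per cycle.
            incorporated = COMPLEMENT[template[cycle]]
            # Imaging: the dye color identifies the base added at this cluster.
            reads[i] += incorporated
        # Cleavage step: dye and 3' block are removed from every cluster,
        # so all clusters advance together into the next cycle.
    return reads

if __name__ == "__main__":
    for read in sequence_by_synthesis(["TACG", "GGCA"], n_cycles=3):
        print(read)
```

Note that the read length equals the number of cycles, and every cluster is read in parallel within each cycle, which is where the throughput comes from.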
Ion Torrent Sequencing
- Semiconductor detection of pH changes. When DNA polymerase incorporates a nucleotide, it releases a hydrogen ion (H+). The resulting tiny pH drop is detected by a semiconductor chip sitting beneath each sequencing well. No cameras, lasers, or fluorescent dyes are needed.
- Faster run times and lower instrument cost. Produces reads up to ~400 bp, making it suitable for targeted gene panels, amplicon sequencing, and smaller-scale projects.
- Struggles with homopolymer regions. Only one nucleotide type is flowed over the chip at a time, so when multiple identical bases occur in a row (e.g., AAAA), all of them get incorporated in a single flow. The chip measures a larger pH signal, but distinguishing whether 4 or 5 A's were added is unreliable. This limits accuracy in homopolymer-rich sequences.
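A toy model of flow-based sequencing helps show where the homopolymer problem comes from. The sketch below assumes one nucleotide type is flowed per step in a fixed repeating order and that the signal is exactly proportional to the number of bases incorporated; the flow order, function names, and signal values are illustrative assumptions, not instrument specifications.

```python
def flow_signals(new_strand, flow_order="TACG", n_flows=4):
    """Return (nucleotide, bases_incorporated) for each flow.

    new_strand is the sequence being synthesized. In each flow, every
    consecutive base matching the flowed nucleotide is added at once.
    """
    signals = []
    position = 0
    for flow in range(n_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        while position < len(new_strand) and new_strand[position] == nucleotide:
            count += 1
            position += 1
        signals.append((nucleotide, count))
    return signals

def call_homopolymer_length(measured_signal, unit_signal=1.0):
    """Convert a noisy signal into a base count by rounding."""
    return round(measured_signal / unit_signal)

if __name__ == "__main__":
    print(flow_signals("TAAAAG"))
    # Two noisy measurements of the same AAAA run give different calls.
    print(call_homopolymer_length(4.4), call_homopolymer_length(4.6))
```

A single base versus no base is an easy call (signal near 1 versus near 0), but the relative difference between 4 and 5 bases is much smaller, so modest noise is enough to produce an insertion or deletion error.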
Pyrosequencing
- Detects pyrophosphate (PPi) release during synthesis. Each time a nucleotide is incorporated, PPi is released. An enzyme cascade converts PPi into ATP, which drives a luciferase reaction that produces a flash of light. The light intensity is proportional to the number of nucleotides incorporated.
- Real-time detection enables quantitative applications. Because the light signal scales with incorporation events, pyrosequencing is useful for SNP genotyping and methylation analysis where you need to measure allele frequencies at a specific position.
- Limited read length (~300 bp). Largely superseded by Illumina for most large-scale applications, but still valuable for targeted quantitative assays. Like Ion Torrent, it also has difficulty accurately counting bases in homopolymer stretches.
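Because light output scales with incorporation, the quantitative use case reduces to comparing signal heights. A minimal sketch, assuming two alleles at a position give separately measurable light signals in arbitrary units (the function name and the numbers are hypothetical):

```python
def allele_fraction(signal_allele, signal_other):
    """Estimate the fraction of templates carrying one allele.

    Both arguments are light intensities (arbitrary units) measured for the
    two alternative bases at the same position.
    """
    total = signal_allele + signal_other
    if total == 0:
        raise ValueError("no signal detected at this position")
    return signal_allele / total

if __name__ == "__main__":
    # Example: one allele gives 30 units of light, the other gives 10.
    print(allele_fraction(30.0, 10.0))
```

The same ratio logic is what makes the method suitable for pooled samples, where the question is what proportion of molecules carry a variant rather than simply which base is present.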
Compare: Illumina vs. Ion Torrent: both are short-read NGS platforms, but Illumina uses optical detection (fluorescent reversible terminators) while Ion Torrent uses electronic pH sensing. Ion Torrent is faster and cheaper to set up, but Illumina dominates large-scale projects due to higher accuracy and throughput.
Long-Read Sequencing Technologies
Long reads solve the assembly problem: repetitive regions, structural variants, and complex genomes require reads that span entire repeat units. The mechanism shift: observe single molecules directly rather than amplified clusters.
Single-Molecule Real-Time (SMRT) Sequencing
- Observes a single DNA polymerase incorporating nucleotides in real time. Each of the four nucleotide types carries a distinct fluorescent label on its phosphate group. The polymerase sits at the bottom of a tiny well called a zero-mode waveguide (ZMW), which is so small that only the fluorescence from the nucleotide actively being incorporated is detected. No PCR amplification is needed, so there's no amplification bias.
- Ultra-long reads (10,000–30,000+ bp). This length enables phasing of haplotypes (determining which alleles sit on the same chromosome) and resolution of complex structural variants like large insertions, deletions, and inversions.
- Higher per-read error rate, but correctable. The polymerase reads a circular template multiple times, and these passes are computationally combined into a circular consensus sequence (CCS) with accuracy rivaling Illumina. This makes SMRT sequencing ideal for de novo genome assembly.
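The idea behind circular consensus can be illustrated with a per-position majority vote. This is a deliberate simplification: it assumes the passes are already aligned, have equal length, and contain only substitution errors, whereas real consensus calling also has to handle insertions and deletions. The function name and example passes are hypothetical.

```python
from collections import Counter

def circular_consensus(passes):
    """Majority-vote consensus across repeated passes over one molecule."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this toy assumes pre-aligned passes of equal length")
    # zip(*passes) yields one column of base calls per template position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

if __name__ == "__main__":
    passes = [
        "ATGCGTAC",
        "ATGAGTAC",  # error at position 4
        "TTGCGTAC",  # error at position 1
        "ATGCGTCC",  # error at position 7
        "ATGCGTAC",
    ]
    print(circular_consensus(passes))
```

The vote works because the errors in each pass fall at mostly random positions, so they rarely agree with each other, while the correct base is seen repeatedly.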
Nanopore Sequencing
- Measures ionic current changes as DNA passes through a protein pore. A voltage drives ions through a nanoscale pore embedded in a membrane. As a single strand of DNA translocates through the pore, each base (or short sequence of bases) partially blocks the current in a characteristic way. A computational algorithm translates these current disruptions into base calls.
- Reads can exceed 1 million bp. There's no theoretical upper limit on read length since you're threading an intact molecule through the pore; in practice, the limit is how intact the DNA stays during extraction and library prep. This makes nanopore uniquely suited for spanning entire repetitive regions and very large structural variants in a single read.
- Portable and requires minimal sample prep. The MinION device is roughly the size of a USB drive and enables field-based sequencing for outbreak surveillance, environmental metagenomics, and point-of-care diagnostics.
Compare: SMRT vs. Nanopore: both produce long reads without amplification, but SMRT uses fluorescence detection of polymerase activity while Nanopore uses electrical detection of DNA translocation. Nanopore offers longer reads and portability; SMRT typically achieves higher accuracy with circular consensus. Choose based on whether you need field deployment or maximum accuracy.
Sample Preparation and Assembly Strategies
These approaches aren't sequencing chemistries themselves, but they're essential for generating and interpreting sequence data. Understanding when to use each is critical for experimental design questions.
Polymerase Chain Reaction (PCR)
PCR amplifies a specific DNA target exponentially through repeated thermal cycling. Each cycle has three steps:
- Denaturation (~95°C): Heat separates the double-stranded template into single strands.
- Primer annealing (~55°C): Short oligonucleotide primers bind to complementary sequences flanking the target region.
- Extension (~72°C): DNA polymerase (typically Taq polymerase) synthesizes new strands starting from each primer.
After 25–35 cycles, you go from a tiny amount of template to millions or even billions of copies, because each cycle can at most double the number of target molecules. PCR is an essential upstream step for most sequencing workflows because it generates enough template from minute samples. It's also used for targeted enrichment of specific genomic regions before sequencing.
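The exponential growth is easy to check with arithmetic. The sketch below uses the standard idealized model, copies = starting copies × (1 + efficiency)^cycles, where an efficiency of 1.0 means perfect doubling every cycle; the function name and the 0.9 efficiency value are illustrative assumptions, and real reactions plateau as reagents run out.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies copies by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1 + efficiency) ** cycles

if __name__ == "__main__":
    for cycles in (25, 30, 35):
        ideal = pcr_copies(1, cycles)
        realistic = pcr_copies(1, cycles, efficiency=0.9)
        print(f"{cycles} cycles: ideal {ideal:.3g}, at 90% efficiency {realistic:.3g}")
```

With perfect doubling, a single molecule becomes 2^25 (about 34 million) copies after 25 cycles and 2^35 (about 34 billion) after 35, which is why small per-cycle differences in efficiency between templates compound into noticeable amplification bias.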
The downside: amplification bias. GC-rich or repetitive regions may amplify unevenly, and PCR erases base modifications like methylation. This is exactly why amplification-free methods (SMRT, Nanopore) are valuable when you need unbiased representation or want to detect epigenetic marks.
Shotgun Sequencing
- Random fragmentation followed by sequencing and computational assembly. The genome is broken into many overlapping pieces (mechanically or enzymatically), each piece is sequenced, and then software aligns the overlapping ends to reconstruct the original sequence.
- Enabled the Human Genome Project. Originally paired with Sanger sequencing, shotgun approaches are now combined with NGS for efficient whole-genome assembly.
- Assembly quality depends on coverage and read length. Coverage refers to how many times, on average, each base in the genome is sequenced. Short reads require higher coverage (30x or more) to ensure overlaps; long reads simplify assembly of repetitive regions because a single read can span an entire repeat.
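Both ideas in this list, coverage and overlap-based assembly, can be sketched in a few lines. Average coverage is simply (number of reads × read length) / genome size. The assembler below is a naive greedy merge that assumes error-free reads from one strand and no repeats longer than the overlaps; real assemblers use far more robust graph-based methods. All function names and example values are hypothetical.

```python
import math

def mean_coverage(n_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return n_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the assembly stays in pieces
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

if __name__ == "__main__":
    # A 5 Mb genome sequenced with 150 bp reads at 30x coverage.
    print(reads_needed(30, 150, 5_000_000))
    print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"]))
```

The greedy merge also shows why repeats are a problem: if the same sequence occurs in two places and is longer than a read, reads from both copies overlap equally well and the assembler cannot tell which join is correct.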
Compare: PCR-based library prep vs. amplification-free methods: PCR increases template quantity but introduces bias and destroys base modifications. SMRT and Nanopore can sequence native DNA directly, preserving modifications like methylation. If an FRQ asks about detecting epigenetic marks without bisulfite conversion, amplification-free long-read sequencing is your answer.
Quick Reference Table
| Concept | Techniques |
| --- | --- |
| Chain termination chemistry | Sanger sequencing |
| Chemical cleavage method | Maxam-Gilbert sequencing |
| Short-read NGS (optical detection) | Illumina sequencing, Pyrosequencing |
| Short-read NGS (electronic detection) | Ion Torrent sequencing |
| Long-read single-molecule sequencing | SMRT sequencing, Nanopore sequencing |
| Sample amplification | PCR |
| Genome assembly strategy | Shotgun sequencing |
| Real-time/quantitative sequencing | Pyrosequencing, SMRT, Nanopore |
Self-Check Questions
- Which two sequencing methods produce long reads without requiring PCR amplification, and what detection mechanism does each use?
- Compare Sanger sequencing and Illumina sequencing: What do they share in terms of basic chemistry, and how do they differ in throughput and read length?
- A researcher needs to sequence a bacterial genome with many repetitive transposon insertions. Which sequencing platform would you recommend and why?
- What is the key limitation shared by Ion Torrent and Pyrosequencing when sequencing homopolymer regions, and what causes this problem?
- If you needed to detect DNA methylation patterns without bisulfite conversion, which sequencing approach would preserve this epigenetic information, and what feature of the technology makes this possible?