Study smarter with Fiveable
Get study guides, practice questions, and cheatsheets for all your subjects. Join 500,000+ students with a 96% pass rate.
Next-generation sequencing technologies form the backbone of modern computational genomics—and you'll be tested on more than just platform names. Understanding how each technology generates data determines everything downstream: what computational pipelines you'll use, what biases you'll need to correct for, and which biological questions each platform can actually answer. The differences between short-read and long-read technologies, optical versus non-optical detection, and sequencing-by-synthesis versus ligation-based approaches aren't just technical trivia—they're the foundation for choosing appropriate analysis methods.
When you encounter questions about read alignment, assembly algorithms, or error correction strategies, you're really being asked: do you understand why different sequencing chemistries produce different data characteristics? A platform's read length, error profile, and throughput directly shape the computational challenges you'll face. Don't just memorize that Illumina produces short reads and PacBio produces long reads—know why that matters for genome assembly, structural variant detection, and transcriptome analysis.
Short-read technologies revolutionized genomics by enabling massively parallel sequencing at low cost. The trade-off is read length for throughput—these platforms sacrifice the ability to span repetitive regions in exchange for generating billions of reads per run, making them ideal for applications where coverage depth matters more than contiguity.
Compare: Illumina vs. Ion Torrent—both produce short reads suitable for variant calling, but Illumina's optical detection provides more uniform quality scores while Ion Torrent's semiconductor approach offers faster, cheaper runs for targeted applications. If asked about platform selection for a clinical panel, consider throughput needs versus cost constraints.
Long-read technologies solve the fundamental limitation of short reads: the inability to span repetitive elements and resolve structural complexity. These platforms sacrifice per-base accuracy (in raw reads) and throughput for the ability to generate reads that can bridge gaps, phase haplotypes, and capture full-length transcripts.
Compare: PacBio HiFi vs. Oxford Nanopore—both provide long reads for assembly and structural variant detection, but PacBio HiFi achieves higher per-read accuracy through consensus while Nanopore offers longer maximum read lengths and real-time base calling. For phased diploid assemblies, PacBio HiFi is often preferred; for spanning the largest repeats, Nanopore's ultra-long reads may be necessary.
Understanding deprecated platforms helps you interpret legacy datasets and appreciate why current technologies evolved as they did. These methods introduced key innovations that shaped modern sequencing chemistry.
Compare: 454 vs. modern long-read platforms—454 pioneered longer NGS reads for assembly applications, but PacBio and Nanopore now provide 10–1000× longer reads with competitive accuracy. Legacy 454 datasets may still appear in older metagenomic studies.
These aren't sequencing platforms themselves but library construction methods that enhance what any short-read platform can achieve. They transform the information content of sequencing data by providing spatial context beyond individual reads.
Compare: Paired-end vs. mate-pair libraries—paired-end provides local context (hundreds of bp) for alignment and small variant detection, while mate-pair provides long-range linking (kb scale) for scaffolding and large structural variant detection. Modern long-read data increasingly replaces mate-pair for scaffolding applications.
These methods combine NGS with biochemical enrichment or selection to answer specific biological questions. The computational analysis differs fundamentally from whole-genome approaches because you're interpreting enrichment signals, not uniform coverage.
Compare: RNA-Seq vs. ChIP-Seq analysis—both use NGS reads but ask fundamentally different questions. RNA-Seq quantifies transcript abundance (counting reads per gene), while ChIP-Seq identifies genomic locations (calling enrichment peaks). Confusing these analysis paradigms is a common error.
| Concept | Best Examples |
|---|---|
| Short-read, high-throughput | Illumina, Ion Torrent, SOLiD |
| Long-read, single-molecule | PacBio SMRT, Oxford Nanopore |
| Optical detection methods | Illumina (fluorescence), 454 (pyrophosphate luminescence) |
| Non-optical detection | Ion Torrent (pH), Oxford Nanopore (ionic current) |
| Synthesis-based chemistry | Illumina, Ion Torrent, PacBio |
| Ligation-based chemistry | SOLiD |
| Library strategies for structural context | Paired-end, mate-pair |
| Enrichment-based applications | ChIP-Seq, RNA-Seq, targeted panels |
Which two platforms use non-optical detection methods, and how do their error profiles differ as a result?
You need to assemble a plant genome with large repetitive regions and characterize structural variants. Compare the advantages of PacBio HiFi versus Oxford Nanopore for this application—which would you choose and why?
Explain why paired-end sequencing improves alignment specificity compared to single-end reads, and describe a scenario where mate-pair libraries would be necessary instead.
A collaborator hands you a dataset and says "analyze this for differential expression." What library preparation details do you need to know before choosing a computational pipeline, and why do these choices matter?
If an FRQ asks you to design a study detecting transcription factor binding sites genome-wide, which sequencing application would you use, and what computational steps distinguish its analysis from standard WGS variant calling?