DNA sequencing determines the exact order of nucleotide bases in a strand of DNA. This capability is foundational to modern genetics: it's how researchers identify mutations linked to disease, trace evolutionary relationships between organisms, and develop targeted therapies. The techniques covered here range from the classical Sanger method to modern automated approaches.

DNA Sequencing with Restriction Enzymes

The first step in classical sequencing is breaking the genome into workable pieces. Restriction enzymes (such as EcoRI and BamHI) recognize specific short nucleotide sequences and cleave the DNA at those sites, producing smaller fragments that can each be sequenced individually.
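As a rough illustration of how recognition sites turn one long molecule into smaller fragments, the sketch below digests a short linear sequence in Python. The demo sequence is made up, only the top strand is modeled, and the staggered (sticky) ends that these enzymes leave on double-stranded DNA are ignored. The cut positions assume the usual G^AATTC (EcoRI) and G^GATCC (BamHI) patterns.

```python
# Recognition site and cut offset (number of site bases left of the cut
# on the top strand). EcoRI: G^AATTC, BamHI: G^GATCC.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
}


def find_cut_positions(seq, enzymes=ENZYMES):
    """Return the sorted top-strand cut positions in a linear sequence."""
    cuts = set()
    for site, offset in enzymes.values():
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    return sorted(cuts)


def digest(seq, enzymes=ENZYMES):
    """Split a linear sequence into fragments at every cut position."""
    bounds = [0] + find_cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


demo = "ATGCGAATTCTTAGGATCCAAGT"  # hypothetical sequence with one site per enzyme
fragments = digest(demo)
print(fragments)  # ['ATGCG', 'AATTCTTAG', 'GATCCAAGT']
```

Joining the fragments back together in order reproduces the input, which is a quick sanity check that no bases were lost at the cut sites.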

Once you have manageable fragments, the actual sequence is read using the Sanger dideoxy method (also called the chain termination method). Here's how it works:

Denature the double-stranded DNA fragment into single strands.
Mix each single-stranded template with a short primer, DNA polymerase, and normal deoxynucleotides (dNTPs: dATP, dTTP, dCTP, dGTP).
Add a small amount of dideoxynucleotides (ddNTPs) to the reaction. ddNTPs lack the 3'-hydroxyl group that's needed to form the next phosphodiester bond, so whenever one gets incorporated, the growing chain terminates at that position.
Run this as four separate reactions, each containing just one type of ddNTP (ddATP, ddTTP, ddCTP, or ddGTP). In each tube, chains terminate at random wherever that particular base occurs, so the tube ends up holding fragments that stop at every position of that base.
Separate the fragments by size using gel electrophoresis. The shortest fragments migrate fastest and appear at the bottom of the gel; the longest stay near the top.
Read the sequence from bottom to top across all four lanes. The lane in which each band appears tells you which base occupies that position in the newly synthesized strand, read 5' to 3'.
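The steps above can be sketched as a small, idealized Python simulation. It assumes every occurrence of a base yields exactly one terminated fragment, counts fragment length from the first base added after the primer, and uses a made-up template. None of the function names come from a sequencing library.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template_3to5):
    """Strand the polymerase builds (5'->3') from a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def sanger_lanes(new_strand):
    """For each ddNTP reaction, list the lengths of the terminated fragments.

    Idealized: each occurrence of a base gives one fragment ending there.
    """
    lanes = {"A": [], "T": [], "C": [], "G": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest (bottom of gel) to longest (top of gel)."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGTCA"  # hypothetical template, written 3'->5'
lanes = sanger_lanes(synthesized_strand(template))
print(lanes)            # {'A': [1, 6], 'T': [2, 8], 'C': [4, 5], 'G': [3, 7]}
print(read_gel(lanes))  # ATGCCAGT
```

Sorting the bands by length stands in for electrophoresis: the shortest fragment is read first, exactly as the bottom band on the gel would be.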

The entire method depends on complementary base pairing: the primer binds the template at a known location, and DNA polymerase adds each base according to the template strand, so the sequence read from the gel is the exact complement of the template.
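Because the polymerase copies the template by base pairing, the strand that is read is the complement of the template rather than the template itself. A minimal Python sketch of converting one into the other, assuming both strands are written 5' to 3' (the example read is made up):

```python
# Translation table that swaps each base for its pairing partner.
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, also written 5'->3'."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


read = "ATGCCAGT"                         # hypothetical read, 5'->3'
template_strand = reverse_complement(read)
print(template_strand)                    # ACTGGCAT
```

Applying the function twice returns the original read, which reflects the symmetry of base pairing.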


Automated DNA Sequencing Process

Automated Sanger sequencing keeps the chain-termination chemistry but replaces manual gel reading with fluorescence detection and capillary electrophoresis.

Key differences from the manual method:

Each of the four ddNTPs is tagged with a different fluorescent dye, each emitting at a distinct wavelength, so all four termination reactions can run in a single tube instead of four separate ones.
Fragments are separated by capillary electrophoresis rather than slab gel electrophoresis. The mixture is injected into a thin capillary filled with a polymer matrix, and an electric field drives the fragments through it. Shorter fragments migrate faster, just as on a gel.

How the sequence is read:

As fragments exit the capillary, a laser excites the fluorescent dye on each fragment.
A detector records the wavelength of emitted light, identifying which ddNTP (and therefore which base) terminated that fragment.
The instrument plots the results as a series of colored peaks (a chromatogram), and software converts the peak order directly into the DNA sequence.
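The read-out logic can be sketched in Python as a toy base caller. The peak times and the color-to-base assignment below are invented for illustration, since real instruments and dye chemistries define their own mapping. Real software also has to deal with overlapping peaks and noise, which this sketch ignores.

```python
# Hypothetical dye-channel-to-base assignment (an assumption, not a standard).
CHANNEL_TO_BASE = {"blue": "C", "green": "A", "yellow": "G", "red": "T"}


def call_bases(peaks, channel_to_base=CHANNEL_TO_BASE):
    """Turn (migration_time, channel) peaks into a base sequence.

    Earlier peaks come from shorter fragments, so sorting by time
    gives the bases in 5'->3' order.
    """
    ordered = sorted(peaks, key=lambda peak: peak[0])
    return "".join(channel_to_base[channel] for _, channel in ordered)


# Made-up detector output, deliberately listed out of order.
peaks = [
    (12.4, "yellow"),
    (10.1, "green"),
    (11.3, "red"),
    (13.6, "blue"),
    (14.8, "blue"),
]
print(call_bases(peaks))  # ATGCC
```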

Automated sequencing is both faster and more accurate than manual gel reading. A single run can read up to about 1,000 bases with approximately 99.9% accuracy, and instruments that load samples from 96-well plates can process many samples in parallel, which greatly increases throughput.
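To put those figures in perspective, the short Python calculation below estimates how many miscalled bases to expect in one read. It assumes errors are independent and equally likely at every position, which is a simplification. It also expresses the accuracy on the Phred quality scale (Q = -10 log10 of the error probability), a convention widely used for base calls that is brought in here only for illustration.

```python
import math


def expected_errors(read_length, accuracy):
    """Expected number of miscalled bases, assuming uniform independent errors."""
    return read_length * (1.0 - accuracy)


def phred_quality(accuracy):
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(1.0 - accuracy)


print(round(expected_errors(1000, 0.999), 3))  # 1.0
print(round(phred_quality(0.999), 1))          # 30.0
```

In other words, a 1,000-base read at 99.9% accuracy is expected to contain about one error, which corresponds to roughly Q30.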


Advanced DNA Sequencing Techniques

Polymerase chain reaction (PCR) amplifies a specific DNA region before sequencing, ensuring there's enough template material to work with.
Next-generation sequencing (NGS) takes parallelism much further, sequencing millions of short DNA fragments simultaneously. This makes it possible to sequence an entire human genome in days rather than years.
Bioinformatics tools are essential for handling the massive datasets NGS produces. Typical tasks include aligning reads, identifying variants, and annotating genes.
Genome assembly algorithms stitch overlapping short reads back together to reconstruct complete genomic sequences.
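The idea of stitching overlapping reads together can be sketched with a toy greedy assembler in Python. This is not how production assemblers work: they typically use overlap graphs or de Bruijn graphs and must cope with sequencing errors, repeats, and reads from both strands. The reads, the `min_len` threshold, and the function names are all illustrative assumptions.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge; contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]  # made-up error-free reads
print(greedy_assemble(reads))  # ['ATGCGTACCTGATTAG']
```

If no pair of sequences overlaps by at least `min_len` bases, the function returns several contigs instead of forcing a merge, which mirrors how gaps in coverage leave an assembly fragmented.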

Human Genes vs. Human Proteins

The human genome contains roughly 20,000–25,000 protein-coding genes, yet the human proteome (the full set of proteins) is estimated to include well over 1 million distinct protein forms. Several mechanisms explain how a relatively small number of genes generates such enormous protein diversity:

Alternative splicing: A single gene's pre-mRNA can be spliced in different ways, including or excluding particular exons to produce multiple mRNA variants. Each variant encodes a different protein isoform. The CD44 gene, for example, produces dozens of isoforms involved in different cell-signaling roles.
Post-translational modifications (PTMs): After a protein is translated, chemical groups can be added or removed. Phosphorylation, glycosylation, and acetylation each change a protein's function, stability, or cellular location. Histones, for instance, are heavily modified by acetylation and methylation to regulate gene expression.
Proteolytic cleavage: Some proteins are synthesized as larger, inactive precursors (zymogens or proproteins) that must be cleaved to become active. Insulin is a classic example: proinsulin is cleaved to yield the active two-chain hormone plus a C-peptide fragment.
Protein complex formation: Individual protein subunits can assemble into multi-subunit complexes with properties distinct from any single component. Hemoglobin, for instance, functions as a tetramer of two α and two β subunits, and its cooperative oxygen-binding behavior only emerges in the assembled complex.
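To see how quickly alternative splicing alone multiplies the number of products, the Python sketch below enumerates every exon combination for a hypothetical gene in which some exons are always kept and others can be skipped. Treating each optional exon as an independent choice is an assumption that gives an upper bound (2 to the power of the number of optional exons). Real genes usually produce only a subset of these combinations.

```python
from itertools import product


def splice_isoforms(exons):
    """Enumerate exon combinations.

    exons: ordered list of (name, is_optional) pairs.
    Returns a list of tuples, each naming the exons kept in one mRNA variant.
    """
    options = []
    for name, optional in exons:
        # An optional exon can be included or skipped; others are always kept.
        options.append([(name,), ()] if optional else [(name,)])
    isoforms = []
    for combo in product(*options):
        isoforms.append(tuple(name for part in combo for name in part))
    return isoforms


# Hypothetical gene: E1 and E4 are always kept, E2 and E3 can be skipped.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
for isoform in splice_isoforms(gene):
    print("-".join(isoform))
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

With ten optional exons the same calculation gives up to 1,024 variants from a single gene, before post-translational modification or cleavage adds further diversity.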