Central Dogma of Molecular Biology
Overview of the Central Dogma
The central dogma, first articulated by Francis Crick in 1958, states that genetic information flows from DNA → RNA → protein. Three processes carry out this flow:
- Replication: DNA copies itself so cells can divide with a full genome.
- Transcription: A segment of DNA is copied into a complementary RNA molecule.
- Translation: The RNA sequence is read by ribosomes to build a protein.
The flow is largely unidirectional: information moves from nucleic acids to protein, not the other way around. Important exceptions exist, however. Reverse transcriptase in retroviruses like HIV copies RNA back into DNA, a discovery that surprised researchers in 1970 and earned Howard Temin and David Baltimore a Nobel Prize in 1975. Hepatitis B virus, although it has a DNA genome, also replicates through an RNA intermediate that is reverse-transcribed into DNA. These exceptions didn't overthrow the central dogma so much as refine it.
It's worth distinguishing what Crick actually claimed. He was not saying that DNA always goes to RNA and then to protein in a rigid pipeline. His central dogma was a more specific negative claim: once information has passed into protein, it cannot flow back out to nucleic acid. RNA-to-DNA transfer (reverse transcription) and RNA-to-RNA replication (in some RNA viruses) don't violate this principle, because information is still moving between nucleic acids.
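To make the distinction concrete, here is a minimal sketch that encodes the transfers described above as a lookup table. The names `DESCRIBED_TRANSFERS`, `transfer_name`, and `violates_central_dogma` are illustrative assumptions, not standard terminology, and the check covers only the negative claim as stated in this section.

```python
# Information transfers described in the text, keyed by (source, target).
# The dictionary and function names are hypothetical, chosen for illustration.
NUCLEIC_ACIDS = {"DNA", "RNA"}

DESCRIBED_TRANSFERS = {
    ("DNA", "DNA"): "replication",
    ("DNA", "RNA"): "transcription",
    ("RNA", "protein"): "translation",
    ("RNA", "DNA"): "reverse transcription",
    ("RNA", "RNA"): "RNA replication",
}


def transfer_name(source: str, target: str) -> str:
    """Return the process name for a transfer, or 'not described' if absent."""
    return DESCRIBED_TRANSFERS.get((source, target), "not described")


def violates_central_dogma(source: str, target: str) -> bool:
    """Crick's negative claim: information does not flow from protein
    back out to nucleic acid."""
    return source == "protein" and target in NUCLEIC_ACIDS


for pair in [("RNA", "DNA"), ("RNA", "RNA"), ("protein", "RNA")]:
    print(pair, transfer_name(*pair), violates_central_dogma(*pair))
```

Under this reading, reverse transcription and RNA replication pass the check because both endpoints are nucleic acids, while any protein-to-nucleic-acid transfer fails it.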
Importance of the Central Dogma
The central dogma provides the framework for understanding how genetic information is stored, copied, and expressed. It explains the relationship between genes, RNA, and proteins, which together determine cellular function and the development of complex traits.
This framework has been essential for advances in molecular biology, genetics, and medicine. Personalized medicine and gene therapy both depend on understanding exactly where in the DNA → RNA → protein pipeline something goes wrong. Disruptions at any step can cause disease: a single nucleotide substitution in the beta-globin gene causes sickle cell anemia, while errors in gene regulation can contribute to cancer.
DNA Replication and Genetic Integrity

Mechanism of DNA Replication
DNA replication is how a cell duplicates its entire genome before dividing, ensuring each daughter cell gets an identical copy. The process relies on a set of enzymes acting in concert at structures called replication forks.
Here's how it works, step by step:
- Replication begins at specific sites called origins of replication. Eukaryotic chromosomes have many origins so the large genome can be copied efficiently.
- DNA helicase unwinds the double helix, separating the two strands and creating a replication bubble that expands bidirectionally.
- Single-stranded DNA binding proteins (SSBs) stabilize the separated strands, preventing them from reannealing.
- Topoisomerase relieves the torsional strain that builds up ahead of the replication fork as helicase unwinds the helix. Without it, the DNA ahead of the fork would become overwound and stall replication.
- DNA primase synthesizes short RNA primers on each strand, giving DNA polymerase a starting point (DNA polymerase can't initiate a new strand on its own).
- DNA polymerase III (the main replicative polymerase in bacteria such as E. coli) extends the primers, adding nucleotides in the 5' to 3' direction only. This creates an asymmetry:
  - The leading strand is synthesized continuously toward the replication fork.
  - The lagging strand is synthesized in short pieces called Okazaki fragments, moving away from the fork.
- DNA polymerase I removes the RNA primers and replaces them with DNA nucleotides.
- DNA ligase seals the remaining nicks between Okazaki fragments, producing a continuous new strand.
The result is two identical double-stranded DNA molecules, each containing one original strand and one newly synthesized strand. This pattern is called semiconservative replication, confirmed experimentally by Matthew Meselson and Franklin Stahl in 1958.
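The semiconservative pattern can be illustrated with a short simulation. This is a minimal sketch, not a model of the enzymology: the example sequence, the "old"/"new" labels, and the function names are assumptions made for illustration, and each strand is written 5' to 3'.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def replicate(duplex):
    """Copy one duplex semiconservatively.

    A duplex is a pair of (sequence, label) strands. Each daughter keeps
    one template strand with its original label and gains one strand
    labeled "new".
    """
    (top, top_label), (bottom, bottom_label) = duplex
    daughter_1 = ((top, top_label), (reverse_complement(top), "new"))
    daughter_2 = ((reverse_complement(bottom), "new"), (bottom, bottom_label))
    return [daughter_1, daughter_2]


def labels(duplex):
    """Return the sorted strand labels of a duplex, e.g. ['new', 'old']."""
    return sorted(label for _, label in duplex)


sequence = "ATGGCCTAA"  # hypothetical example sequence
parent = ((sequence, "old"), (reverse_complement(sequence), "old"))

generation_1 = replicate(parent)
generation_2 = [d for duplex in generation_1 for d in replicate(duplex)]

print([labels(d) for d in generation_1])
print([labels(d) for d in generation_2])
```

After one round every duplex is a hybrid of one old and one new strand. After two rounds half the duplexes are hybrids and half are entirely new, which matches the banding pattern Meselson and Stahl observed with density-labeled DNA.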
Maintaining Genetic Integrity
Accurate replication is critical. Even small errors can cause mutations leading to genetic disorders or cancer. Cells have multiple layers of defense:
- Proofreading by DNA polymerase: As it adds nucleotides, DNA polymerase checks each one. If a mismatch is detected, the enzyme's 3' to 5' exonuclease activity removes the incorrect nucleotide before continuing. This reduces the error rate to roughly 1 in 10^7 base pairs.
- Mismatch repair: After replication, dedicated enzymes scan the new DNA for errors that proofreading missed and correct them. Combined with proofreading, this brings the final error rate down to about 1 in 10^9 to 10^10 base pairs per replication.
- Base excision repair and nucleotide excision repair: These pathways fix DNA damage caused by chemicals, radiation, or spontaneous chemical changes to bases.
- Telomerase: Linear chromosomes lose a small amount of DNA from their ends (telomeres) with each replication cycle because DNA polymerase cannot fully replicate the very end of a linear molecule. Telomerase, an RNA-dependent DNA polymerase, rebuilds telomeres in certain cell types (like stem cells and germ cells) to prevent the gradual loss of genetic information.
- Cell cycle checkpoints: Before a cell commits to division, checkpoint mechanisms verify that replication is complete and accurate. If problems are detected, the cell cycle pauses for repair or, in severe cases, the cell is directed to self-destruct (apoptosis).
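A quick calculation shows why the layered defenses matter. This sketch assumes a haploid human genome of roughly 3.2 billion base pairs and commonly cited textbook error rates (about 1 in 10^7 after proofreading, and 1 in 10^9 to 10^10 after mismatch repair); these figures and the names below are assumptions for illustration, not measurements.

```python
GENOME_SIZE_BP = 3.2e9  # assumed approximate haploid human genome size

# Assumed per-base-pair error rates at each layer of defense.
ERROR_RATES = {
    "proofreading only": 1e-7,
    "with mismatch repair (higher estimate)": 1e-9,
    "with mismatch repair (lower estimate)": 1e-10,
}


def expected_errors(genome_size_bp: float, error_rate_per_bp: float) -> float:
    """Expected number of uncorrected errors in one full genome copy."""
    return genome_size_bp * error_rate_per_bp


for label, rate in ERROR_RATES.items():
    errors = expected_errors(GENOME_SIZE_BP, rate)
    print(f"{label}: about {errors:.2f} errors per genome copy")
```

Under these assumptions, proofreading alone would leave on the order of a few hundred errors per genome copy, while mismatch repair brings that down to a handful or fewer.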
Transcription and RNA's Role in Gene Expression

Mechanism of Transcription
Transcription is the process of copying a gene's DNA sequence into a complementary RNA molecule. The enzyme RNA polymerase carries out this work, reading the DNA template and assembling an RNA strand.
Transcription proceeds in three stages:
- Initiation: RNA polymerase binds to a promoter sequence upstream of the gene. In prokaryotes, a sigma factor helps the polymerase recognize the promoter. In eukaryotes, a set of transcription factors and a mediator complex perform this role, assembling at the promoter before RNA polymerase II can bind. The enzyme then separates the DNA strands locally, forming a transcription bubble.
- Elongation: RNA polymerase moves along the template strand in the 3' to 5' direction, synthesizing the RNA strand in the 5' to 3' direction. It adds ribonucleotides that are complementary to the DNA template, with uracil (U) pairing where thymine (T) would appear in DNA.
- Termination: RNA polymerase reaches a termination signal in the DNA. The newly made RNA is released, and the polymerase detaches from the template.
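The base-pairing rule used during elongation can be sketched in a few lines. This is a simplified illustration that ignores promoters and termination signals; the template sequence and the function name `transcribe` are assumptions, and the template is written 3' to 5' so that the RNA comes out 5' to 3'.

```python
# Pairing rule from DNA template base to RNA base (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') complementary to a DNA template (3'->5')."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
print(mrna)  # AUGCCGUAA
```

Because the polymerase reads the template 3' to 5', reading the string left to right mirrors the direction of synthesis, and each adenine in the template is paired with uracil rather than thymine.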
In eukaryotes, the primary RNA transcript (pre-mRNA) undergoes post-transcriptional modifications before it can function as mature mRNA:
- A 5' cap (a modified guanine nucleotide, specifically 7-methylguanosine) is added, which protects the mRNA from degradation and helps ribosomes recognize it.
- A poly-A tail (a string of ~200 adenine nucleotides) is added to the 3' end, stabilizing the mRNA and aiding its export from the nucleus.
- Splicing removes non-coding sequences called introns, leaving only the coding sequences (exons) joined together. This is carried out by the spliceosome, a large complex of small nuclear RNAs and proteins. Alternative splicing can produce different proteins from the same gene by including or excluding particular exons. This discovery helped explain how roughly 20,000 human genes can produce well over 100,000 distinct proteins.
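Splicing and alternative splicing can be illustrated as string operations on a toy transcript. This sketch assumes exon positions are already known as half-open index ranges; real splice-site recognition by the spliceosome is far more involved. The sequence, coordinates, and the function name `splice` are hypothetical.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join the given exon ranges ([start, end) indices) in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy pre-mRNA: exon 1, intron 1, exon 2, intron 2, exon 3.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "GAAUUC" + "GUAUGUCCCUAG" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]

# Constitutive splicing keeps every exon.
full_length = splice(pre_mrna, exons)

# Alternative splicing: skip exon 2 to produce a different mature mRNA.
exon_2_skipped = splice(pre_mrna, [exons[0], exons[2]])

print(full_length)     # AUGGCCGAAUUCUGGUAA
print(exon_2_skipped)  # AUGGCCUGGUAA
```

The same pre-mRNA yields two distinct mature transcripts depending on which exons are retained, which is the mechanism behind one gene producing several proteins.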
RNA's Roles in Gene Expression
RNA does far more than just carry messages from DNA to ribosomes. Several types of RNA play distinct roles:
- mRNA (messenger RNA) carries the coding sequence from the nucleus to the ribosome for translation.
- rRNA (ribosomal RNA) forms the structural and catalytic core of the ribosome itself. The peptidyl transferase activity that forms peptide bonds during translation is carried out by rRNA, making the ribosome fundamentally a ribozyme.
- tRNA (transfer RNA) acts as an adaptor, matching amino acids to their corresponding codons during translation.
- miRNA and siRNA are small regulatory RNAs that can silence gene expression by targeting specific mRNAs for degradation or blocking their translation. Their discovery opened up the field of RNA interference (RNAi), recognized with a Nobel Prize to Andrew Fire and Craig Mello in 2006.
- Ribozymes are RNA molecules with catalytic activity. The discovery that RNA can act as an enzyme (not just an information carrier) earned Sidney Altman and Thomas Cech the Nobel Prize in 1989 and supported the "RNA World" hypothesis, which proposes that early life relied on RNA for both information storage and catalysis before DNA and proteins took over those roles.
- Long non-coding RNAs (lncRNAs) participate in transcriptional regulation, chromatin remodeling, and nuclear organization, though many of their functions are still being worked out.
Translation and Protein Synthesis from mRNA
Mechanism of Translation
Translation is where the information in mRNA finally becomes a protein. This process takes place on ribosomes in the cytoplasm and requires mRNA, tRNAs, and various protein factors working in concert.
The mRNA sequence is read in groups of three nucleotides called codons. Each codon specifies a particular amino acid (or signals the ribosome to stop). There are 64 possible codons total: 61 code for amino acids and 3 are stop codons. tRNAs serve as adaptors: one end carries a specific amino acid, and the other end has an anticodon that base-pairs with the complementary codon on the mRNA.
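The codon counts above follow directly from enumerating every three-letter combination of the four RNA bases. The sketch below does that, and also shows codon-anticodon pairing under strict Watson-Crick rules; it deliberately ignores wobble pairing and modified tRNA bases, and the function name `anticodon` is an illustrative assumption.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Every ordered triplet of the four bases: 4 * 4 * 4 combinations.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]


def anticodon(codon: str) -> str:
    """Return the strictly complementary anticodon, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))


print(len(all_codons), len(sense_codons), len(STOP_CODONS))  # 64 61 3
print(anticodon("AUG"))  # CAU
```

The anticodon is reversed because the two RNA strands pair in antiparallel orientation, so the first base of the codon pairs with the last base of the anticodon.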
Translation has three stages:
- Initiation: The small ribosomal subunit binds to the mRNA and locates the start codon (AUG). The initiator tRNA, carrying the amino acid methionine, pairs with this codon. The large ribosomal subunit then joins to form the complete ribosome, with the initiator tRNA sitting in the P site.
- Elongation: The ribosome moves along the mRNA one codon at a time. At each step, the appropriate tRNA enters the A site, a peptide bond forms between the growing chain and the new amino acid (catalyzed by rRNA in the large subunit), and the spent tRNA exits through the E site. This cycle repeats, building the polypeptide chain from its amino (N) terminus to its carboxyl (C) terminus.
- Termination: When the ribosome reaches a stop codon (UAA, UAG, or UGA), no tRNA binds. Instead, release factors enter the A site, triggering hydrolysis of the bond between the polypeptide and the final tRNA. The completed polypeptide is released and the ribosomal subunits dissociate.
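The three stages map onto a simple loop: find the start codon, read one codon at a time, and stop at a stop codon. This is a minimal sketch using the standard genetic code with one-letter amino acid symbols. It assumes translation begins at the first AUG in the sequence, which glosses over how real ribosomes select a start site; the example mRNA and the function name `translate` are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")  # initiation: locate the start codon
    if start == -1:
        return ""
    peptide = []
    # Elongation: advance one codon (three nucleotides) per step.
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # termination: stop codon reached
            break
        peptide.append(residue)
    return "".join(peptide)


mrna = "GGAUGGCCGAAUUCUGGUAAGC"  # hypothetical example transcript
print(translate(mrna))  # MAEFW
```

The returned string is ordered from the amino terminus to the carboxyl terminus, matching the direction in which the ribosome builds the chain.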
Post-translational Modifications and the Genetic Code
A freshly made polypeptide isn't necessarily a finished protein. Post-translational modifications shape it into its functional form:
- Folding: The polypeptide folds into a specific three-dimensional structure, often assisted by chaperone proteins. Misfolded proteins can be tagged for destruction by the proteasome or, in some cases, cause disease (prion diseases like mad cow disease involve misfolded proteins).
- Cleavage: Portions of the chain may be cut away to activate the protein. Insulin, for example, is produced as a longer precursor called proinsulin that must be cleaved to yield the active hormone.
- Chemical modifications: Functional groups can be added. Common examples include phosphorylation (adding a phosphate group, often to regulate protein activity on/off), glycosylation (adding sugar chains, important for cell surface proteins), and disulfide bond formation (which stabilizes protein structure in extracellular environments).
The genetic code itself has several notable properties:
- It is nearly universal across all life. From bacteria to humans, the same codons specify the same amino acids, with only minor variations in some mitochondria and a few microorganisms. This universality is strong evidence that all life shares a common ancestor.
- It is degenerate (or redundant), meaning multiple codons can code for the same amino acid. For instance, leucine is specified by six different codons. This redundancy provides some buffer against mutations: a change in the third position of a codon often still produces the same amino acid. The third position is also where wobble base pairing lets a single tRNA anticodon recognize more than one codon.
- The universality of the code is what makes recombinant DNA technology possible. A human gene inserted into a bacterium will be read using the same code, allowing the bacterium to produce a human protein like insulin. This principle underlies the entire biotechnology industry.
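The degeneracy described above can be checked directly against the standard codon table. This sketch counts codons per amino acid and estimates how often a single-base change at each codon position leaves the amino acid unchanged. The function name `synonymous_fraction` is an illustrative assumption, and the calculation treats all substitutions as equally likely, which real mutation processes do not.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Number of codons that specify each amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def synonymous_fraction(position: int) -> float:
    """Fraction of single-base changes at a codon position (0, 1, or 2)
    that leave the encoded amino acid unchanged, over all sense codons."""
    same = total = 0
    for codon, residue in CODON_TABLE.items():
        if residue == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == residue
    return same / total


print(degeneracy["L"], degeneracy["M"], degeneracy["W"])  # 6 1 1
for position in range(3):
    print(position + 1, round(synonymous_fraction(position), 3))
```

Leucine comes out with six codons while methionine and tryptophan have one each, and the third codon position tolerates far more substitutions than the first or second.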