Plant genomes span an enormous range of sizes. The carnivorous corkscrew plant Genlisea tuberosa has one of the smallest known plant genomes at about 61 Mb, while Paris japonica holds the record at roughly 150 Gb. That's a roughly 2,500-fold difference.
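
The fold difference follows directly from the two sizes quoted above. A minimal sketch of the arithmetic, using only those rounded figures (the variable names are illustrative):

```python
# Genome sizes quoted in the text, converted to base pairs.
MB = 1_000_000
GB = 1_000_000_000

smallest_bp = 61 * MB   # Genlisea tuberosa, about 61 Mb
largest_bp = 150 * GB   # Paris japonica, roughly 150 Gb

fold_difference = largest_bp / smallest_bp
print(f"Fold difference: about {fold_difference:,.0f}x")
```

With these rounded inputs the ratio comes out just under 2,500.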

Here's the surprising part: genome size doesn't scale with how complex the organism is, and it doesn't predict how many genes the organism has. This mismatch is called the C-value paradox. A simple fern can have a genome many times larger than a flowering tree's.

Variation across species

Even closely related species can differ dramatically in genome size. Arabidopsis thaliana, the go-to model organism in plant biology, has a compact 135 Mb genome. But Brassica rapa, in the same family (Brassicaceae), clocks in at 529 Mb. Gymnosperms tend to run especially large: the loblolly pine (Pinus taeda) has a genome around 22 Gb.

Factors influencing size

Three main forces drive genome size changes:

Polyploidy duplicates the entire genome at once, causing a sudden jump in size. Many crop species (wheat, cotton) are polyploid.
Transposable element accumulation gradually inflates the genome as mobile DNA sequences copy themselves and reinsert. This is the biggest contributor to size differences in most plant lineages.
DNA deletion and genome downsizing work in the opposite direction. Some lineages lose DNA faster than they gain it, keeping their genomes compact over evolutionary time.

Nuclear genome organization

The nuclear genome is packaged into chromosomes, which are tightly condensed structures made of DNA wound around proteins. Chromosome number varies widely: Haplopappus gracilis has just 2n = 4, while counts as high as 2n = 1,440 have been reported for the adder's-tongue fern Ophioglossum reticulatum (2n = 1,260 is the figure most often cited).

Chromosomal structure

Each chromosome contains a single linear DNA molecule. That molecule wraps around clusters of histone proteins to form nucleosomes, the basic unit of chromatin. Nucleosomes then coil and fold into higher-order structures, compacting meters of DNA into a nucleus just micrometers across. Chromosomes become visible as distinct, condensed bodies during cell division (mitosis and meiosis).
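
To get a feel for the compaction problem, the stretched-out length of a genome can be estimated from its size. The sketch below assumes a rise of 0.34 nm per base pair (the standard value for B-form DNA, not stated in the text) and uses genome sizes quoted earlier in these notes; the function name is illustrative.

```python
RISE_PER_BP_NM = 0.34  # assumption: B-form DNA rise per base pair, in nanometers

def dna_length_m(genome_bp: float) -> float:
    """Contour length in meters of one copy of a genome with genome_bp base pairs."""
    return genome_bp * RISE_PER_BP_NM * 1e-9

# Genome sizes taken from the figures quoted earlier in the text.
genomes_bp = {
    "Arabidopsis thaliana": 135e6,
    "Pinus taeda": 22e9,
    "Paris japonica": 150e9,
}

for name, bp in genomes_bp.items():
    print(f"{name}: {dna_length_m(bp):.3f} m per genome copy")
```

Under that assumption, one copy of the Arabidopsis genome is only a few centimeters long, while one copy of the Paris japonica genome is about 51 m, all of which has to fit inside a nucleus micrometers across.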

Centromeres and telomeres

Centromeres are constricted regions where spindle fibers attach during cell division, pulling chromosomes to opposite poles. They're made up of repetitive DNA sequences.
Telomeres cap the ends of chromosomes. They protect the chromosome from degradation and prevent it from fusing with neighboring chromosomes. Like centromeres, telomeres consist of short repetitive sequences.

Both structures are essential for chromosome stability and accurate cell division.

Euchromatin vs. heterochromatin

Not all chromatin is packaged the same way:

Euchromatin is loosely packed and contains most of the actively transcribed genes. Think of it as the "open for business" portion of the genome.
Heterochromatin is tightly condensed, gene-poor, and rich in repetitive sequences.

Heterochromatin comes in two flavors. Constitutive heterochromatin stays condensed all the time (centromeres and telomeres fall into this category). Facultative heterochromatin can switch between condensed and open states depending on developmental stage or environmental signals, giving the cell a way to turn gene regions on or off.

Organellar genomes

Plant cells carry DNA in three compartments: the nucleus, mitochondria, and chloroplasts. The organellar genomes are much smaller than the nuclear genome and trace back to ancient endosymbiotic events, where free-living bacteria were engulfed by ancestral eukaryotic cells.

Mitochondrial DNA

Plant mitochondrial genomes range from about 200 to 2,000 kb, which is much larger and more variable than animal mitochondrial genomes (typically ~16 kb). They encode genes critical for cellular respiration, including components of the electron transport chain like cytochrome oxidase and NADH dehydrogenase. One notable feature: plant mitochondrial DNA mutates at a much lower rate than its animal counterpart.

Chloroplast DNA

Chloroplast genomes are more compact (120–160 kb) and more consistent in size and structure across species. They carry genes for photosynthesis (photosystem I, photosystem II, RuBisCO large subunit) and other chloroplast functions. Because chloroplast DNA is highly conserved, it's a popular tool for plant phylogenetic studies.

Unique features vs. nuclear DNA

Organellar genomes differ from the nuclear genome in several important ways:

Shape: organellar genomes are circular; nuclear chromosomes are linear.
Copy number: each cell contains 1,000–10,000 copies of organellar DNA, but only 1–2 copies of the nuclear genome.
Inheritance: organellar genomes are maternally inherited in most angiosperms, while the nuclear genome comes from both parents.
Gene structure: most organellar genes lack introns (a minority contain group I or group II introns), whereas many nuclear genes contain them.

Gene structure and arrangement

Plant genes aren't just continuous stretches of coding DNA. They're built from coding regions, non-coding regions, and regulatory sequences that together control what protein is made, when, and how much.

Exons and introns

Exons are the coding portions of a gene. After transcription, exon sequences are retained in the mature mRNA and translated into protein.
Introns are non-coding sequences that sit between exons. They're transcribed into pre-mRNA but then spliced out during mRNA processing.

Introns aren't just junk, though. Their presence allows alternative splicing, where different combinations of exons are joined together. This means a single gene can produce multiple protein variants (isoforms), expanding the functional output of the genome.
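
How one gene yields several isoforms can be sketched with a toy model of exon skipping. This is a simplification under stated assumptions: the first and last exons are always kept, any internal exon may be skipped, and exon order never changes. Real splicing is regulated, so only some of these combinations occur in a cell. The function and exon names are hypothetical.

```python
from itertools import combinations

def exon_skipping_isoforms(exons):
    """List isoforms that keep the first and last exon plus any subset of internal exons, in order."""
    if len(exons) < 2:
        return [tuple(exons)]
    first, *internal, last = exons
    isoforms = []
    # Choose 0, 1, 2, ... internal exons to retain; combinations() preserves their order.
    for kept_count in range(len(internal) + 1):
        for kept in combinations(internal, kept_count):
            isoforms.append((first, *kept, last))
    return isoforms

isoforms = exon_skipping_isoforms(["E1", "E2", "E3", "E4"])
for isoform in isoforms:
    print("-".join(isoform))
```

In this toy model a four-exon gene with two skippable internal exons gives four possible mRNAs, and each additional skippable exon doubles that count.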

Promoter regions

Promoters are regulatory DNA sequences located upstream of (before) the transcription start site. They serve as the landing pad for transcription factors and RNA polymerase, which together initiate transcription. Two widely conserved promoter elements are the TATA box and the CAAT box, found in many eukaryotic genes.
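
Conserved elements like the TATA box can be located by simple pattern matching. The sketch below assumes a simplified consensus, TATA[A/T]A[A/T], and a made-up upstream sequence; real promoter prediction uses position weight matrices and tolerates mismatches, so treat this only as an illustration. Positions are reported relative to the transcription start site, with the last base of the upstream sequence at -1.

```python
import re

# Assumption: a simplified TATA-box consensus (TATA, then A/T, A, A/T).
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")

def find_tata_boxes(upstream_seq):
    """Return start positions of TATA-like motifs relative to the transcription start site."""
    seq = upstream_seq.upper()
    # The last base of the upstream sequence sits at position -1.
    return [match.start() - len(seq) for match in TATA_PATTERN.finditer(seq)]

# Hypothetical 42-bp upstream region: a CAAT-containing stretch, a TATA box, then spacer.
promoter = "GGCCAATCT" + "TATAAATA" + "GCGTC" * 5
print(find_tata_boxes(promoter))
```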

Regulatory elements

Beyond promoters, several other elements fine-tune when and where a gene is expressed:

Enhancers boost transcription and can be located thousands of base pairs away from the gene they regulate. They work through DNA looping, physically contacting the promoter region.
Silencers repress transcription using a similar long-distance mechanism.
Insulators act as boundaries, preventing regulatory signals from one gene from spilling over and affecting a neighboring gene.

Repetitive DNA sequences

A large fraction of most plant genomes consists of sequences repeated many times over. These repetitive elements are a major reason plant genomes vary so much in size, and they play active roles in genome structure and evolution.

Tandem repeats

Tandem repeats are short sequences arranged in head-to-tail fashion, one copy right after the next. They come in two main size classes:

Microsatellites (also called SSRs): 1–6 bp repeat units (e.g., ATATAT...)
Minisatellites: 10–100 bp repeat units

Because the number of repeats at a given locus varies between individuals, tandem repeats are widely used as molecular markers in genetic mapping and population genetics.
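
Scoring a microsatellite marker comes down to finding the repeat run and counting its units. A minimal sketch using a regular expression with a backreference; the minimum of four repeats and the example sequence are assumptions chosen for illustration, not a standard threshold.

```python
import re

def find_microsatellites(seq, min_repeats=4):
    """Find tandem runs of a 1-6 bp unit repeated at least min_repeats times.

    Returns a list of (start_index, repeat_unit, repeat_count) tuples.
    """
    # The lazy {1,6}? prefers the shortest unit, so "AAAA" is read as A x4, not AA x2.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

# Hypothetical sequence containing an (AT)6 run and a (GAA)5 run.
sequence = "GGC" + "AT" * 6 + "CCG" + "GAA" * 5 + "T"
for start, unit, count in find_microsatellites(sequence):
    print(f"{unit} x {count} at index {start}")
```

Two individuals carrying, say, six and nine copies of the AT unit at the same locus would give different counts here, which is the variation that makes these loci useful as markers.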

Transposable elements

Transposable elements (TEs) are mobile DNA sequences that can move around and replicate within the genome. They fall into two classes based on how they move:

DNA transposons use a "cut-and-paste" mechanism: the element is excised from one location and inserted elsewhere.
Retrotransposons use a "copy-and-paste" mechanism: the element is transcribed into RNA, reverse-transcribed back into DNA, and then inserted at a new site. This means each transposition event adds a new copy.
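
The difference between the two mechanisms shows up in copy number. In the toy model below, a genome is just an ordered list of labels, and the RNA intermediate of retrotransposition is abstracted away. The function names and the example genome are hypothetical.

```python
def cut_and_paste(genome, source, target):
    """DNA transposon: excise the element at index source and reinsert it at index target.

    The target index refers to the genome after excision.
    """
    updated = list(genome)
    element = updated.pop(source)
    updated.insert(target, element)
    return updated

def copy_and_paste(genome, source, target):
    """Retrotransposon: leave the original in place and insert a new copy at index target."""
    updated = list(genome)
    updated.insert(target, updated[source])
    return updated

genome = ["geneA", "TE", "geneB", "geneC"]
moved = cut_and_paste(genome, source=1, target=3)
copied = copy_and_paste(genome, source=1, target=3)

print(moved, "TE copies:", moved.count("TE"))
print(copied, "TE copies:", copied.count("TE"))
```

Cut-and-paste leaves the list the same length, while every copy-and-paste event lengthens it by one element, which is why retrotransposons can inflate a genome.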

TEs can alter gene expression by inserting near or within genes, and they can drive chromosomal rearrangements. They're a major force in genome evolution.

Proportion in plant genomes

The share of repetitive DNA differs enormously across species:

Arabidopsis thaliana: ~10% repetitive
Maize and wheat: >80% repetitive

In maize specifically, TEs make up over 75% of the entire genome. This massive accumulation of transposable elements is the primary reason some plant genomes are so large.

Polyploidy in plants

Polyploidy means having more than two complete sets of chromosomes. While rare in animals, it's remarkably common in plants. Estimates of the share of living plant species that are polyploid range from 30% to 80%, and polyploidy has been a powerful engine of plant evolution.

Mechanisms of formation

Polyploidy can arise through several routes:

Unreduced gametes: if meiosis in a diploid plant fails to halve the chromosome number, a diploid (unreduced) gamete results. When it fuses with a normal haploid gamete, the offspring is triploid; two unreduced gametes produce a tetraploid.
Somatic doubling: chromosome duplication in meristematic (actively dividing) cells can produce polyploid shoots or tissue sectors within an otherwise diploid plant.

These mechanisms feed into two distinct types of polyploidy (below).
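
The gamete arithmetic behind the unreduced-gamete route can be sketched in a few lines. It assumes the parent has an even number of chromosome sets and that a normal meiosis exactly halves it; the function names are illustrative.

```python
def gamete_sets(parent_sets, reduced=True):
    """Chromosome sets in a gamete: half the parent's sets after normal meiosis, all of them if unreduced."""
    return parent_sets // 2 if reduced else parent_sets

def offspring_sets(gamete_a, gamete_b):
    """Chromosome sets in the offspring: the sum of the two fusing gametes."""
    return gamete_a + gamete_b

DIPLOID = 2
normal = gamete_sets(DIPLOID)                      # haploid gamete, 1 set
unreduced = gamete_sets(DIPLOID, reduced=False)    # unreduced gamete, 2 sets

print("normal x normal:", offspring_sets(normal, normal))           # diploid
print("unreduced x normal:", offspring_sets(unreduced, normal))     # triploid
print("unreduced x unreduced:", offspring_sets(unreduced, unreduced))  # tetraploid
```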

Autopolyploidy vs. allopolyploidy

Autopolyploidy: all chromosome sets come from the same species. The genome simply duplicates. Example: tetraploid potato (Solanum tuberosum).
Allopolyploidy: chromosome sets come from two or more different species, combining hybridization with genome duplication. Example: bread wheat (Triticum aestivum) is a hexaploid containing genomes from three ancestral grass species.

Allopolyploids often show heterosis (hybrid vigor), displaying greater size, fertility, or adaptability than either diploid parent.

Evolutionary significance

Polyploidy has shaped plant evolution in several ways:

It creates instant reproductive isolation, which can lead to rapid speciation.
The extra gene copies provide raw material for evolving new functions (one copy maintains the original role while the duplicate is free to diverge).
Polyploids can colonize new ecological niches and tolerate environmental stress better, thanks to their increased genetic diversity and redundancy.

Many of the world's most important crops are polyploid: wheat (6x), cotton (4x), sugarcane (~12x), and coffee (4x). Plant breeders have also deliberately induced polyploidy to improve traits like fruit size and disease resistance.

Comparative genomics of plants

Comparative genomics compares genome sequences across species to uncover patterns of evolution, conserved functions, and structural changes. For plants, this field has revealed how dynamic genomes are and how extensively they have been reshuffled over time.

Synteny and collinearity

Synteny refers to the conservation of gene content on a shared chromosomal segment between species, even if the order has shifted somewhat.
Collinearity is more specific: it means the genes are in the same order and orientation.

Identifying syntenic and collinear blocks through comparative mapping reveals which parts of the genome have been preserved and which have been rearranged since two species diverged.
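
The distinction between the two terms can be made concrete with a toy comparison. Each chromosomal segment is represented as an ordered list of (gene, strand) pairs, where shared gene names stand for orthologs. This is a strict, simplified test under those assumptions; real synteny tools tolerate gaps, missing genes, and inverted blocks. All names are hypothetical.

```python
def is_syntenic(segment_a, segment_b):
    """True if both segments carry the same set of genes, regardless of order or strand."""
    return {gene for gene, _ in segment_a} == {gene for gene, _ in segment_b}

def is_collinear(segment_a, segment_b):
    """True if both segments carry the same genes in the same order and orientation."""
    return list(segment_a) == list(segment_b)

species_a = [("g1", "+"), ("g2", "+"), ("g3", "-")]
species_b = [("g2", "+"), ("g1", "+"), ("g3", "-")]  # g1 and g2 have swapped places

print("syntenic:", is_syntenic(species_a, species_b))
print("collinear:", is_collinear(species_a, species_b))
```

Here the two segments are syntenic, since the gene content matches, but not collinear, since the order differs.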

Genome duplication events

Whole-genome duplication (WGD) events have occurred repeatedly throughout plant evolutionary history. Most angiosperm lineages carry the signature of at least one ancient WGD (called paleopolyploidy). These events expanded gene families and provided the genetic raw material for evolving novel traits, such as the diversification of floral structures.

Insights into plant evolution

Comparative genomics has shown that plant genome evolution is shaped by the interplay of polyploidy, transposable element activity, and gene loss. By comparing genomes across lineages, researchers have:

Identified conserved gene families and regulatory networks shared across flowering plants
Traced the molecular changes underlying crop domestication (e.g., mutations affecting fruit size in tomato or seed shattering in rice)
Reconstructed ancestral genome structures, revealing how modern chromosomes were assembled from ancient building blocks
