๐ŸพGeneral Biology II

Key Concepts in Phylogenetic Tree Construction

Why This Matters

Phylogenetic trees show how life diversified from common ancestors into the millions of species we see today. They're central to evolutionary biology, and in General Biology II you'll need to read, interpret, and construct these trees. That means understanding not just what they show, but how scientists build them and why certain methods produce more reliable results.

These concepts connect directly to core principles you'll encounter throughout the course: common ancestry, natural selection, genetic variation, and molecular evidence for evolution. When you see a phylogenetic tree on an exam, don't just memorize branch patterns. Know what each node represents, why branch lengths matter, and how scientists distinguish true evolutionary relationships from misleading similarities.


Tree Structure and Terminology

Before you can analyze evolutionary relationships, you need to understand the basic architecture of phylogenetic trees. Every element carries specific evolutionary meaning.

Definition of Phylogenetic Trees

A phylogenetic tree is a graphical representation of evolutionary relationships among organisms. Each branching point shows where lineages diverged from a common ancestor.

  • Tips (terminal nodes) represent current species or groups being compared
  • Internal nodes represent hypothetical common ancestors
  • The whole tree visualizes descent with modification, the central idea of evolutionary biology

Rooted vs. Unrooted Trees

  • Rooted trees show directionality. They indicate which ancestor came first and how evolution proceeded forward in time.
  • Unrooted trees display relationships without specifying a common ancestor. They show only how closely related groups are to each other.
  • Rooting a tree usually relies on an outgroup to locate the base, which makes rooted trees more informative but harder to construct accurately.

Cladograms vs. Phylograms

  • Cladograms focus purely on branching order. Branch lengths are meaningless; they only show who is related to whom.
  • Phylograms encode additional data through branch lengths, representing either amount of evolutionary change or time since divergence.
  • Choose based on your question: cladograms for relationships, phylograms for understanding rates of change or divergence timing.

Compare: Rooted trees and cladograms both show branching patterns, but rooted trees specify direction while cladograms may or may not be rooted. If a question asks you to "trace the evolutionary history," you need a rooted tree. If it asks "which species are most closely related," either type works.


Grouping Species Correctly

One of the biggest challenges in phylogenetics is making sure your groups actually reflect evolutionary history. A valid phylogenetic group must be defined by shared ancestry, not superficial similarity.

Monophyletic, Paraphyletic, and Polyphyletic Groups

  • Monophyletic groups (clades) include an ancestor and all of its descendants. These are the only truly valid evolutionary groups.
  • Paraphyletic groups include an ancestor but exclude some descendants. "Reptilia" is the classic example: it excludes birds, even though birds evolved from reptilian ancestors. These groups are taxonomically convenient but evolutionarily incomplete.
  • Polyphyletic groups lump together organisms from different ancestors based on convergent traits. These are errors that misrepresent evolutionary history.
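
One way to check a proposed group is to find the smallest clade that contains all of its members and see whether that clade holds any extra tips. The sketch below assumes a tree written as nested Python tuples. The toy topology, taxon names, and helper functions are illustrative and do not come from any library.

```python
# Toy rooted tree as nested tuples; strings are tips (illustrative topology).
TREE = ("lamprey", ("frog", (("mouse", "human"), ("lizard", ("crocodile", "bird")))))


def tips(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def smallest_clade(node, group):
    """Return the tips of the smallest clade containing every member of group."""
    if not isinstance(node, str):
        for child in node:
            if group <= tips(child):
                return smallest_clade(child, group)
    return tips(node)


def is_monophyletic(tree, group):
    """A group is monophyletic if its smallest enclosing clade has no other tips."""
    group = set(group)
    return smallest_clade(tree, group) == group


print(is_monophyletic(TREE, {"crocodile", "bird"}))            # True
print(is_monophyletic(TREE, {"lizard", "crocodile"}))          # False: leaves out bird
print(is_monophyletic(TREE, {"lizard", "crocodile", "bird"}))  # True
```

The second call mirrors the "Reptilia" problem: lizards and crocodiles share an ancestor, but so do birds, so leaving birds out makes the group paraphyletic.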

Outgroups and Their Importance

An outgroup is a reference species or group that is related to the group you're studying (the ingroup) but branched off before the ingroup's members diverged from one another. It provides a baseline for comparison.

  • Without an outgroup, you can't determine which traits are ancestral versus derived.
  • Choose outgroups carefully: they must be related enough to compare meaningfully but divergent enough to clearly fall outside your study group. For a tree of mammals, a reptile like a lizard could serve as an outgroup.

Character States: Ancestral vs. Derived

  • Ancestral (plesiomorphic) traits were present in the common ancestor. They don't tell you about relationships within the group because everyone shares them.
  • Derived (apomorphic) traits arose within the group after it split from the common ancestor. A derived trait shared by two or more lineages is a shared derived character (synapomorphy).
  • Only synapomorphies define clades. Shared ancestral traits can mislead you into grouping unrelated lineages together.
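
To see how an outgroup separates ancestral from derived states, here is a minimal sketch. It assumes a simplified presence (1) / absence (0) matrix and treats the outgroup's state as ancestral, which is a working assumption rather than a guarantee. The taxa, traits, and function name are hypothetical.

```python
# Simplified, hypothetical character matrix: 1 = present, 0 = absent.
MATRIX = {
    "lizard":   {"vertebrae": 1, "hair": 0, "live_birth": 0},
    "platypus": {"vertebrae": 1, "hair": 1, "live_birth": 0},
    "kangaroo": {"vertebrae": 1, "hair": 1, "live_birth": 1},
    "mouse":    {"vertebrae": 1, "hair": 1, "live_birth": 1},
    "human":    {"vertebrae": 1, "hair": 1, "live_birth": 1},
}


def derived_carriers(matrix, outgroup):
    """Map each character to the ingroup taxa whose state differs from the outgroup's."""
    ancestral = matrix[outgroup]  # assumption: outgroup shows the ancestral state
    ingroup = sorted(t for t in matrix if t != outgroup)
    return {
        character: [t for t in ingroup if matrix[t][character] != state]
        for character, state in ancestral.items()
    }


for character, carriers in derived_carriers(MATRIX, "lizard").items():
    if not carriers:
        label = "ancestral in every ingroup taxon, so uninformative here"
    elif len(carriers) >= 2:
        label = "candidate synapomorphy of " + ", ".join(carriers)
    else:
        label = "derived in " + carriers[0] + " only"
    print(character + ": " + label)
```

In this toy matrix, vertebrae tell you nothing about relationships among the mammals, hair marks the whole ingroup, and live birth marks a smaller clade inside it.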

Compare: Monophyletic and paraphyletic groups both include a common ancestor, but monophyletic groups include all descendants while paraphyletic groups arbitrarily exclude some. Classic example: "Reptilia" is paraphyletic because it excludes birds, which evolved from reptilian ancestors.


Distinguishing True Relationships from False Signals

Not all similarities indicate common ancestry. Convergent evolution can produce strikingly similar traits in unrelated species, creating traps for phylogenetic analysis.

Homology vs. Homoplasy

  • Homologous traits share a common evolutionary origin. The forelimbs of bats, whales, and humans are homologous: they have the same underlying bone structure despite serving very different functions.
  • Homoplastic traits evolved independently through convergent evolution, parallel evolution, or evolutionary reversal. Wings in bats and birds are homoplastic because flight evolved separately in each lineage.
  • Distinguishing these is critical. Homology supports phylogenetic grouping, while homoplasy creates false signals that must be identified and excluded.

Reading and Interpreting Phylogenetic Trees

  • Nodes represent speciation events: the point where one ancestral population split into two distinct lineages.
  • Sister groups share an immediate common ancestor and are each other's closest relatives, regardless of how different they look today.
  • Tree rotation doesn't change relationships. Branches can spin around nodes without altering evolutionary meaning, so focus on connections, not left-right position on the page.
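
You can convince yourself that rotation is harmless by putting two drawings of a tree into a standard order and comparing them. This is a small sketch using nested tuples; the `canonical` helper and the example trees are made up for illustration.

```python
def canonical(node):
    """Sort the branches at every node so drawing order no longer matters."""
    if isinstance(node, str):
        return node
    return tuple(sorted((canonical(child) for child in node), key=repr))


drawing_a = (("human", "chimp"), ("mouse", "rat"))
drawing_b = (("rat", "mouse"), ("chimp", "human"))   # same tree, branches rotated
drawing_c = (("human", "mouse"), ("chimp", "rat"))   # genuinely different grouping

print(canonical(drawing_a) == canonical(drawing_b))  # True
print(canonical(drawing_a) == canonical(drawing_c))  # False
```

Only the third drawing changes who shares a node with whom, so only it describes a different set of relationships.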

Compare: Homology and homoplasy both produce similar-looking structures, but homology reflects shared ancestry while homoplasy reflects similar selective pressures acting on unrelated lineages. For questions about convergent evolution, homoplasy is your go-to example. For questions about evidence for common descent, use homology.


Methods for Building Trees

Scientists use different analytical approaches depending on their data and goals. Each method makes different assumptions and has distinct strengths and limitations.

Parsimony Principle

  • Favors the simplest explanation. The tree requiring the fewest evolutionary changes is considered most likely correct.
  • This is based on Occam's Razor: don't assume more evolutionary events than necessary to explain the data.
  • Limitation: parsimony can be misled when evolution isn't simple. Rapid evolution or widespread convergence can make the simplest tree the wrong tree.
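
Counting changes by hand gets tedious quickly, so here is a sketch of how the count can be automated. It uses Fitch's algorithm, a standard way to find the minimum number of changes for one character on a fixed binary tree. The four-taxon matrix and the names `fitch` and `tree_length` are hypothetical.

```python
def fitch(node, states):
    """Return (possible ancestral states, minimum changes) for a subtree.

    node is a tip name or a (left, right) tuple; states maps tip name -> state.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_cost = fitch(node[0], states)
    right_set, right_cost = fitch(node[1], states)
    shared = left_set & right_set
    if shared:                       # children agree: no extra change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change required


def tree_length(tree, matrix):
    """Sum the Fitch score over every character (column) in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in matrix.items()})[1]
        for i in range(n_chars)
    )


# Hypothetical four-site alignment for four species.
MATRIX = {"sp1": "AAGT", "sp2": "AAGC", "sp3": "GCGC", "sp4": "GCTT"}
CANDIDATES = {
    "((sp1,sp2),(sp3,sp4))": (("sp1", "sp2"), ("sp3", "sp4")),
    "((sp1,sp3),(sp2,sp4))": (("sp1", "sp3"), ("sp2", "sp4")),
    "((sp1,sp4),(sp2,sp3))": (("sp1", "sp4"), ("sp2", "sp3")),
}

scores = {name: tree_length(tree, MATRIX) for name, tree in CANDIDATES.items()}
print(scores)                       # 5, 7, and 6 changes respectively
print(min(scores, key=scores.get))  # ((sp1,sp2),(sp3,sp4))
```

Notice that the fourth site actually fits the third tree best. Parsimony picks the tree with the lowest total, so a few conflicting characters are outvoted by the rest.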

Maximum Likelihood Method

  • A statistical approach that calculates the probability of observing your data given each possible tree topology.
  • It selects the tree with the highest likelihood, meaning the tree under which your observed data are most probable. This is more computationally intensive than parsimony but handles complex evolutionary scenarios better.
  • It requires a model of evolution specifying how characters change over time, so results depend on how accurate that model is.
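
Searching over whole trees is too much for a short example, but the core idea can be shown on a single quantity. The sketch below estimates the evolutionary distance between two sequences by maximum likelihood, assuming the Jukes-Cantor (JC69) model, in which every base changes to every other base at the same rate. The alignment summary (100 sites, 10 differences) is hypothetical.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (expected substitutions per site) under JC69."""
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # chance a site differs
    p_same = 1.0 - p_diff
    # A site starts as one of 4 equally likely bases; a difference can end in
    # any of 3 other bases, so each specific change has probability p_diff / 3.
    return (n_diff * math.log(p_diff / 3.0)
            + (n_sites - n_diff) * math.log(p_same)
            + n_sites * math.log(0.25))


n_sites, n_diff = 100, 10                        # hypothetical alignment summary
candidates = [i / 1000 for i in range(1, 1001)]  # distances 0.001 to 1.000
best_d = max(candidates, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

# Known closed-form JC69 estimate, for comparison with the grid search.
closed_form = -0.75 * math.log(1.0 - (4.0 / 3.0) * n_diff / n_sites)
print(best_d, round(closed_form, 4))             # both close to 0.107
```

The estimate is a little larger than the raw 10% difference because the model accounts for sites that changed more than once. A different model would give a different answer, which is why model choice matters.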

Bayesian Inference

  • Incorporates prior knowledge about evolutionary processes and updates probabilities as data is analyzed.
  • Produces posterior probabilities for each possible tree, giving you a measure of confidence rather than a single "best" tree.
  • Handles uncertainty explicitly, which is particularly valuable when data is limited or conflicting signals exist.
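
The arithmetic behind a posterior probability is Bayes' rule. Here is a minimal sketch for three candidate trees, assuming equal priors and made-up likelihood values. Real programs sample from an enormous space of trees instead of listing them all, so treat this only as an illustration of the calculation.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of trees: posterior is prior times likelihood, normalized."""
    unnormalized = {tree: priors[tree] * likelihoods[tree] for tree in priors}
    total = sum(unnormalized.values())
    return {tree: value / total for tree, value in unnormalized.items()}


# Hypothetical numbers: equal priors, likelihoods P(data | tree) invented for illustration.
priors = {"((A,B),C)": 1 / 3, "((A,C),B)": 1 / 3, "((B,C),A)": 1 / 3}
likelihoods = {"((A,B),C)": 0.008, "((A,C),B)": 0.0015, "((B,C),A)": 0.0005}

for tree, prob in posterior(priors, likelihoods).items():
    print(tree, round(prob, 2))  # 0.8, 0.15, 0.05
```

Instead of a single winner, you get a probability for each tree, which is the explicit measure of uncertainty described above.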

Compare: Parsimony minimizes total changes while maximum likelihood maximizes statistical probability. Parsimony is faster and more intuitive; maximum likelihood is more rigorous for molecular data. Molecular datasets typically favor likelihood approaches because DNA evolution often involves complexities (like varying mutation rates) that parsimony doesn't account for.


Assessing Confidence and Timing

Building a tree is only half the work. You also need to know how much to trust it and what it tells you about when evolutionary events occurred.

Bootstrap Analysis

Bootstrap analysis is a resampling technique that tests tree reliability. It works by randomly sampling the characters in your dataset (for example, the columns of a sequence alignment) with replacement hundreds or thousands of times, rebuilding a tree from each resampled dataset, and then checking how often each branch appears across all those trees.

  • Bootstrap values indicate what percentage of resampled datasets support a given branch. Values above about 70% are commonly treated as reasonably well supported, and values of 95% or higher as strong support.
  • Bootstrap analysis does not prove a tree is correct. It only shows whether your data consistently supports the same relationships.
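
The resampling step can be sketched in a few lines. To keep it short, this example does not build a full tree for each replicate. As a stand-in, it asks only which two species are most similar, and counts how often a chosen pair comes out on top. The alignment, function names, and the stand-in rule are all assumptions for illustration.

```python
import random

ALIGNMENT = {  # hypothetical aligned sequences
    "sp1": "ACGTACGTAC",
    "sp2": "ACGTACGTTC",
    "sp3": "ACGAACCTTG",
    "sp4": "TCGAACCTAG",
}


def closest_pair(alignment):
    """Stand-in for tree building: the two taxa with the fewest differing sites."""
    taxa = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(taxa) for b in taxa[i + 1:]]

    def differences(pair):
        return sum(x != y for x, y in zip(alignment[pair[0]], alignment[pair[1]]))

    return min(pairs, key=differences)  # ties go to the first pair in sorted order


def bootstrap_support(alignment, pair, replicates=1000, seed=0):
    """Percentage of column-resampled datasets in which `pair` is the closest pair."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    hits = 0
    for _ in range(replicates):
        # Draw n_sites column indices with replacement.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = {t: "".join(seq[c] for c in columns) for t, seq in alignment.items()}
        if closest_pair(resampled) == pair:
            hits += 1
    return 100.0 * hits / replicates


print(closest_pair(ALIGNMENT))  # ('sp1', 'sp2')
print(bootstrap_support(ALIGNMENT, ("sp1", "sp2")))
```

Each replicate has the same number of sites as the original, but some columns appear several times and others not at all. A grouping that depends on only one or two columns will drop out of many replicates and earn a low value.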

Molecular Clock Hypothesis

The molecular clock hypothesis assumes mutations accumulate at roughly constant rates over time. If true, the number of genetic differences between two species reflects how long ago they diverged.

  • This allows dating of evolutionary events by calibrating mutation rates against known fossil dates.
  • Major limitation: mutation rates actually vary across lineages and genes. Because of this, relaxed molecular clocks that allow rate variation are now preferred over strict clocks.
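
Under a strict clock, the dating arithmetic is simple. Differences build up along both lineages after a split, so divergence time is the genetic distance divided by twice the rate per lineage. The sketch below assumes a strict clock and uses invented calibration numbers.

```python
def divergence_time(distance, rate_per_lineage):
    """Strict-clock estimate: changes accumulate on both lineages, so t = d / (2 * r)."""
    return distance / (2.0 * rate_per_lineage)


# Hypothetical calibration: fossils date one split to 10 million years ago,
# and those two species differ at 2% of sites.
calib_distance, calib_time = 0.02, 10.0
rate = calib_distance / (2.0 * calib_time)   # substitutions per site per million years

# Apply the calibrated rate to a second pair that differs at 5% of sites.
print(round(divergence_time(0.05, rate), 2))  # 25.0 (million years)
```

If the second pair's lineages actually evolve faster than the calibration pair, this estimate will be too old, which is the problem relaxed clocks are designed to address.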

Interpreting Branch Lengths

  • In phylograms, branch length = evolutionary change. Longer branches indicate more mutations or more time, depending on the analysis.
  • Short branches suggest rapid diversification or recent divergence. Unusually long branches make a tree vulnerable to long-branch attraction, an artifact where distantly related, fast-evolving lineages get incorrectly grouped together.
  • Always check what branch lengths represent in a given tree. Time and genetic change are not the same thing.

Compare: Bootstrap values and Bayesian posterior probabilities both measure confidence, but they come from different procedures. Bootstrap values report how often a branch is recovered from resampled data; posterior probabilities are calculated from the prior and the likelihood of the data. Bayesian posteriors tend to be numerically higher for the same data, so don't directly compare numbers across methods.


Constructing Trees from Data

Understanding how raw data becomes a phylogenetic tree helps you critically evaluate the trees you encounter on exams and in research.

Constructing Phylogenetic Trees from Character Matrices

A character matrix organizes trait data in rows (species) and columns (characters), with each cell containing a character state (e.g., present/absent, or a specific nucleotide).

  • Molecular data (DNA sequences, protein sequences) now dominates phylogenetics because it provides abundant, objective characters that can be compared across very different organisms.
  • Character selection matters enormously. Choosing uninformative or homoplastic characters produces unreliable trees regardless of which method you use.
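
One common screen for useful characters is the parsimony-informative test: a column helps choose between trees only if at least two different states each appear in at least two species. The sketch below applies that rule to a hypothetical matrix stored as a dictionary of sequences; the function name is made up.

```python
from collections import Counter


def informative_columns(matrix):
    """Return indices of columns with at least two states that each occur in 2+ taxa."""
    n_chars = len(next(iter(matrix.values())))
    informative = []
    for i in range(n_chars):
        counts = Counter(row[i] for row in matrix.values())
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(i)
    return informative


# Rows are species, columns are characters (here, nucleotide sites).
MATRIX = {"sp1": "AAGT", "sp2": "AAGC", "sp3": "GCGC", "sp4": "GCTT"}
print(informative_columns(MATRIX))  # [0, 1, 3]
```

Column 2 is dropped because its only variation is a single T in one species. A change unique to one tip costs one step on every possible tree, so it cannot favor one tree over another.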

Quick Reference Table

Concept               | Best Examples
Tree types            | Rooted vs. unrooted, cladograms vs. phylograms
Valid groupings       | Monophyletic groups (clades), synapomorphies
Invalid groupings     | Paraphyletic groups, polyphyletic groups, homoplasy
Tree-building methods | Parsimony, maximum likelihood, Bayesian inference
Confidence assessment | Bootstrap analysis, posterior probabilities
Temporal analysis     | Molecular clock, branch length interpretation
Character analysis    | Ancestral vs. derived states, homology vs. homoplasy
Data organization     | Character matrices, outgroup selection

Self-Check Questions

  1. What is the key difference between a monophyletic group and a paraphyletic group, and why does this distinction matter for accurate classification?

  2. You're given a phylogenetic tree where two distantly related species both have wings. How would you determine whether this similarity represents homology or homoplasy, and what would each conclusion imply?

  3. Compare parsimony and maximum likelihood as tree-building methods. Under what circumstances might they produce different results, and which would you trust more for a large molecular dataset?

  4. A branch on a phylogenetic tree has a bootstrap value of 95%. What does this tell you, and what does it not tell you about the evolutionary relationship shown?

  5. If you were constructing a phylogenetic tree for a group of mammals and needed to select an outgroup, what criteria would guide your choice, and why is this step essential for interpreting the tree correctly?
