The comparative method is a step-by-step procedure for figuring out what an ancestor language (a "proto-language") sounded like, even when no written records of it survive. It works because language change isn't random: sounds tend to shift in regular, predictable ways across all the words in a language.

Three core assumptions make the method possible:

Genetic relationship: the languages being compared descend from a single common ancestor.
Regular sound correspondences: when a sound changed in one language, it changed the same way in (nearly) every word where that sound appeared.
Systematic change over time: language change follows patterns, so those patterns can be traced backward.

The method itself involves four main steps:

Identify cognates across related languages. Cognates are words inherited from the same ancestor word. For example, English water, German Wasser, and Swedish vatten all trace back to a single Proto-Germanic form.
Establish sound correspondences by lining up cognates and noting which sounds match up across languages.
Reconstruct proto-forms, the hypothesized words in the ancestor language, based on those correspondences.
Formulate sound change rules that explain how the proto-forms evolved into the forms we see today.
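
The last step can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the proto-form `pat`, the language names `LangA` and `LangC`, and the helper `apply_changes` are all hypothetical, and each rule is treated as an unconditioned change that applies to every occurrence of a sound.

```python
# Hypothetical ordered sound changes per daughter language (illustrative only).
sound_changes = {
    "LangA": [],            # no changes: the proto-form is retained
    "LangC": [("p", "f")],  # unconditioned shift p > f
}

def apply_changes(proto_form, rules):
    """Apply each (old, new) replacement, in order, to every occurrence."""
    form = proto_form
    for old, new in rules:
        form = form.replace(old, new)
    return form

# Derive the daughter forms from the hypothetical proto-form "pat".
for lang, rules in sound_changes.items():
    print(lang, apply_changes("pat", rules))
```

Because the rules are applied in sequence, their order matters: an earlier change can create or remove the input to a later one.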

A few related concepts are worth keeping straight:

Genetic relationship vs. borrowing: A cognate is inherited from the ancestor language. A loanword is borrowed from another language through contact. The Japanese word pan ("bread") comes from Portuguese, not from a shared ancestor. The comparative method focuses on inherited words and needs to filter out borrowings.
Internal reconstruction: A technique that looks at patterns within a single language (like irregular verb forms) to infer earlier stages, rather than comparing across languages.
Subgrouping: Classifying languages into branches of a family based on shared innovations, changes that a subset of languages went through together after splitting from the larger group.


Application of proto-form reconstruction

Reconstructing a proto-form is where the method gets concrete. Here's how it works in practice:

Compile cognate sets from the related languages you're comparing. You want words with the same or closely related meanings and similar-enough sounds to be plausible cognates.
Analyze sound correspondences across those sets. For each position in the word, note which sound appears in each language. If Language A consistently has p where Language B has f, that's a regular correspondence.
Propose a proto-form that best explains the pattern. Linguists typically choose the sound that requires the fewest or most natural changes to produce the attested forms. For instance, if most daughter languages show $p$ and only one shows $f$, the proto-form likely had $p$, with one language undergoing the shift $p \rightarrow f$.
Refine reconstructions as more cognate sets and data become available. New evidence can confirm or revise earlier proposals.
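
As a rough sketch of the "fewest changes" heuristic, the Python below picks the most common sound at each position of an aligned cognate set. The data and the names `LangA`, `LangB`, `LangC`, and `reconstruct_by_majority` are hypothetical, and majority rule is a simplification: it ignores how natural a change is, and ties are resolved arbitrarily by whichever sound is seen first.

```python
from collections import Counter

# Hypothetical, pre-aligned cognate set: one segment per position per language.
aligned_set = {
    "LangA": ["p", "a", "t"],
    "LangB": ["p", "a", "t"],
    "LangC": ["f", "a", "t"],
}

def reconstruct_by_majority(aligned):
    """Pick the most frequent segment at each aligned position."""
    columns = zip(*aligned.values())
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)

# Reconstructed forms are conventionally marked with an asterisk.
proto = "*" + reconstruct_by_majority(aligned_set)
print(proto)  # *pat
```

Here two of three languages show p in first position, so the sketch proposes p for the proto-form and leaves `LangC` to be explained by a later shift to f.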

Not every cognate set is perfectly regular. Linguists also account for sporadic changes (one-off shifts that didn't apply across the board) and exceptions to the general rules. These are noted but don't invalidate the regular patterns.

The comparative method has been applied successfully across many language families:

Indo-European (English, Spanish, Hindi, Russian, Greek, etc.), the most extensively reconstructed family
Austronesian (Malay, Tagalog, Hawaiian, Maori)
Sino-Tibetan (Mandarin, Burmese, Tibetan)


Sound Correspondences and Limitations

Sound correspondences for genetic relationships

Sound correspondences are the backbone of the comparative method. A correspondence is "regular" when it shows up consistently across many word sets, not just one or two. Two key features define regular correspondences:

Consistency: The same sound-to-sound match appears in word after word. English f corresponds to Latin p in father/pater, foot/ped-, fish/piscis, and so on.
Predictability by environment: Sometimes a correspondence holds only in certain phonetic contexts (e.g., only at the beginning of a word, or only before a vowel). Tracking these environments is part of the analysis.
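
The consistency check can be illustrated with the English and Latin pairs above. This is only a sketch: it compares spellings rather than phonetic transcriptions, looks at the first letter alone, and the helper name `initial_correspondences` is made up for the example.

```python
from collections import Counter

# Cognate pairs from the text (English, Latin); orthographic forms, not IPA.
cognate_pairs = [
    ("father", "pater"),
    ("foot", "ped"),
    ("fish", "piscis"),
]

def initial_correspondences(pairs):
    """Count how often each (English initial, Latin initial) match occurs."""
    return Counter((eng[0], lat[0]) for eng, lat in pairs)

counts = initial_correspondences(cognate_pairs)
for (eng_sound, lat_sound), n in counts.most_common():
    print(f"English {eng_sound} : Latin {lat_sound} ({n} sets)")
```

All three sets show the same f : p match, which is the kind of recurrence that makes a correspondence count as regular. Three sets is far too few for a real argument, of course.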

Identifying correspondences follows a systematic process:

Arrange cognate sets so that corresponding sounds are aligned across languages.
For each position, record which sound appears in each language.
Look for patterns: which correspondences recur, and in which phonetic environments.
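
The three steps above can be sketched as a small tabulation that also tracks environment. Everything here is assumed for illustration: the word pairs are invented, each character stands for one sound, the forms are already aligned and of equal length, and the only environments distinguished are word-initial and non-initial.

```python
from collections import Counter, defaultdict

# Hypothetical aligned cognate pairs (LangA, LangB); one character per sound.
pairs = [("pata", "fata"), ("tapa", "tapa"), ("pipa", "fipa")]

def correspondences_by_environment(pairs):
    """Count sound matches separately for initial and non-initial positions."""
    table = defaultdict(Counter)
    for a, b in pairs:
        for i, (x, y) in enumerate(zip(a, b)):
            env = "initial" if i == 0 else "non-initial"
            table[env][(x, y)] += 1
    return table

table = correspondences_by_environment(pairs)
for env, matches in table.items():
    print(env, dict(matches))
```

In this toy data, p matches f only word-initially, while p matches p elsewhere, so the correspondence is conditioned by position rather than holding across the board.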

Once you have regular correspondences, you can use them to establish genetic relationships. For subgrouping, the key distinction is between shared innovations (changes that happened in a subgroup after it split off) and shared retentions (features kept from the ancestor, which tell you little about subgrouping). You also need to rule out chance resemblances (unrelated words that happen to sound alike) and borrowings (words transferred through contact, not inheritance).

Limitations in language reconstruction

The comparative method is powerful, but it has real boundaries.

Time depth: Reconstruction is generally considered unreliable beyond roughly 8,000 to 10,000 years. Over that span, sound changes accumulate to the point where cognates are no longer recognizable, and the evidence simply erodes.

Data limitations: Many of the world's languages were never written down, or their written records are recent and sparse. When you're missing data from key languages in a family, your reconstructions may be incomplete or skewed.

Methodological challenges:

Semantic shift makes meaning reconstruction tricky. A word might mean "deer" in one language and "animal" in another, so pinning down the proto-meaning requires careful comparison.
Analogy can reshape words to fit other patterns in the language (like English speakers saying "dove" instead of older "dived", on the model of drive/drove), creating forms that look irregular from a comparative standpoint.
Loanwords can mimic cognates if languages were in prolonged contact, and distinguishing inherited forms from borrowed ones takes careful work.

Theoretical limitations: The method assumes a neat family-tree model where one ancestor splits into distinct daughter languages. In reality, languages sometimes exist on dialect continua or undergo heavy mixing, making the single-ancestor model an imperfect fit. Additionally, while the method works well for reconstructing sounds and words, reconstructing syntax and morphology is much harder because these features don't leave the same kind of regular, traceable correspondences.

Language contact and areal features: When unrelated or only distantly related languages are spoken near each other for long periods, they can converge in structure. The languages of the Balkan Sprachbund, for example, share grammatical features (like a postposed definite article) not because they inherited them from a common ancestor but because of centuries of contact. These areal features can muddy the picture when you're trying to determine genetic relationships.
