Fiveable

🎻Intro to Humanities Unit 11 Review

11.3 Phonetics and phonology

Written by the Fiveable Content Team • Last updated August 2025

Fundamentals of Phonetics

Phonetics is the study of speech sounds: how we produce them, how they travel through the air, and how we hear and interpret them. It gives you the tools to describe and analyze the raw physical material of spoken language. The field breaks into three main branches.

Articulatory Phonetics

Articulatory phonetics focuses on how your body makes speech sounds. Every sound you produce involves coordinating your tongue, lips, teeth, vocal cords, and airflow from the lungs. By tracking where and how these organs move, linguists can categorize any speech sound.

Two key concepts organize this work:

  • Place of articulation identifies where in the vocal tract a sound is produced (lips, teeth, palate, etc.)
  • Manner of articulation describes how the airflow is shaped (fully blocked, partially restricted, etc.)

These two dimensions together let you precisely describe consonants across any language.

Acoustic Phonetics

Acoustic phonetics examines speech sounds as physical events, specifically as sound waves traveling through air. Researchers use tools like spectrograms (visual displays of sound frequencies over time) and waveforms to measure properties such as:

  • Frequency (perceived as pitch)
  • Amplitude (perceived as loudness)
  • Formants (resonant frequencies that distinguish vowels from one another)

This branch applies principles of physics to understand how speech is transmitted from speaker to listener.

Auditory Phonetics

Auditory phonetics investigates the receiving end: how the ear and brain process incoming speech sounds. It covers the perception of pitch, loudness, and timbre, as well as the neural pathways involved in decoding linguistic information. Factors like background noise and hearing impairments also fall under this branch, since they directly affect how well someone can comprehend speech.

International Phonetic Alphabet

The International Phonetic Alphabet (IPA) is a standardized set of symbols designed to represent every speech sound found in human languages. Each symbol corresponds to a distinct sound, making it possible to transcribe pronunciation accurately regardless of the language's writing system.

The IPA also includes diacritics, small marks added to symbols to indicate finer phonetic details like nasalization (letting air through the nose) or aspiration (a burst of air after a consonant). If you see a phonetic transcription in square brackets, like [pʰ], that's IPA at work.

Speech Sounds Classification

Linguists organize speech sounds into categories based on how they're produced and what they sound like acoustically. This classification system makes it possible to compare sound systems across languages and spot patterns in how sounds behave.

Consonants vs. Vowels

The most basic division in speech sounds separates consonants from vowels:

  • Consonants are produced by obstructing or constricting airflow somewhere in the vocal tract. Think of the way your lips close for a [b] or your tongue touches the roof of your mouth for a [t].
  • Vowels allow air to flow freely through the mouth, with the tongue and lips shaping the sound without blocking it.

Some sounds blur the line. Semivowels (like [w] and [j]) share characteristics of both categories: they involve minimal obstruction, similar to vowels, but function like consonants in syllable structure.

Place of Articulation

Place of articulation refers to where in the vocal tract a sound is produced. The main locations range from the front of the mouth to the throat:

  • Bilabial: both lips come together ([p], [b], [m])
  • Alveolar: the tongue touches or approaches the bony ridge just behind the upper teeth ([t], [d], [n], [s])
  • Velar: the back of the tongue rises toward the soft palate ([k], [g], [ŋ] as in "sing")

These are just three common examples; the full IPA chart maps many more positions, all the way back to the glottis (the space between the vocal cords).

Manner of Articulation

Manner of articulation describes how the airflow is modified:

  • Stops (or plosives): airflow is completely blocked, then released ([p], [t], [k])
  • Fricatives: airflow is forced through a narrow gap, creating turbulence ([f], [s], [ʃ] as in "ship")
  • Approximants: articulators come close together but don't create friction ([l], [r], [w])

Manner interacts with place of articulation and voicing to produce the full range of consonant sounds in a language.

Voicing and Aspiration

Voicing is whether your vocal cords vibrate during a sound. Hold your hand against your throat and say [z], then [s]. You'll feel vibration for [z] (voiced) but not for [s] (voiceless). This distinction separates pairs like [b]/[p], [d]/[t], and [g]/[k].

Aspiration is a puff of air that follows certain consonants. In English, the [p] in "pin" is aspirated [pʰ], but the [p] in "spin" is not. Whether aspiration changes word meaning depends on the language: it never does in English, but in Hindi and Thai, aspirated and unaspirated stops are separate phonemes. Linguists measure aspiration through voice onset time (VOT), the delay between the release of a consonant and the start of vocal cord vibration.

Phonological Concepts

Phonology steps back from the physical details of sounds and asks: how do sounds function within a particular language's system? Where phonetics deals with the raw sounds, phonology deals with the mental categories and rules that organize them.

Phonemes vs. Allophones

A phoneme is a sound unit that can change the meaning of a word. In English, /p/ and /b/ are separate phonemes because swapping one for the other changes "pat" to "bat."

An allophone is a variant pronunciation of a phoneme that doesn't change meaning. The aspirated [pʰ] in "pin" and the unaspirated [p] in "spin" are both allophones of the phoneme /p/ in English. Native speakers usually don't even notice the difference.

Two key patterns help identify allophones:

  • Complementary distribution: the allophones appear in different, predictable environments (one never shows up where the other does)
  • Free variation: the allophones can be swapped in the same environment without changing meaning

Minimal Pairs

A minimal pair is two words that differ by exactly one sound in the same position and have different meanings. They're the primary tool for proving that two sounds are separate phonemes.

  • English: "pin" vs. "bin" proves /p/ and /b/ are distinct phonemes
  • French: "ton" (your) vs. "son" (his) proves /t/ and /s/ are distinct phonemes

If you can't find a minimal pair for two sounds, they might be allophones of the same phoneme rather than separate phonemes.
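
The "differ by exactly one sound in the same position" test is mechanical enough to sketch in code. This is an illustrative check over hand-made segment lists, not a tool from any linguistics library:

```python
def is_minimal_pair(a, b):
    """True if two transcriptions (lists of segments) have the same
    length and differ in exactly one segment at the same position."""
    if len(a) != len(b):
        return False
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1

# Broad transcriptions, one segment per list item
pin = ["p", "ɪ", "n"]
bin_word = ["b", "ɪ", "n"]
spin = ["s", "p", "ɪ", "n"]

print(is_minimal_pair(pin, bin_word))  # True: /p/ vs /b/ contrast
print(is_minimal_pair(pin, spin))      # False: different lengths
```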

Distinctive Features

Linguists describe phonemes using distinctive features, which are binary (present or absent) properties based on articulation or acoustics. Examples include:

  • [±voiced]: whether the vocal cords vibrate
  • [±nasal]: whether air flows through the nose
  • [±high]: whether the tongue body is raised

These features allow efficient descriptions of sound classes and phonological rules. Instead of listing every affected sound individually, a rule can target all sounds that share a feature, like "all [+voiced] stops."
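
The payoff of binary features is exactly this kind of class selection. The toy feature matrix below is a deliberate simplification (real analyses use many more features), but it shows how one feature specification picks out a whole natural class:

```python
# Toy feature matrix for a few English consonants (illustrative only)
FEATURES = {
    "p": {"voiced": False, "nasal": False, "stop": True},
    "b": {"voiced": True,  "nasal": False, "stop": True},
    "t": {"voiced": False, "nasal": False, "stop": True},
    "d": {"voiced": True,  "nasal": False, "stop": True},
    "m": {"voiced": True,  "nasal": True,  "stop": True},
    "s": {"voiced": False, "nasal": False, "stop": False},
}

def natural_class(**wanted):
    """Return all sounds whose features match every requested value."""
    return sorted(s for s, f in FEATURES.items()
                  if all(f[k] == v for k, v in wanted.items()))

# "All [+voiced] oral stops" picks out a class, not a list of sounds
print(natural_class(voiced=True, stop=True, nasal=False))  # ['b', 'd']
```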

Syllable Structure

Phonemes group into syllables, which have up to three parts:

  • Onset: the consonant(s) before the vowel ("str" in "string")
  • Nucleus: the core, usually a vowel ("i" in "string")
  • Coda: the consonant(s) after the vowel ("ng" in "string")

Languages differ in what syllable shapes they allow. English permits complex onsets and codas (like "strengths"), while Japanese strongly favors open syllables (consonant + vowel, no coda). Syllable structure also influences stress placement and the overall rhythm of a language.
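
The onset/nucleus/coda split can be sketched for a single-syllable word: everything before the first vowel is the onset, the vowel run is the nucleus, and the rest is the coda. This uses orthographic vowels as a stand-in for real vowel sounds, so it is a rough illustration only:

```python
VOWELS = set("aeiou")  # orthographic stand-in for vowel sounds

def split_syllable(word):
    """Split a one-syllable word (assumed to contain a vowel) into
    (onset, nucleus, coda)."""
    first = next(i for i, ch in enumerate(word) if ch in VOWELS)
    last = max(i for i, ch in enumerate(word) if ch in VOWELS)
    return word[:first], word[first:last + 1], word[last + 1:]

print(split_syllable("string"))  # ('str', 'i', 'ng')
```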

[Image: the articulators of the vocal tract, from Essentials of Linguistics, §2.2 "Articulators"]

Phonological Processes

Phonological processes are systematic sound changes that happen in specific contexts during natural speech. They explain why the pronunciation of a word or morpheme can shift depending on what sounds surround it.

Assimilation and Dissimilation

Assimilation occurs when a sound becomes more like a neighboring sound. The prefix "in-" becomes "im-" before bilabial consonants like [p] and [b], so you get "impossible" rather than "inpossible." The [n] assimilates to [m] because [m] matches the bilabial place of the following [p]. Assimilation can work forward (progressive: a sound influences the one that follows it) or backward (regressive: a sound influences the one before it, as in the "im-" example).
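
The "in-"/"im-" pattern is regular enough to state as a one-line rule. A minimal sketch (the bilabial set is the only assumption baked in):

```python
BILABIALS = {"p", "b", "m"}

def attach_in_prefix(stem):
    """Nasal place assimilation sketch: the /n/ of "in-" surfaces as
    [m] before a bilabial-initial stem, [n] elsewhere."""
    prefix = "im" if stem[0] in BILABIALS else "in"
    return prefix + stem

print(attach_in_prefix("possible"))  # 'impossible'
print(attach_in_prefix("tolerant"))  # 'intolerant'
```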

Dissimilation is the opposite: a sound becomes less like a nearby sound to make the two easier to distinguish. Many English speakers pronounce "February" as "Febuary," dropping one [r] because having two [r] sounds close together is awkward.

Epenthesis and Deletion

Epenthesis inserts a sound that wasn't originally there, usually to break up a difficult consonant cluster. Many speakers pronounce "athlete" as [æθəlit], inserting a vowel between [θ] and [l].

Deletion removes a sound to simplify pronunciation. In "fifth," many speakers drop the second [f], producing [fɪθ] instead of [fɪfθ].

Both processes reflect a tendency toward easier, more fluid articulation.

Metathesis and Coalescence

Metathesis is the reordering of sounds within a word. In some English dialects, "ask" is pronounced "aks," with the [s] and [k] swapped. This can happen as a historical change or as an ongoing pattern in certain varieties.

Coalescence merges two adjacent sounds into a single new sound. When "did you" becomes [dɪdʒu], the [d] and [j] fuse into [dʒ]. This often produces sounds that function as new phonemes or allophones in the language.

Vowel Harmony

Vowel harmony is a system where all vowels in a word must share certain features, such as backness or rounding. When you add a suffix, the vowel in that suffix changes to match the vowels already in the word.

Turkish is a classic example: the plural suffix is "-ler" after words with front vowels (like "ev-ler," houses) but "-lar" after words with back vowels (like "at-lar," horses). Hungarian and Finnish show similar patterns. Vowel harmony is less common in European languages but widespread in Turkic, Uralic, and some African language families.
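
The Turkish -ler/-lar alternation can be sketched as a lookup on the word's last vowel. This covers only the two-way front/back harmony described above, ignoring Turkish's additional rounding harmony and its full vowel inventory:

```python
FRONT = set("eiöü")  # front vowels
BACK = set("aıou")   # back vowels

def pluralize(word):
    """Pick the Turkish plural suffix by the word's final vowel
    (simplified two-way backness harmony)."""
    for ch in reversed(word):
        if ch in FRONT:
            return word + "ler"
        if ch in BACK:
            return word + "lar"
    return word + "ler"  # arbitrary fallback for vowel-less input

print(pluralize("ev"))  # 'evler' (houses)
print(pluralize("at"))  # 'atlar' (horses)
```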

Suprasegmental Features

Suprasegmental features are aspects of speech that stretch across individual sounds, shaping the rhythm, melody, and emphasis of spoken language. They play a major role in conveying meaning and emotion.

Stress and Intonation

Stress is the relative prominence given to certain syllables. A stressed syllable is typically louder, higher in pitch, or longer than unstressed ones. In English, stress can distinguish word meanings: "REcord" (noun) vs. "reCORD" (verb).

Intonation refers to pitch changes across an entire sentence. Rising intonation at the end of a sentence often signals a question in English, while falling intonation signals a statement. Intonation also conveys attitudes like surprise, sarcasm, or uncertainty.

Tone and Pitch

In tonal languages, pitch differences on individual syllables change word meaning. Mandarin Chinese has four tones: a high level tone, a rising tone, a falling-rising tone, and a falling tone. The syllable "ma" means "mother," "hemp," "horse," or "scold" depending on the tone.

Yoruba, spoken in West Africa, also uses tone to distinguish meaning. Tonal languages require speakers to manage both sentence-level intonation and word-level tone simultaneously, which creates complex pitch patterns.

Length and Duration

In some languages, how long you hold a sound changes the word's meaning. Finnish distinguishes short and long vowels: "tuli" (fire) vs. "tuuli" (wind). Japanese distinguishes short and long consonants: "kite" (come) vs. "kitte" (stamp).

Duration is the physical measurement of sound length, while length is the linguistic category (short vs. long). Speaking rate, emphasis, and surrounding sounds all affect duration in practice.

Rhythm and Timing

Languages fall into broad rhythmic categories:

  • Stress-timed (e.g., English, German): stressed syllables occur at roughly regular intervals, and unstressed syllables get compressed
  • Syllable-timed (e.g., Spanish, French): each syllable takes roughly the same amount of time
  • Mora-timed (e.g., Japanese): each mora (a unit smaller than a syllable) takes roughly equal time

These rhythmic differences contribute to what people perceive as the characteristic "melody" of a language and play a role in how foreign accents are perceived.

Phonological Analysis

Phonological analysis is the process of uncovering the rules and patterns that govern a language's sound system. It moves from raw phonetic data to abstract generalizations about how sounds behave.

Phonemic Analysis

The goal of phonemic analysis is to determine which sounds are contrastive (phonemes) in a given language. The process typically follows these steps:

  1. Collect phonetic data from native speakers
  2. Look for minimal pairs to identify contrastive sounds
  3. For sounds that lack minimal pairs, check whether they're in complementary distribution (allophones) or free variation
  4. Establish the full phoneme inventory of the language

The result is a systematic map of the language's sound contrasts.
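
Step 3, the complementary distribution check, can be made concrete: collect the (preceding, following) environment of each sound and test whether the two sounds ever share one. A small sketch over hand-built transcriptions, with "#" marking a word boundary:

```python
def environments(transcriptions, sound):
    """Collect (previous, next) contexts where a sound occurs."""
    envs = set()
    for word in transcriptions:
        for i, seg in enumerate(word):
            if seg == sound:
                prev = word[i - 1] if i > 0 else "#"
                nxt = word[i + 1] if i < len(word) - 1 else "#"
                envs.add((prev, nxt))
    return envs

def in_complementary_distribution(transcriptions, a, b):
    """True if a and b never share an environment — evidence that
    they are allophones of one phoneme."""
    return not (environments(transcriptions, a)
                & environments(transcriptions, b))

# Toy data: [pʰ] only word-initially, plain [p] only after [s]
data = [["pʰ", "ɪ", "n"], ["s", "p", "ɪ", "n"]]
print(in_complementary_distribution(data, "pʰ", "p"))  # True
```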

Rule Formulation

Once you've identified phonological patterns, you can express them as formal rules. A phonological rule specifies:

  • What changes (the target sound)
  • How it changes (the resulting sound)
  • Where it changes (the phonological environment)

For example, a rule might state: "voiceless stops become aspirated at the beginning of a stressed syllable." Rules use distinctive features to capture generalizations, and when multiple rules apply, their ordering can matter for the final output.
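
The three parts of a rule (target, change, environment) map naturally onto function arguments. Below, word-initial position stands in for "start of a stressed syllable," a simplification for the sketch:

```python
def apply_rule(segments, target, change, environment):
    """Apply a phonological rule: rewrite target -> change at every
    position where the environment predicate holds."""
    return [change if seg == target and environment(segments, i) else seg
            for i, seg in enumerate(segments)]

def word_initial(segments, i):
    # Simplified environment standing in for "stressed-syllable onset"
    return i == 0

print(apply_rule(["p", "ɪ", "n"], "p", "pʰ", word_initial))
# ['pʰ', 'ɪ', 'n'] — aspirated
print(apply_rule(["s", "p", "ɪ", "n"], "p", "pʰ", word_initial))
# ['s', 'p', 'ɪ', 'n'] — unchanged, environment not met
```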

Phonological Alternations

Phonological alternations are cases where the same morpheme is pronounced differently depending on its phonological context. The English plural suffix is a clear example:

  • [s] after voiceless sounds: "cats" [kæts]
  • [z] after voiced sounds: "dogs" [dɑɡz]
  • [əz] after sibilants: "buses" [bʌsəz]

These alternations often reflect historical sound changes that became regular patterns in the grammar.
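
The three-way plural pattern is fully predictable from the stem's final sound, which is easy to see in code. The sound sets below are partial (enough for the examples), not a complete English inventory:

```python
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}  # partial list

def plural_suffix(final_sound):
    """Pick the English plural allomorph from the stem-final sound,
    assuming an underlying /z/."""
    if final_sound in SIBILANTS:
        return "əz"
    if final_sound in VOICELESS:
        return "s"
    return "z"

print(plural_suffix("t"))  # 's'  -> cats [kæts]
print(plural_suffix("ɡ"))  # 'z'  -> dogs [dɑɡz]
print(plural_suffix("s"))  # 'əz' -> buses [bʌsəz]
```

Note that the sibilant check comes first: [s] is both voiceless and a sibilant, and "buses" shows the sibilant rule wins.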

[Image: manners of articulation (articulatory phonetics)]

Underlying Representations

An underlying representation is the abstract mental form of a word before any phonological rules apply. The idea is that speakers store one base form in memory, and phonological rules derive the different surface pronunciations.

For the English plural, the underlying form of the suffix might be /z/, with rules converting it to [s] or [əz] in the right environments. This approach allows more efficient mental storage, since you don't need to memorize every variant separately. The psychological reality of underlying representations remains an active area of debate in linguistics.

Cross-Linguistic Phonology

Comparing sound systems across languages reveals both universal tendencies and striking diversity. This comparative perspective helps linguists understand what's common to all human languages and what varies.

Phonological Universals

Certain patterns show up in virtually all languages:

  • Every known language has both stop consonants and vowels
  • Most languages have at least the vowels /i/, /a/, and /u/
  • Nasal consonants appear in nearly all languages

These universals likely reflect the physical constraints of human vocal tracts and the perceptual abilities of human hearing. Some universals are absolute (true of every language), while others are strong statistical tendencies.

Language-Specific Sound Systems

Each language combines phonemes and rules in its own way. Phoneme inventories range from around 11 sounds (Rotokas, spoken in Papua New Guinea) to over 100 (some Khoisan languages of southern Africa, which include click consonants). Caucasian languages like Georgian feature ejective consonants, produced with a glottalic airstream that's rare in European languages.

These differences reflect centuries of historical development and contact with neighboring languages.

Phonotactic Constraints

Phonotactics are the rules governing which sound sequences are allowed in a language. English, for instance, permits complex consonant clusters like [str] at the start of a word ("string") but doesn't allow [ŋ] (the "ng" sound) in word-initial position. Japanese requires most syllables to be open (ending in a vowel), so consonant clusters are heavily restricted.

These constraints shape how speakers perceive and produce unfamiliar words, and they directly influence how loanwords get adapted.
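
A phonotactic constraint is just a yes/no judgment on a sound sequence. The toy checker below encodes two real English facts about onsets (no word-initial [ŋ]; three-consonant onsets must begin with [s], as in "string") and nothing more:

```python
def english_onset_ok(onset):
    """Toy legality check for an English onset (list of consonants).
    Far from a full phonotactics of English."""
    if onset[:1] == ["ŋ"]:
        return False           # no word-initial [ŋ]
    if len(onset) == 3 and onset[0] != "s":
        return False           # CCC onsets must start with [s]
    return len(onset) <= 3     # no onsets longer than three consonants

print(english_onset_ok(["s", "t", "r"]))  # True  ("string")
print(english_onset_ok(["ŋ"]))            # False
```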

Loanword Adaptation

When a language borrows a word, speakers adjust it to fit their own phonotactic rules. Japanese borrowed the English word "strike" but reshaped it as "sutoraiku" [sɯtoɾaikɯ], inserting vowels to break up consonant clusters that Japanese doesn't allow.

This process reveals how deeply phonotactic constraints are embedded in speakers' knowledge. Sometimes, frequent borrowing can even introduce new sounds into a language's phoneme inventory over time.
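
The vowel insertion itself can be sketched: after any consonant not followed by a vowel, insert one, forcing open CV syllables. The input is a rough romanization of "strike," and the [o]-after-[t] detail mimics the usual Japanese choice; real adaptation also remaps sounds Japanese lacks:

```python
VOWELS = set("aeiou")

def epenthetic_for(cons):
    # Japanese typically inserts [o] after t/d and [u] elsewhere
    return "o" if cons in {"t", "d"} else "u"

def adapt_to_cv(word):
    """Crude epenthesis sketch: force open (CV) syllables by inserting
    a vowel after any consonant not already followed by one."""
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch not in VOWELS and (nxt is None or nxt not in VOWELS):
            out.append(epenthetic_for(ch))
    return "".join(out)

print(adapt_to_cv("straik"))  # 'sutoraiku'
```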

Applications of Phonetics

Phonetic and phonological knowledge extends well beyond linguistics classrooms. Understanding how speech sounds work has practical value in technology, law, healthcare, and education.

Speech Recognition Technology

Voice assistants like Siri and Alexa rely on acoustic phonetics principles. These systems convert spoken language into text using machine learning algorithms trained on massive collections of recorded speech. The technology must account for coarticulation (how sounds blend into each other), regional accents, background noise, and individual speaker differences. Applications include transcription services, voice-controlled devices, and accessibility tools for people with disabilities.

Forensic Phonetics

Forensic phonetics applies voice analysis to legal cases. Analysts compare voice samples to help identify speakers, examining features like accent, speech rate, and idiosyncratic pronunciation habits. Spectrographic analysis and statistical methods support these comparisons. Challenges include disguised voices, poor recording quality, and the need to meet legal standards of evidence.

Speech Therapy

Speech-language pathologists use articulatory phonetics to diagnose and treat speech disorders. Understanding exactly how sounds are produced helps therapists pinpoint what a client is doing differently and design targeted exercises. Acoustic analysis tools can track progress over time. Therapists also draw on knowledge of phonological development to assess whether a child's speech patterns are age-appropriate.

Language Teaching

Phonetics informs second-language pronunciation instruction. Contrastive analysis compares the sound systems of a learner's native language and their target language to predict which sounds will be difficult. A Japanese speaker learning English, for example, will likely struggle with the /l/ vs. /r/ distinction, since Japanese doesn't contrast these sounds.

Teaching techniques include minimal pair drills, phonetic transcription practice, and explicit instruction on unfamiliar articulatory positions.

Historical Phonology

Historical phonology traces how sound systems change over time. These changes explain many of the irregularities and patterns found in modern languages.

Sound Change Over Time

Sound changes are systematic shifts in pronunciation that spread gradually through a speech community. They tend to follow predictable patterns:

  • Lenition: sounds weaken (stops become fricatives, fricatives become approximants)
  • Fortition: sounds strengthen (the reverse of lenition)
  • Assimilation: sounds become more like their neighbors

The Great Vowel Shift in English (roughly 1400-1700) dramatically changed the pronunciation of long vowels, which is why English spelling often doesn't match modern pronunciation. Celtic languages underwent systematic consonant mutations that still function as grammatical markers today.

Comparative Method

The comparative method is the primary technique for establishing that languages are historically related. It works by:

  1. Identifying cognates (words in different languages that descend from a common ancestor)
  2. Finding regular sound correspondences between those cognates
  3. Using those correspondences to reconstruct the sounds of the ancestor language

For example, regular correspondences between Latin, Greek, and Sanskrit helped linguists reconstruct Proto-Indo-European. The key principle is regularity: genuine historical relationships produce systematic, predictable sound correspondences, not random similarities.

Reconstruction of Proto-Languages

Proto-language reconstruction uses the comparative method to infer what an ancestral language sounded like. Linguists propose phoneme inventories, phonological rules, and even basic vocabulary for languages that were never written down.

Major reconstructed proto-languages include Proto-Indo-European (ancestor of English, Hindi, Russian, Greek, and many others), Proto-Austronesian (ancestor of Malay, Hawaiian, Malagasy, and others), and Proto-Bantu (ancestor of Swahili, Zulu, and hundreds of other African languages). These reconstructions are always hypotheses, refined as new evidence and methods emerge.

Phonological Evolution

Phonological evolution traces how an entire sound system develops from an earlier stage to its modern form. Linguists examine what drives these changes:

  • Articulatory ease: speakers tend to simplify difficult sequences over time
  • Perceptual distinctiveness: sounds that are too similar may shift apart to stay distinguishable
  • Analogy: speakers sometimes regularize irregular forms based on more common patterns

Changes can spread through the vocabulary gradually (lexical diffusion) or apply across the board as regular sound laws. Understanding phonological evolution explains many of the quirks and exceptions in modern languages that would otherwise seem arbitrary.