Speech production starts with air from the lungs and ends with sound shaped by structures in your mouth and nose. Every speech sound you make depends on a chain of anatomical parts working together. Articulatory phonetics is the study of how those parts move and interact to produce language sounds.

Anatomy of the Human Vocal Tract

Think of the vocal tract as a tube running from your lungs up through your throat and out your mouth (and sometimes your nose). Each section of that tube plays a different role.

The power source:

Lungs generate the airflow that powers speech. Without moving air, there's no sound.
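Diaphragm contracts to draw air into the lungs and relaxes to let it out. Together with the rib and abdominal muscles, it controls lung volume and the outward airflow used for speech.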
Trachea (your windpipe) connects the lungs to the larynx above.

The sound source:

Larynx sits at the top of the trachea and houses the vocal folds. When the vocal folds vibrate, they produce voiced sounds.
Glottis is the space between the vocal folds. It can open wide (for voiceless sounds), close tightly (for a glottal stop), or narrow so that the vocal folds vibrate (for voicing).

The resonating and shaping chambers:

Pharynx is the throat cavity just above the larynx. It acts as a resonating chamber that modifies sound quality.
Oral cavity is the main space where speech sounds get shaped. It contains several key structures:
- Tongue is the most mobile articulator. It can move forward, backward, up, and down to form all kinds of constrictions.
- Hard palate is the rigid roof of your mouth.
- Soft palate (velum) is the fleshy part behind the hard palate. It can raise to block airflow into the nose or lower to let air through for nasal sounds.
- Alveolar ridge is the bony ridge just behind your upper teeth. Touch it with your tongue and you'll feel it.
- Teeth help produce certain sounds, especially fricatives like [f] and [θ].
- Lips shape the opening of the mouth and can round, spread, or close completely.
Nasal cavity adds resonance when the velum is lowered, producing nasal sounds like [m], [n], and [ŋ].

Active vs. passive articulators:

This distinction matters because it's how linguists describe where and how sounds are made.

Active articulators are the parts that move during speech: the tongue, lips, and soft palate.
Passive articulators stay in place and serve as targets that active articulators move toward: the teeth, hard palate, and alveolar ridge.

For example, when you say [t], your tongue tip (active) touches the alveolar ridge (passive).
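
The active/passive pairing can be written down as a small lookup table. The sketch below is illustrative only: `Articulation`, `ARTICULATIONS`, and `describe` are hypothetical names, and every entry other than [t] is a standard textbook pairing added for the example rather than taken from the text above.

```python
from typing import NamedTuple


class Articulation(NamedTuple):
    active: str   # the articulator that moves
    passive: str  # the target it moves toward


# Hypothetical table; only the [t] entry comes from the example above.
ARTICULATIONS = {
    "t": Articulation("tongue tip", "alveolar ridge"),
    "f": Articulation("lower lip", "upper teeth"),
    "θ": Articulation("tongue tip", "upper teeth"),
    "j": Articulation("tongue body", "hard palate"),
}


def describe(symbol: str) -> str:
    """Return a one-line description of how a sound is articulated."""
    a = ARTICULATIONS[symbol]
    return f"[{symbol}]: {a.active} (active) -> {a.passive} (passive)"


print(describe("t"))  # [t]: tongue tip (active) -> alveolar ridge (passive)
```

Naming both articulators for each sound is essentially what a place-of-articulation label such as "alveolar" or "labiodental" abbreviates.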

Figure: Vocal folds (image source: wikidoc).

Roles in Speech Sound Production

Each major structure contributes something specific:

Lungs provide the airstream and control subglottal pressure, which affects how loud and (to some extent) how high-pitched your speech is.
Larynx controls voicing and pitch. The vocal folds vibrate to produce voiced sounds, and adjusting their tension and length changes pitch: higher tension produces a higher pitch.
Articulators shape the vocal tract into different configurations to produce distinct sounds. They can form complete closures (producing stops like [p] or [t]), narrow constrictions (producing fricatives like [s] or [f]), or slight narrowings (producing approximants like [w] or [j]). Combining a stop with a fricative in quick sequence gives you an affricate, like the [tʃ] at the start of "church."
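
The link between degree of constriction and manner of articulation can be sketched as a simple classifier. This is a toy model under stated assumptions: the string labels, the `MANNER_BY_CONSTRICTION` table, and the `manner` function are hypothetical names chosen for the example, and real manner categories involve more than constriction degree.

```python
# Hypothetical labels for the three degrees of constriction named above.
MANNER_BY_CONSTRICTION = {
    "complete closure": "stop",
    "narrow constriction": "fricative",
    "slight narrowing": "approximant",
}


def manner(constrictions: list[str]) -> str:
    """Classify a consonant from its sequence of constriction degrees."""
    manners = [MANNER_BY_CONSTRICTION[c] for c in constrictions]
    if manners == ["stop", "fricative"]:
        return "affricate"  # a stop released into a fricative, as in [tʃ]
    if len(manners) == 1:
        return manners[0]
    raise ValueError(f"no manner defined for sequence: {constrictions}")


print(manner(["complete closure"]))                         # stop
print(manner(["complete closure", "narrow constriction"]))  # affricate
```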

Figure: Organs and structures of the respiratory system (image source: Anatomy and Physiology).

Speech Production Mechanisms

Process of Phonation

Phonation is the production of sound by vibrating the vocal folds. Here's how it works:

1. Air pressure builds up below the closed vocal folds (subglottal pressure).
2. The pressure forces the vocal folds apart, letting a burst of air through.
3. As air rushes through the narrow glottal opening, the Bernoulli effect creates a drop in pressure that, together with the folds' own elasticity, pulls the vocal folds back together.
4. With the folds closed again, pressure rebuilds and steps 1–3 repeat rapidly, creating a vibration cycle that produces a buzzing sound wave.

This cycle can happen hundreds of times per second. The resulting buzz is the raw material that the rest of the vocal tract then shapes into recognizable speech sounds.
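
To get a feel for the timescale, the rate of vibration can be turned into a duration per cycle. The sketch below assumes an idealized, perfectly regular vibration. The function names and the 100 Hz and 200 Hz values are illustrative assumptions, not figures from the text.

```python
def cycle_period_ms(f0_hz: float) -> float:
    """Duration of one open-close cycle of the vocal folds, in milliseconds."""
    return 1000.0 / f0_hz


def pulse_onsets(f0_hz: float, duration_s: float) -> list[float]:
    """Start time in seconds of each cycle within a stretch of voicing."""
    n_cycles = round(duration_s * f0_hz)
    return [i / f0_hz for i in range(n_cycles)]


print(cycle_period_ms(200))         # 5.0 ms per cycle at 200 cycles per second
print(len(pulse_onsets(200, 0.5)))  # 100 cycles in half a second of voicing
print(pulse_onsets(100, 0.5)[:2])   # [0.0, 0.01]
```

Real voices are never this regular. Small cycle-to-cycle variations are part of what makes a voice sound natural.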

Voicing states:

Voiced sounds have vibrating vocal folds. In most languages all vowels are voiced, along with consonants like [z], [v], [b], and [d].
Voiceless sounds are produced with the vocal folds held apart so they don't vibrate: [s], [f], [p], [t].
Breathy voice falls in between, with the vocal folds only partially closed so that some turbulent airflow escapes alongside the vibration.

You can feel the difference by placing your fingers on your throat and alternating between [zzzzz] (voiced) and [sssss] (voiceless).
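
The voiced and voiceless consonants listed above line up as pairs that differ only in voicing, which can be captured in a two-way lookup. This is a minimal sketch; `VOICELESS_OF`, `VOICED_OF`, `is_voiced`, and `counterpart` are hypothetical names, and the table covers only the eight consonants mentioned in this section.

```python
# Voiced consonants from the list above, mapped to their voiceless counterparts.
VOICELESS_OF = {"z": "s", "v": "f", "b": "p", "d": "t"}
# The same pairs, looked up in the other direction.
VOICED_OF = {voiceless: voiced for voiced, voiceless in VOICELESS_OF.items()}


def is_voiced(symbol: str) -> bool:
    """True for a voiced consonant in the table, False for a voiceless one."""
    if symbol in VOICELESS_OF:
        return True
    if symbol in VOICED_OF:
        return False
    raise KeyError(f"[{symbol}] is not in this small table")


def counterpart(symbol: str) -> str:
    """Return the sound that differs from `symbol` only in voicing."""
    if symbol in VOICELESS_OF:
        return VOICELESS_OF[symbol]
    return VOICED_OF[symbol]


print(counterpart("z"))  # s
print(counterpart("p"))  # b
print(is_voiced("v"))    # True
```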

Modes of phonation affect voice quality:

Modal voice is your normal speaking voice, with regular vocal fold vibration.
Falsetto uses a different vibration pattern where only the edges of the vocal folds vibrate, producing a higher pitch.
Creaky voice (also called vocal fry) involves slow, irregular vocal fold vibration at a very low frequency.

Types of Airstream Mechanisms

Not all speech sounds are made the same way. The airstream mechanism describes what moves the air and which direction it flows.

Pulmonic airstream mechanism:

This is by far the most common. The lungs push air outward (egressive) through the vocal tract. Nearly every sound in every language uses this mechanism. Pulmonic ingressive airflow (breathing in while speaking) is extremely rare and mostly limited to things like the "yeah" some people produce while inhaling.

Glottalic airstream mechanism:

Here, the larynx itself acts as the air pump instead of the lungs, moving up or down like a piston.

Ejectives are glottalic egressive sounds. The glottis closes while the mouth forms a closure, trapping air between the two; the larynx then moves upward to compress that air. When the oral closure releases, you get a sharp, popping burst. Languages like Amharic and Quechua use ejectives.
Implosives are glottalic ingressive sounds. With an oral closure in place, the larynx moves downward, usually while the vocal folds keep vibrating rather than sealing completely. This rarefies the air above the larynx, so a slight inward airflow occurs when the closure releases. Sindhi and Swahili both have implosive consonants.

Velaric airstream mechanism:

The back of the tongue seals against the velum, and the front of the tongue (or the lips) creates another closure further forward. The tongue body then lowers, enlarging the pocket between those two closures and rarefying the air trapped there. When the front closure releases, outside air rushes in, producing a click. Clicks are velaric ingressive sounds found prominently in southern African languages like Zulu and Xhosa. English speakers also use a click informally: the "tsk-tsk" of disapproval.

Comparing the three mechanisms: Pulmonic sounds appear in all known languages. Glottalic and velaric sounds are found in specific language families and regions. They differ not just in which muscles drive the airflow, but also in their acoustic properties: ejectives tend to be louder and more abrupt, while clicks have distinctive sharp bursts of noise.
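
The mechanisms in this section can be summarized as combinations of an initiator (what moves the air) and a direction of airflow. The sketch below is one hypothetical way to encode that summary. The enum and table names are assumptions made for the example, and any combination this section does not discuss is reported as such rather than filled in.

```python
from enum import Enum


class Initiator(Enum):
    PULMONIC = "lungs"
    GLOTTALIC = "larynx"
    VELARIC = "tongue"


class Direction(Enum):
    EGRESSIVE = "outward"
    INGRESSIVE = "inward"


# Only the combinations described in this section are listed.
SOUND_TYPES = {
    (Initiator.PULMONIC, Direction.EGRESSIVE): "most speech sounds",
    (Initiator.PULMONIC, Direction.INGRESSIVE): "extremely rare",
    (Initiator.GLOTTALIC, Direction.EGRESSIVE): "ejectives",
    (Initiator.GLOTTALIC, Direction.INGRESSIVE): "implosives",
    (Initiator.VELARIC, Direction.INGRESSIVE): "clicks",
}


def sound_type(initiator: Initiator, direction: Direction) -> str:
    """Look up the kind of sound produced by an airstream mechanism."""
    return SOUND_TYPES.get((initiator, direction), "not covered in this section")


print(sound_type(Initiator.GLOTTALIC, Direction.EGRESSIVE))  # ejectives
print(sound_type(Initiator.VELARIC, Direction.INGRESSIVE))   # clicks
```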
