Acoustic Properties of Sound
Sound waves are the foundation of spoken language. When you speak, your vocal folds vibrate and send longitudinal waves through the air. These waves carry specific acoustic information that listeners decode into meaningful speech. Three core properties define any sound wave:
- Frequency measures how many wave cycles occur per second, expressed in Hertz (Hz). Higher frequency means higher perceived pitch. Human speech typically falls between about 85 Hz and 8,000 Hz.
- Amplitude is the maximum displacement of the wave from its resting position. It relates to how loud a sound is, measured in decibels (dB). A larger amplitude means a louder sound.
- Wavelength is the physical distance between consecutive wave peaks (or troughs), measured in meters. Wavelength is inversely proportional to frequency: higher frequency means shorter wavelength.
These three properties connect through the speed of sound equation:
v = f × λ

Here, v is the speed of sound (approximately 343 m/s in air at 20°C), f is frequency, and λ is wavelength. So if you know any two values, you can calculate the third.
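As a quick sanity check, here is a minimal sketch that applies the speed-of-sound relation to the speech range mentioned above (the function names are illustrative):

```python
# Sketch: relating frequency and wavelength via v = f * wavelength,
# using v ~ 343 m/s (air at 20 degrees C).

SPEED_OF_SOUND = 343.0  # m/s in air at 20 C

def wavelength(frequency_hz: float) -> float:
    """Return wavelength in meters for a given frequency in Hz."""
    return SPEED_OF_SOUND / frequency_hz

def frequency(wavelength_m: float) -> float:
    """Return frequency in Hz for a given wavelength in meters."""
    return SPEED_OF_SOUND / wavelength_m

# Endpoints of the typical speech range mentioned above:
print(f"85 Hz   -> {wavelength(85.0):.2f} m")    # about 4 m
print(f"8000 Hz -> {wavelength(8000.0):.3f} m")  # about 4.3 cm
```

Note how the two ends of the speech range differ by roughly two orders of magnitude in wavelength, which is one reason low and high speech frequencies behave so differently in rooms.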
Speech Production and Perception

Articulatory vs. acoustic characteristics
The source-filter theory is the key framework here. Think of speech production in two stages: the larynx generates a raw sound (the source), and then the vocal tract shapes that sound (the filter). Different configurations of the tongue, lips, and jaw create different filter shapes, which is why changing your mouth position changes the sound that comes out.
Vowels are characterized by formants, which are the resonant frequencies of the vocal tract. Two formants matter most for identifying vowels:
- F1 (the first formant) correlates inversely with tongue height. A high F1 value means a low tongue position (as in "ah"), and a low F1 means a high tongue position (as in "ee").
- F2 (the second formant) correlates with tongue advancement. A high F2 means the tongue is pushed forward (front vowels like "ee"), while a low F2 means the tongue is pulled back (back vowels like "oo").
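The F1/F2 logic above can be sketched as a toy nearest-neighbor classifier. The reference values below are approximate averages for an adult male speaker and are illustrative, not normative:

```python
# Sketch: mapping measured (F1, F2) pairs to rough vowel qualities
# by nearest-neighbor distance. Reference formant values are
# approximate and vary by speaker.

import math

# (F1 Hz, F2 Hz) reference points, approximate adult male averages
REFERENCE_VOWELS = {
    "i (as in 'ee')": (270, 2290),  # high front: low F1, high F2
    "u (as in 'oo')": (300, 870),   # high back: low F1, low F2
    "a (as in 'ah')": (730, 1090),  # low: high F1, lower F2
}

def nearest_vowel(f1: float, f2: float) -> str:
    """Return the reference vowel closest to the measured formants."""
    return min(
        REFERENCE_VOWELS,
        key=lambda v: math.dist((f1, f2), REFERENCE_VOWELS[v]),
    )

print(nearest_vowel(280, 2200))  # low F1 + high F2 -> "ee"-like
```

Real formant-based classification has to normalize for speaker differences (vocal tract length shifts all formants), but the geometric idea is the same.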
Consonants produce their own distinct acoustic signatures:
- Stops (like /p/ or /b/) show a brief period of silence followed by a burst of energy on a spectrogram.
- Fricatives (like /s/ or /f/) generate continuous turbulent noise, visible as fuzzy high-frequency energy.
- Nasals (like /m/ or /n/) display anti-formants, which are bands of reduced energy caused by sound resonating in the nasal cavity.
Voice onset time (VOT) measures the delay between when a stop consonant is released and when the vocal folds start vibrating. A short VOT (or even a negative one, where voicing starts before the release) signals a voiced stop like /b/. A longer, positive VOT signals a voiceless or aspirated stop like /p/.
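The VOT categories can be sketched as a simple threshold rule. The ~25 ms crossover used here is an illustrative English-like value; the actual boundary is language-dependent:

```python
# Sketch: labeling a stop consonant by voice onset time (VOT).
# The 25 ms short-lag/long-lag threshold is approximate and
# language-dependent.

def classify_stop(vot_ms: float) -> str:
    """Label a bilabial stop from its VOT in milliseconds."""
    if vot_ms < 0:
        return "prevoiced (voicing begins before release)"
    if vot_ms < 25:  # short-lag: perceived as voiced
        return "voiced, e.g. /b/"
    return "voiceless aspirated, e.g. /p/"  # long-lag

print(classify_stop(-60))  # prevoiced
print(classify_stop(10))   # short-lag /b/
print(classify_stop(70))   # long-lag /p/
```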
One more concept: coarticulation. Your mouth doesn't reset to a neutral position between every sound. Instead, articulatory gestures overlap. You might round your lips during a consonant because the next vowel is rounded (anticipatory coarticulation), or a vowel might be nasalized because the preceding consonant was nasal (carryover coarticulation). This overlap creates smooth acoustic transitions between sounds rather than sharp boundaries.

Process of speech perception
Hearing speech involves a chain of events from the ear to the brain:
- The outer ear collects sound waves and funnels them down the ear canal, which naturally amplifies certain frequencies.
- The middle ear transmits vibrations through three tiny bones called ossicles (malleus, incus, stapes), amplifying the signal as it passes to the inner ear.
- In the inner ear, the cochlea converts mechanical vibrations into electrical nerve signals. The basilar membrane inside the cochlea is organized tonotopically, meaning different positions along it respond to different frequencies (high frequencies near the base, low frequencies near the apex, the tip).
- The auditory nerve carries these electrical signals to the auditory cortex in the brain, where they're processed as speech.
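The tonotopic mapping in the cochlea is often approximated by the Greenwood function, f = A(10^(ax) − k). The human constants below (A = 165.4, a = 2.1, k = 0.88, with x the fractional distance from apex to base) are the commonly cited fit:

```python
# Sketch: Greenwood function for the human cochlea.
# f = A * (10**(a*x) - k), with x in [0, 1] measured from the
# apex (tip) to the base. Constants are the standard human fit.

def greenwood_hz(x: float) -> float:
    """Approximate best frequency (Hz) at fractional cochlear position x."""
    A, a, k = 165.4, 2.1, 0.88
    return A * (10 ** (a * x) - k)

print(f"apex   (x=0.0): {greenwood_hz(0.0):8.1f} Hz")  # low frequencies
print(f"middle (x=0.5): {greenwood_hz(0.5):8.1f} Hz")
print(f"base   (x=1.0): {greenwood_hz(1.0):8.1f} Hz")  # high frequencies
```

The endpoints land near 20 Hz and 20 kHz, matching the conventional limits of human hearing, with the speech-critical range occupying roughly the middle of the membrane.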
Beyond this physical pathway, perception involves some fascinating cognitive processes:
- Categorical perception is why you hear a clean distinction between /b/ and /p/ even though the acoustic difference (VOT) is actually a smooth continuum. Your brain sorts sounds into discrete phoneme categories rather than perceiving every tiny acoustic variation.
- Top-down processing means you use context and linguistic knowledge to fill in gaps. If a word is partially masked by noise, your brain can often reconstruct it from the surrounding sentence.
- The McGurk effect shows that speech perception isn't purely auditory. If you hear "ba" but see someone mouth "ga," you'll often perceive "da." Visual cues from lip and mouth movements actively shape what you think you're hearing.
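Categorical perception of the VOT continuum is often modeled as a steep logistic identification function. The boundary (~25 ms) and slope below are illustrative, not measured values:

```python
# Sketch: a logistic identification function over the VOT continuum.
# A steep slope makes responses cluster at the category endpoints
# even though the acoustic input varies smoothly.

import math

def prob_hear_p(vot_ms: float, boundary: float = 25.0,
                slope: float = 0.5) -> float:
    """Probability a listener reports /p/ rather than /b/ at this VOT."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))

# Smooth acoustic continuum, near-categorical responses:
for vot in (0, 15, 25, 35, 50):
    print(f"VOT {vot:2d} ms -> P(/p/) = {prob_hear_p(vot):.2f}")
```

Only tokens very close to the boundary produce uncertain responses, which is exactly the pattern identification experiments find.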
Tools for acoustic analysis
Linguists use several tools to study the acoustic properties of speech. Most of these are available in Praat, a free software program widely used in phonetics research.
- Waveforms plot amplitude over time. They're useful for identifying syllable boundaries, measuring segment durations, and finding the exact moment of a stop burst for VOT measurement.
- Spectrograms display frequency on the vertical axis and time on the horizontal axis, with intensity shown as darkness or color. Formants appear as dark horizontal bands, and you can see transitions between sounds, bursts, and fricative noise all in one image.
- Formant analysis tracks F1 and F2 values over time. Plotting F1 against F2 for different vowels produces a vowel space chart, which is a standard way to compare vowel systems across speakers or languages.
- VOT measurement pinpoints the time gap between a stop release burst and the onset of voicing. For example, English aspirated /p/ might have a VOT around 50–80 ms, while unaspirated /b/ might be near 0 ms or slightly negative.
- Pitch tracking traces the fundamental frequency (F0) contour across an utterance, revealing intonation patterns (like the rise at the end of a question) and stress placement.
- Intensity analysis measures relative loudness across segments, helping identify stressed syllables and prominence patterns in connected speech.
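To make pitch tracking concrete, here is a minimal autocorrelation F0 estimator, the same basic idea Praat's pitch tracker refines. It is tested on a synthetic 200 Hz sine; real speech additionally needs windowing, a voicing decision, and octave-error handling:

```python
# Sketch: autocorrelation pitch estimation. Find the lag (in samples)
# that maximizes the signal's correlation with a delayed copy of
# itself, restricted to a plausible F0 range, then convert to Hz.

import math

def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 as sample_rate / best_lag within [f0_min, f0_max]."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i - lag]
                   for i in range(lag, len(samples)))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# 0.1 s of a 200 Hz sine at 16 kHz: the period is exactly 80 samples.
rate = 16000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
print(f"estimated F0: {estimate_f0(tone, rate):.1f} Hz")
```

The 75–500 Hz search range mirrors Praat's default pitch floor and a typical ceiling for speech; narrowing it is the standard defense against halving and doubling errors.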