Room acoustic parameters are the measurable quantities that describe how sound behaves in an enclosed space. They give architects and acousticians a shared vocabulary for evaluating whether a room sounds good for its intended purpose, from speech clarity in a classroom to musical richness in a concert hall.

This section covers the major parameters: reverberation time, early decay time, clarity, definition, sound strength, lateral energy fraction, interaural cross-correlation, bass ratio, and stage support.

Reverberation time (RT)

Reverberation time quantifies how long sound lingers in a room after the source stops. Specifically, RT is the time it takes for the sound pressure level to drop by 60 dB. This single number has an outsized influence on perceived acoustics, speech intelligibility, and musical clarity.

Sabine's equation for RT

Wallace Sabine developed the most widely used formula for estimating RT:

$RT = \frac{0.161V}{A}$

where $V$ is the room volume in cubic meters and $A$ is the total absorption in metric sabins (square meters of equivalent perfect absorption), found by summing each surface area multiplied by its absorption coefficient. The constant 0.161 applies when using metric units.

Sabine's equation assumes two things: a diffuse sound field (sound energy is roughly equal everywhere) and evenly distributed absorption. These assumptions hold reasonably well in many rooms, making the equation a practical first-pass tool. It becomes less accurate in rooms with very high absorption, where the Eyring equation is more appropriate.
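As a rough illustration of the first-pass estimate described above, the sketch below computes RT with Sabine's equation and, for comparison, with the standard form of the Eyring equation, $RT = 0.161V / (-S \ln(1 - \bar{\alpha}))$. The room dimensions and absorption coefficients are hypothetical values chosen for the example, not measured data.

```python
import math

def sabine_rt(volume_m3, surfaces):
    """Sabine estimate. surfaces is a list of (area_m2, absorption_coefficient)."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption

def eyring_rt(volume_m3, surfaces):
    """Eyring estimate, which uses the area-weighted mean absorption coefficient."""
    total_area = sum(area for area, _ in surfaces)
    mean_alpha = sum(area * alpha for area, alpha in surfaces) / total_area
    return 0.161 * volume_m3 / (-total_area * math.log(1.0 - mean_alpha))

# Hypothetical 10 m x 8 m x 4 m room (all coefficients are assumed values).
room_volume = 10 * 8 * 4
room_surfaces = [
    (80.0, 0.30),   # carpeted floor
    (80.0, 0.60),   # absorptive ceiling tile
    (144.0, 0.05),  # painted walls
]

rt_sabine = sabine_rt(room_volume, room_surfaces)
rt_eyring = eyring_rt(room_volume, room_surfaces)
print(f"Sabine: {rt_sabine:.2f} s, Eyring: {rt_eyring:.2f} s")
```

The Eyring estimate comes out shorter than the Sabine estimate, and the gap widens as the mean absorption coefficient grows, which is why Sabine's equation is less reliable in heavily absorbent rooms.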

Factors affecting RT

Several factors control a room's reverberation time:

Room volume: In a larger room, sound travels farther between reflections, so it loses energy to the surfaces less often and RT increases with volume.
Absorption: Materials like acoustic panels, heavy curtains, and carpet absorb sound energy and shorten RT. Hard surfaces like concrete and glass reflect sound and lengthen it.
Room geometry: Irregular shapes and diffusing surfaces scatter sound more evenly, which doesn't necessarily change RT but affects how uniform the decay sounds throughout the space.

Optimal RT ranges

The right RT depends entirely on what the room is for:

Speech spaces (classrooms, lecture halls): 0.5 to 1.0 seconds. Short RT keeps consonants crisp and prevents syllables from blurring together.
Concert halls: 1.5 to 2.5 seconds. Longer RT adds richness, warmth, and a sense of the notes blending musically.
Multi-purpose auditoriums: 1.2 to 1.8 seconds. A compromise that works acceptably for both speech and music, though ideal for neither.

Early decay time (EDT)

Early decay time measures the initial rate of sound decay rather than the full 60 dB drop. EDT is defined as the time for the sound pressure level to fall by 10 dB, then multiplied by 6 to extrapolate to a 60 dB equivalent. This makes it directly comparable to RT in units (seconds), but it captures something different about the room's character.

EDT vs RT

RT describes the overall decay behavior. EDT zooms in on the first 10 dB of that decay, which is dominated by early reflections and the direct sound. Because human hearing is most sensitive to this initial portion, EDT often predicts how reverberant a room feels better than RT does.

In a room with uniform absorption, EDT and RT will be similar. They diverge in rooms with non-uniform absorption or strong early reflections, which is why measuring both is valuable.

Perceived reverberance and EDT

Rooms with longer EDT values feel more reverberant and spacious, even if the full RT is moderate. For a natural-sounding room, EDT should be fairly consistent across frequency bands. If EDT varies wildly between low and high frequencies, the reverberance will sound uneven or colored.

Measuring EDT

EDT is measured from the room's impulse response:

Excite the room with an impulsive source (starter pistol, balloon pop, or a swept sine signal processed into an impulse response).
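Record the response with a microphone and derive the decay curve, commonly by backward (Schroeder) integration of the squared impulse response.
Fit a line to the decay curve between 0 dB and -10 dB relative to the initial level.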
Multiply the time for that 10 dB drop by 6 to get EDT.
Repeat at multiple listener positions and average the results.
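The steps above can be sketched in a few lines of code. This is a minimal illustration, assuming the impulse response is already available as a list of samples that begins at the arrival of the direct sound; it uses Schroeder backward integration to obtain the decay curve and a least-squares line fit over the first 10 dB. The function names and the synthetic test signal are hypothetical.

```python
import math

def schroeder_decay_db(impulse_response):
    """Backward-integrate the squared response and return the decay curve in dB."""
    remaining = 0.0
    curve = []
    for sample in reversed(impulse_response):
        remaining += sample * sample
        curve.append(remaining)
    curve.reverse()
    total = curve[0]
    return [10.0 * math.log10(value / total) for value in curve if value > 0.0]

def edt_seconds(impulse_response, sample_rate):
    """Fit a line to the 0 to -10 dB portion of the decay and extrapolate to 60 dB."""
    points = []
    for index, level in enumerate(schroeder_decay_db(impulse_response)):
        if level < -10.0:
            break
        points.append((index / sample_rate, level))
    mean_t = sum(t for t, _ in points) / len(points)
    mean_l = sum(l for _, l in points) / len(points)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second
    return -60.0 / slope

# Synthetic check: an ideal exponential decay whose energy drops 60 dB in 1.2 s.
SAMPLE_RATE = 1000
TRUE_RT = 1.2
decay_constant = 3.0 * math.log(10.0) / TRUE_RT
synthetic_ir = [math.exp(-decay_constant * n / SAMPLE_RATE)
                for n in range(3 * SAMPLE_RATE)]
estimated_edt = edt_seconds(synthetic_ir, SAMPLE_RATE)
print(f"EDT = {estimated_edt:.2f} s")
```

For a purely exponential decay like this synthetic one, EDT and RT coincide. In a real room, the measured response would first be filtered into octave bands, which this sketch omits.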

Clarity (C50/C80)

Clarity quantifies how well individual sounds can be distinguished from one another in a room. It compares the amount of early sound energy (which reinforces the direct sound) to late sound energy (which blurs it). Higher clarity values mean the early energy dominates, and sounds are more distinct.

Definition of clarity

Clarity is expressed as a logarithmic ratio in decibels:

$C_t = 10 \log_{10} \left( \frac{\int_0^t p^2(\tau) \, d\tau}{\int_t^\infty p^2(\tau) \, d\tau} \right)$

where $p(\tau)$ is the sound pressure of the impulse response at time $\tau$, measured from the arrival of the direct sound, and $t$ is the time boundary (50 ms for speech, 80 ms for music). A positive value means more energy arrives early; a negative value means late energy dominates.

C50 for speech intelligibility

C50 uses a 50 ms boundary because speech perception depends on very early reflections reinforcing consonant sounds.

Higher C50 values = better speech intelligibility.
A C50 of 0 dB or higher is generally considered adequate for speech.
At 0 dB, early and late energy are equal. Positive values mean the early energy wins out.

C80 for musical clarity

C80 uses an 80 ms boundary because musical sounds are longer than speech syllables, and reflections arriving within 80 ms still fuse perceptually with the direct sound.

Higher C80 values = more distinct musical lines, but too high means the room sounds dry.
The acceptable range is typically -2 dB to +4 dB, depending on musical style. Romantic orchestral music benefits from lower C80 (more blending), while chamber music benefits from higher C80 (more detail).

Measuring clarity

Generate an impulse response in the room using an omnidirectional source.
Record the impulse response at the listener position.
Square the impulse response and integrate over the early interval (0 to 50 ms or 0 to 80 ms) and the late interval (beyond the boundary).
Take the logarithmic ratio of early to late energy.
Repeat at multiple positions and average.
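The energy-ratio step can be illustrated with a short sketch. It assumes the impulse response is a list of samples whose first sample coincides with the arrival of the direct sound; the function name and the synthetic decay used to exercise it are hypothetical.

```python
import math

def clarity_db(impulse_response, sample_rate, boundary_ms):
    """Early-to-late energy ratio in dB for a given time boundary (50 or 80 ms)."""
    split = int(round(sample_rate * boundary_ms / 1000.0))
    early_energy = sum(x * x for x in impulse_response[:split])
    late_energy = sum(x * x for x in impulse_response[split:])
    return 10.0 * math.log10(early_energy / late_energy)

# Synthetic example: an ideal exponential decay with a 1.0 s reverberation time.
SAMPLE_RATE = 8000
decay_constant = 3.0 * math.log(10.0) / 1.0
synthetic_ir = [math.exp(-decay_constant * n / SAMPLE_RATE)
                for n in range(2 * SAMPLE_RATE)]

c50 = clarity_db(synthetic_ir, SAMPLE_RATE, 50)
c80 = clarity_db(synthetic_ir, SAMPLE_RATE, 80)
print(f"C50 = {c50:.2f} dB, C80 = {c80:.2f} dB")
```

Because the 80 ms window contains everything in the 50 ms window plus more, C80 is always at least as large as C50 for the same impulse response.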

Definition (D50)

Definition (D50) is closely related to C50 but expressed as a percentage rather than decibels. It represents the fraction of total sound energy that arrives within the first 50 ms.

D50 and speech intelligibility

$D50 = \frac{\text{energy in first 50 ms}}{\text{total energy}} \times 100\%$

A D50 of 50% means half the energy arrives early and half arrives late. Higher values mean more of the total energy is useful for speech perception. A D50 of 50% or higher is generally considered adequate for speech spaces.

Relationship between D50 and C50

D50 and C50 contain the same information in different forms. They're related by:

$D50 = \frac{1}{1 + 10^{-C50/10}} \times 100\%$

When C50 = 0 dB, D50 = 50%. This makes sense: 0 dB of clarity means early and late energy are equal, so exactly half the total energy is early.
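A small sketch of the conversion in both directions may help when comparing reports that quote one parameter but not the other. The function names are hypothetical, and D50 is handled as a percentage.

```python
import math

def d50_from_c50(c50_db):
    """Convert C50 in dB to D50 as a percentage."""
    return 100.0 / (1.0 + 10.0 ** (-c50_db / 10.0))

def c50_from_d50(d50_percent):
    """Convert D50 as a percentage (strictly between 0 and 100) to C50 in dB."""
    fraction = d50_percent / 100.0
    return 10.0 * math.log10(fraction / (1.0 - fraction))

for c50 in (-3.0, 0.0, 3.0):
    print(f"C50 = {c50:+.1f} dB  ->  D50 = {d50_from_c50(c50):.1f}%")
```

The inverse follows from solving the same relation for C50: the early-to-late ratio equals the early fraction divided by the remaining late fraction.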

Ideal D50 values

Target D50 values depend on the space and its noise environment:

Classrooms and lecture halls: 50% or higher
Recording studios and teleconferencing rooms: 60% or higher, where precise speech clarity is critical
Noisy environments (open-plan offices, restaurants): 70% or higher, because background noise already masks part of the speech signal, leaving less tolerance for late reverberant energy

Sound strength (G)

Sound strength quantifies how much a room amplifies sound compared to an open-air reference. G is the difference in dB between the sound pressure level measured in the room and the level the same source would produce in a free field at 10 meters distance.

A room always adds energy through reflections, so G is typically positive. It tells you how much "gain" the room provides.

Measuring sound strength

Use an omnidirectional source with known sound power output.
Record the impulse response at various listener positions with a calibrated microphone.
Compare the total energy in the measured impulse response to the theoretical free-field level at 10 m from the same source.
Average G values across multiple positions for a representative result.
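The comparison step can be sketched as an energy ratio between two impulse responses. This is an illustration under stated assumptions: both responses come from the same source at the same output level and the same calibrated microphone, and if the free-field reference was captured at a distance other than 10 m it is rescaled with the inverse-square law. The function name and the toy signals are hypothetical.

```python
import math

def sound_strength_db(room_ir, free_field_ir, reference_distance_m=10.0):
    """G in dB: room energy relative to the free-field energy at 10 m.

    free_field_ir is the same source measured in a free field at
    reference_distance_m; its energy is rescaled to 10 m by the inverse-square law.
    """
    room_energy = sum(x * x for x in room_ir)
    reference_energy = sum(x * x for x in free_field_ir)
    distance_correction = 20.0 * math.log10(10.0 / reference_distance_m)
    return 10.0 * math.log10(room_energy / reference_energy) + distance_correction

# Toy signals: the room response carries the direct sound plus two reflections.
free_field_at_10m = [1.0, 0.0, 0.0, 0.0]
room_response = [1.0, 0.0, 0.8, 0.6]

g_value = sound_strength_db(room_response, free_field_at_10m)
print(f"G = {g_value:.2f} dB")
```

In this toy case the reflections double the total energy, so G comes out at about 3 dB.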

Factors influencing G

Room volume: Larger rooms spread energy over more space, lowering G.
Surface reflectivity: Hard, reflective surfaces redirect more energy toward listeners, raising G.
Source directivity: Directional sources (like many musical instruments) concentrate energy, producing higher G in certain directions.

Optimal G ranges

Concert halls: 4 to 8 dB. Provides a satisfying sense of loudness and envelopment.
Speech spaces (classrooms, lecture halls): 0 to 4 dB. Enough support for comfortable listening without excessive loudness.
Critical listening rooms (recording studios, control rooms): -2 to 2 dB. Minimal room coloration for accurate monitoring.

Lateral energy fraction (LF)

Lateral energy fraction measures how much of the early sound energy arrives from the sides rather than from the front. It's the primary parameter for spatial impression, the feeling of being immersed in sound rather than listening to a point source.

LF and spatial impression

Sound arriving from lateral directions (the sides) creates different signals at your two ears, which your brain interprets as spaciousness. LF is defined as the ratio of early lateral energy to total early energy, expressed as a fraction or percentage.

Higher LF = stronger sense of spaciousness and envelopment.
LF values above 0.20 (20%) are generally desirable in concert halls.
Narrow rooms with reflective side walls tend to produce higher LF values.

Early lateral energy (LFC)

LFC (lateral fraction, cosine-weighted) is a refined version of LF that weights the lateral energy by the cosine of the arrival angle. It focuses on the early lateral reflections within the first 80 ms and is considered a better predictor of apparent source width, the perceived broadening of the sound source.

LFC values above 0.35 (35%) are associated with a strong sense of envelopment and a wide apparent source width.

Measuring LF and LFC

LF and LFC require a specialized measurement setup:

Place a figure-of-eight microphone at the listener position, oriented with its null axis pointing toward the source. This microphone rejects frontal sound and captures primarily lateral energy.
Place an omnidirectional microphone at the same position to capture total energy.
Fire an impulse and record both signals simultaneously.
For LF, integrate the squared figure-of-eight signal over 5 to 80 ms and divide by the squared omnidirectional signal over 0 to 80 ms.
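For LFC, the numerator is the absolute value of the product of the figure-of-eight and omnidirectional signals, integrated over 5 to 80 ms, which gives the cosine weighting; the denominator is the same squared omnidirectional signal over 0 to 80 ms.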
Average across multiple positions.
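The LF calculation from the two recorded signals can be sketched as follows. The sketch assumes both impulse responses are time-aligned lists of samples starting at the arrival of the direct sound, and that the two microphones have matched sensitivity. The function name and the toy reflections are hypothetical.

```python
def lateral_fraction(figure8_ir, omni_ir, sample_rate):
    """LF: figure-of-eight energy (5 to 80 ms) over omnidirectional energy (0 to 80 ms)."""
    start = int(round(0.005 * sample_rate))
    end = int(round(0.080 * sample_rate))
    lateral_energy = sum(x * x for x in figure8_ir[start:end])
    total_energy = sum(x * x for x in omni_ir[:end])
    return lateral_energy / total_energy

# Toy example at 1 kHz sampling: direct sound, one side reflection at 20 ms,
# and one frontal reflection at 40 ms that the figure-of-eight microphone rejects.
SAMPLE_RATE = 1000
omni = [0.0] * 200
figure8 = [0.0] * 200
omni[0] = 1.0                    # direct sound (arrives in the figure-of-eight null)
omni[20], figure8[20] = 0.5, 0.5  # lateral reflection, seen by both microphones
omni[40] = 0.5                   # frontal reflection, seen by the omni only

lf_value = lateral_fraction(figure8, omni, SAMPLE_RATE)
print(f"LF = {lf_value:.3f}")
```

Here only one of the two reflections contributes to the numerator, so LF is 0.25 / 1.5, or roughly 0.17, just under the 0.20 target mentioned earlier.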

Interaural cross-correlation (IACC)

IACC quantifies how similar the sound signals are at your left and right ears. It's a binaural parameter, meaning it accounts for the fact that you hear with two ears, not one.

IACC and spaciousness

When the signals at both ears are nearly identical (IACC close to 1), the sound image feels narrow and focused. When the signals differ significantly (IACC close to 0), the sound feels spacious and enveloping.

Lower IACC = more spaciousness. This is the opposite direction from most parameters, where higher is "better."
IACC values below 0.4 are generally desirable for concert hall spaciousness.

IACCE (early) and IACCL (late)

IACC is split into two time windows:

IACCE (early, 0 to 80 ms): Related to the perceived width of the source and early spatial impression. Values below 0.6 are desirable.
IACCL (late, beyond 80 ms): Related to the sense of envelopment from the reverberant field. Values below 0.4 are desirable.

Together, these two components describe both the width of the direct/early sound image and the immersiveness of the late reverberation.

Measuring IACC

Place a dummy head (or binaural microphone set) at the listener position, with microphones in the ear canal positions.
Fire an impulse and record the left and right ear signals simultaneously.
Compute the normalized cross-correlation function between the two signals over a range of time delays corresponding to the human interaural time difference (approximately $\pm 1$ ms).
The IACC is the maximum absolute value of this cross-correlation function.
Calculate separately for early and late time windows to get IACCE and IACCL.
Average across multiple positions.
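The cross-correlation step can be sketched directly from that description. This is a minimal pure-Python illustration, assuming the two ear signals are equal-length lists of samples aligned to the arrival of the direct sound; the function name, default arguments, and random test signals are hypothetical.

```python
import math
import random

def iacc(left, right, sample_rate, start_ms=0.0, end_ms=None, max_lag_ms=1.0):
    """Maximum absolute normalized cross-correlation for lags within +/- max_lag_ms."""
    start = int(round(sample_rate * start_ms / 1000.0))
    end = len(left) if end_ms is None else min(
        len(left), int(round(sample_rate * end_ms / 1000.0)))
    norm = math.sqrt(sum(x * x for x in left[start:end])
                     * sum(x * x for x in right[start:end]))
    max_lag = int(round(sample_rate * max_lag_ms / 1000.0))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(start, end):
            j = i + lag
            if 0 <= j < len(right):
                acc += left[i] * right[j]
        best = max(best, abs(acc) / norm)
    return best

SAMPLE_RATE = 8000
rng = random.Random(1)
noise_a = [rng.uniform(-1.0, 1.0) for _ in range(796)]
noise_b = [rng.uniform(-1.0, 1.0) for _ in range(800)]

# Same signal at both ears, with the right ear delayed by 0.5 ms (4 samples).
iacc_delayed = iacc(noise_a + [0.0] * 4, [0.0] * 4 + noise_a, SAMPLE_RATE)
# Unrelated signals at the two ears.
iacc_unrelated = iacc(noise_a + [0.0] * 4, noise_b, SAMPLE_RATE)
print(f"delayed copy: {iacc_delayed:.2f}, unrelated: {iacc_unrelated:.2f}")
```

A pure interaural delay still yields an IACC near 1 because the search over lags finds the matching offset, while unrelated signals yield a value near 0. Passing `end_ms=80` or `start_ms=80` would give the early and late variants.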

Bass ratio (BR)

Bass ratio describes the tonal balance between low and mid frequencies in a room's reverberation. It's calculated from reverberation times, not from a single impulse response energy measurement.

BR and warmth

Warmth in a room's sound comes from low frequencies reverberating slightly longer than mid frequencies. BR captures this directly:

BR > 1 means low frequencies decay more slowly than mids, producing a warm, full sound.
BR < 1 means low frequencies decay faster, producing a thinner, leaner sound.
BR values between 1.1 and 1.45 are considered optimal for concert halls.

Calculating BR

$BR = \frac{RT_{125} + RT_{250}}{RT_{500} + RT_{1000}}$

where $RT_{125}$, $RT_{250}$, $RT_{500}$, and $RT_{1000}$ are the reverberation times measured in the 125 Hz, 250 Hz, 500 Hz, and 1000 Hz octave bands, respectively.

Steps:

Measure RT in each of the four octave bands.
Sum the two low-frequency RTs (125 Hz + 250 Hz).
Sum the two mid-frequency RTs (500 Hz + 1000 Hz).
Divide the low-frequency sum by the mid-frequency sum.
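The same steps as a short sketch, using hypothetical octave-band reverberation times keyed by center frequency in Hz:

```python
def bass_ratio(rt_by_band):
    """BR from octave-band reverberation times keyed by center frequency in Hz."""
    low_sum = rt_by_band[125] + rt_by_band[250]
    mid_sum = rt_by_band[500] + rt_by_band[1000]
    return low_sum / mid_sum

# Hypothetical measured values in seconds.
hall_rt = {125: 2.3, 250: 2.1, 500: 1.9, 1000: 1.9}
hall_br = bass_ratio(hall_rt)
print(f"BR = {hall_br:.2f}")
```

With these assumed values the ratio is 4.4 / 3.8, or about 1.16.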

Ideal BR values

Concert halls and recital rooms: 1.1 to 1.45. Warm and full without muddying the sound.
Speech spaces (lecture halls, conference rooms): 0.9 to 1.1. A flatter frequency balance keeps speech clear.
BR above 1.5: Risk of a boomy, muddy sound where bass overwhelms detail.
BR below 0.9: The room may sound thin and lacking body.

These are guidelines. The right BR also depends on the room's specific design, the repertoire, and listener expectations.

Stage support (ST)

Stage support quantifies the acoustic feedback that performers receive from the room while playing. It measures how much reflected energy returns to the stage compared to the direct sound energy radiated by the performer.

Good stage support helps musicians hear themselves and each other, which is essential for ensemble playing, intonation, and musical expression. Without adequate stage support, performers feel isolated and struggle to blend with one another.

ST1 (early support) measures reflections returning to the stage within 20 to 100 ms. These early reflections help performers hear their own sound and coordinate timing with nearby musicians. ST1 values between -24 dB and -12 dB are typical targets for concert hall stages.
ST2 (late support) measures reflections arriving after 100 ms, typically integrated from 100 ms to 1000 ms, giving performers a sense of the hall's reverberance from the stage position.

Stage support is measured by placing both the source and microphone on the stage (typically 1 m apart), firing an impulse, and comparing the reflected energy in the specified time window to the direct sound energy in the first 10 ms.
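That comparison can be sketched as a ratio of windowed energies. The sketch assumes the on-stage impulse response is a list of samples starting at the arrival of the direct sound; the function names and the toy reflections are hypothetical.

```python
import math

def window_energy(impulse_response, sample_rate, start_ms, end_ms=None):
    """Sum of squared samples between start_ms and end_ms (to the end if None)."""
    start = int(round(sample_rate * start_ms / 1000.0))
    end = None if end_ms is None else int(round(sample_rate * end_ms / 1000.0))
    return sum(x * x for x in impulse_response[start:end])

def stage_support_db(impulse_response, sample_rate, start_ms, end_ms=None):
    """Reflected energy in the chosen window relative to the first 10 ms, in dB."""
    direct = window_energy(impulse_response, sample_rate, 0.0, 10.0)
    reflected = window_energy(impulse_response, sample_rate, start_ms, end_ms)
    return 10.0 * math.log10(reflected / direct)

# Toy on-stage response at 1 kHz sampling: direct sound, one early reflection
# at 50 ms, and one late reflection at 300 ms.
SAMPLE_RATE = 1000
stage_ir = [0.0] * 1000
stage_ir[0] = 1.0
stage_ir[50] = 0.1
stage_ir[300] = 0.05

st1 = stage_support_db(stage_ir, SAMPLE_RATE, 20.0, 100.0)
st_late = stage_support_db(stage_ir, SAMPLE_RATE, 100.0)
print(f"ST1 = {st1:.1f} dB, late support = {st_late:.1f} dB")
```

In this toy case the single early reflection at one tenth of the direct amplitude gives an ST1 of -20 dB, inside the typical target range quoted above.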
