How We Hear Speech

A very brief introduction, for lay people, to the science of speaking and hearing speech.


Our ears are our gateway to the world - and to each other.

Hearing speech is a pretty amazing faculty. It allows us to enter into almost telepathic communication with fellow human beings by exchanging “invisible vibrations”. Most of us take this completely magical ability for granted and rarely stop to wonder how incredible it is. But imagine you lost the ability to hear the speech of others overnight. Can you picture how lonely your world would become...

But enough of the sad stuff. The purpose of this quick guide to hearing speech is to celebrate how wonderful, and also how strange and peculiar, our ability to speak and to hear speech really is. So let's get started.



Let's assume you already know what you want to say. Then, all you need to do is to broadcast your thoughts as sound waves using a suitable “air wave creating device”. Luckily you were born with one of those devices built into your neck. It is known as a larynx. Here you can see a little movie of a larynx in action.

Question: Can you work out what changes in the larynx to make some sounds high pitched and others not?

Articulating speech

Of course, just pushing an "aaaaaaah" sound out of your throat is very limited as far as communication goes. To be able to make sounds that others will understand, we "articulate" speech, which involves moving our tongue, jaw, and lips in precise and complicated ways. Have a look inside this speaker's head and you will see the intricate dance our throat and mouth perform when we speak:

You even move pieces of your anatomy you may not know you have, such as the "soft palate", or velum, which sits at the back of the roof of your mouth!


How sensible do you think it is to expect deaf people to "lip read", if so many of the movements that shape speech sounds happen deep inside the speaker's head, and are normally hidden from view?

Visualizing Your Own Speech - the Spectrogram

We've learned so far that speech sounds are made up of different frequencies. A spectrogram is one way to visualize the frequency content of a sound. Try speaking into your computer's microphone to see what your own speech looks like below. Frequencies (along the vertical axis) are colored depending on their intensity.

Adjust the sensitivity a bit. Try speaking slowly (your ears are much faster than your eyes, so if you want to be able to "see" what's going on in sound you often have to slow it down). Try a few vowels: "aaaah iiiih oooh eeeeh". You may also want to try singing a vowel at different pitches.

Can you work out what "aaahs" at different pitches have in common? And what distinguishes "aaahs" from "uuuuhs"?
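For the curious (and slightly nerdy) reader, here is a little Python sketch of how a spectrogram is computed. This is not the code behind the demo above, just an illustration of the idea: each short slice of sound is windowed and run through a discrete Fourier transform, and the resulting magnitudes become one column of the spectrogram.

```python
import math

def magnitude_spectrum(frame, sample_rate):
    """Return (frequency, magnitude) pairs for one windowed frame,
    computed with a plain discrete Fourier transform (DFT).
    A spectrogram is just many of these spectra, one per time slice."""
    n = len(frame)
    # A Hann window reduces "spectral leakage" at the frame edges.
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2):  # only frequencies up to the Nyquist limit
        re = sum(x * math.cos(-2 * math.pi * k * i / n)
                 for i, x in enumerate(windowed))
        im = sum(x * math.sin(-2 * math.pi * k * i / n)
                 for i, x in enumerate(windowed))
        spectrum.append((k * sample_rate / n, math.hypot(re, im)))
    return spectrum

# A pure 1000 Hz tone, 1024 samples at an 8000 Hz sample rate:
rate, n = 8000, 1024
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]
spec = magnitude_spectrum(tone, rate)
peak_freq = max(spec, key=lambda fm: fm[1])[0]
print(round(peak_freq))  # → 1000: the spectrogram row that "lights up"
```

In your own voice there would be many peaks at once, and their pattern changing over time is exactly what you see scrolling past in the online spectrogram.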

How can you tell your "Aaahs" from your "Oooohs"?

Hopefully the online spectrogram has shown you that different speech sounds contain different patterns of formants, that is, resonant frequencies of the vocal tract. In fact, we can create vowel sounds by making sounds with just two resonant formant peaks. The following demo allows you to make such "artificial vowels" by choosing combinations of formants when you click in the coordinate system on the left.

Admittedly, these relatively simple vowels do sound very artificial, but they are easy to recognize and tell apart, and they demonstrate that your ears must recognize patterns of formant peaks in order to process speech. This happens quickly and subconsciously, so that when you listen to someone speak, you are probably just thinking about the meaning of their words without any awareness of the processing of the frequency spectra that goes on in your brain.
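If you would like to see the trick behind such artificial vowels, here is a small Python sketch (a toy, not the demo's actual code): we build a "buzz" at some pitch out of its harmonics, and simply boost the harmonics that fall near two chosen formant frequencies. The formant values used below are rough textbook figures, not measurements.

```python
import math

def artificial_vowel(f0, formant1, formant2, sample_rate=8000, duration=0.5):
    """Two-formant vowel sketch: a buzz at pitch f0 whose harmonics
    are boosted near two chosen resonant frequencies (the formants)."""
    def resonance(freq, centre, bandwidth=150.0):
        # simple bell-shaped gain curve around the formant centre
        return math.exp(-((freq - centre) / bandwidth) ** 2)

    n_harmonics = (sample_rate // 2) // f0  # stay below the Nyquist limit
    samples = []
    for t in range(int(sample_rate * duration)):
        s = 0.0
        for h in range(1, n_harmonics + 1):
            freq = h * f0
            gain = resonance(freq, formant1) + resonance(freq, formant2)
            s += gain * math.sin(2 * math.pi * freq * t / sample_rate)
        samples.append(s)
    return samples

# Roughly "aaah" (F1 ≈ 700 Hz, F2 ≈ 1200 Hz) versus "eeeh"
# (F1 ≈ 300 Hz, F2 ≈ 2300 Hz), both at the same 100 Hz pitch:
aah = artificial_vowel(100, 700, 1200)
eeh = artificial_vowel(100, 300, 2300)
```

Notice that both vowels use exactly the same pitch and the same set of harmonics; only the formant envelope differs, and that is what your ears latch onto.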


1) Sound waves are collected by the outer ear. 2) Little bones (ossicles) in the middle ear transmit the sound vibrations to the cochlea of the inner ear.

3) The basilar membrane inside the cochlea breaks the sound waves up into different frequencies. 4) Tiny hair cells, which live on the basilar membrane, translate mechanical vibrations into patterns of nerve impulses.

Video credits Med-EL

Frequency Analysis by the Ear

Incoming sounds are decomposed into their frequency components at the basilar membrane. The near ("basal") end of the basilar membrane resonates to high frequencies, and the far ("apical") end resonates to low frequencies. Vibrations at different locations on the basilar membrane therefore reveal what frequencies are present in the sound. This little video shows you how the vibrations might look if the basilar membrane could be "rolled out in front of you".

The vibration patterns of the cochlea may remind you of the spectrogram which you may have played with earlier in this guide.

Video credits Howard Hughes Medical Institute
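If the idea of a membrane performing frequency analysis seems mysterious, this little Python cartoon may help (it is a cartoon, not a biophysical model): we treat each place along the basilar membrane as a tiny resonator with its own "best frequency", drive all of them with the same tone, and see which place ends up vibrating the hardest.

```python
import math

def basilar_membrane_response(sound, sample_rate, channel_freqs):
    """Toy model of the basilar membrane as a row of resonators, one per
    place along the membrane. Each place is a damped oscillator tuned to
    its own best frequency; we drive each one with the sound and record
    how strongly it vibrates."""
    responses = []
    dt = 1.0 / sample_rate
    for f in channel_freqs:
        omega = 2 * math.pi * f
        damping = omega / 10.0  # assume the same sharpness of tuning everywhere
        x, v, energy = 0.0, 0.0, 0.0
        for s in sound:
            a = s - damping * v - omega * omega * x  # driven, damped oscillator
            v += a * dt
            x += v * dt
            energy += x * x
        responses.append(energy)
    return responses

rate = 16000
tone = [math.sin(2 * math.pi * 400 * t / rate) for t in range(rate // 4)]
# channels from apical (low best frequency) to basal (high best frequency):
freqs = [100, 200, 400, 800, 1600, 3200]
resp = basilar_membrane_response(tone, rate, freqs)
loudest = freqs[resp.index(max(resp))]
print(loudest)  # → 400: the place tuned to the tone vibrates hardest
```

Play a chord instead of a single tone and several places light up at once, which is precisely the "spectrogram written across the membrane" idea from the video.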

Cochlear Hair Cells

Your basilar membrane is lined with tiny hair cells, which are responsible for converting mechanical vibration of the basilar membrane to a chemical signal that stimulates your auditory nerve.

Hair cells in your ear come in two flavours: one row of inner hair cells and three rows of outer hair cells. The inner hair cells are heavily connected to the auditory nerve, and they send sound information to your brain. The job of the outer hair cells, in contrast, is to amplify the sound vibrations by dancing along. Literally! (See for yourself in this little video here.)

Video credit Ashmore Lab

Hair Cells Are Fragile

Hair cells are easily damaged, for example, by listening to extremely loud music. As you grow older, hair cells at the high frequency end of the cochlea also tend to wear out.

The pictures below show electron micrographs of healthy hair cells (left) and hair cells after exposure to excessive noise (right).

In mammals, dead hair cells do not grow back. Damage to hair cells is a leading cause of hearing loss. Take care of your hair cells!

Picture credit House Ear Institute

Hearing Sadly Often Gets Dramatically Worse With Age

As we age, our auditory sensitivity often declines, and often by a lot more than you might think! Use the buttons below to start or stop a little stand-up comedy video clip, and to select whether to hear it the way it would sound to an average elderly person of some 60, 70, or 80 years, or to someone completely deaf. You can also add "multi-speaker babble" background noise. Hearing the comedian is, unsurprisingly, more difficult in the presence of background noise, but it becomes harder still when background noise is combined with age-related hearing loss.

Background Noise

If you are fairly young and your hearing is good, you may be surprised, even appalled, at how much worse the simulated "elderly hearing" is. Is it really that bad? The answer is: it depends.
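One very crude way to get a feel for what the demo does (this is not the demo's actual method, just a sketch): since age-related hearing loss hits high frequencies first, we can pass a sound through a simple low-pass filter. The 2 kHz cutoff below is an illustrative choice; real presbycusis is a gradual, sloping loss, not a sharp cutoff.

```python
import math

def simulate_aged_hearing(sound, sample_rate, cutoff_hz=2000.0):
    """Crude simulation of age-related hearing loss: attenuate high
    frequencies with a first-order (RC-style) low-pass filter."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in sound:
        y += alpha * (x - y)  # exponential smoothing = RC low-pass filter
        out.append(y)
    return out

def rms(samples):
    """Root-mean-square level: a simple measure of loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

rate = 16000
low = [math.sin(2 * math.pi * 250 * t / rate) for t in range(rate)]
high = [math.sin(2 * math.pi * 6000 * t / rate) for t in range(rate)]
# the high tone loses far more energy than the low one:
ratio = rms(simulate_aged_hearing(low, rate)) / rms(simulate_aged_hearing(high, rate))
```

The low tone passes through almost untouched while the high tone is strongly muffled, which is why consonants like "s", "f", and "th", whose energy sits at high frequencies, are often the first casualties of ageing ears.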

Graph: average hearing loss with age.


To understand speech or other auditory information, the brain has to interpret the pattern of nerve impulses sent to it by the ear. As you have seen in earlier parts of this guide, the cochlea of the inner ear carries out a frequency analysis of incoming sounds, which is picked up by the cochlear hair cells and sent to the brain. This establishes a pattern of activity across the nerve fibres of the auditory nerve which is a lot like the spectrogram you saw earlier.

Tonotopy in the brainstem (left) and in the cortex (right)

The brain then seems to maintain this spectrographic, or tonotopic, representation throughout the early parts of the auditory pathway. This is shown here schematically using drawings of auditory brainstem nuclei based on studies in cats and rodents (left) and best-frequency maps from the auditory cortex of a ferret (right). The principle is the same in the brains of all mammals, including yours. The brain thus initially processes sounds frequency band by frequency band. But to understand speech it needs to recognize sound patterns that can span many frequency bands at once. How it does that is still largely unknown, but we do know a little bit about which brain areas are likely to be involved.

Sources Kandler et al. and Nelken et al.


Broca's and Wernicke's Areas

We have learnt a lot about which parts of the brain process speech from unfortunate patients who suffered brain injuries such as strokes. In the second half of the 19th century, the French neurologist Paul Broca and the German neurologist Carl Wernicke found that patients with damage to particular parts of the brain, typically in the left hemisphere, developed difficulties in either producing or understanding speech. The areas they identified now bear their names, and are shown in the picture here.
In the video on the right you can see an unfortunate patient suffering from Wernicke's aphasia. The patient, a retired dentist, is clearly not deaf and can hear the questions put to him by the interviewer, but the replies he utters are somewhat off topic...

Brain Areas Involved in Understanding Speech: Neurosurgical Data

Brain with Test Sites

Neurosurgeons who need to remove diseased brain tissue sometimes "zap" a small area of the brain with a brief electrical pulse, which deactivates it temporarily, to test whether that area is essential for a particular function before removing anything. The circles on this picture of a brain show areas tested in this manner. The filled circles mark areas that were found to be essential for understanding speech.

Based on a study by Dana Boatman