These lessons introduce you to phonetics and the IPA. If you enjoy this approach, consider my workbook for more practice.

  1. Introduction
  2. Vowels
  3. Consonants
  4. Syllables
  5. Words & phrases
  6. An extra challenge
  7. Answers to the exercises


Symbol-to-sound correspondence

Imagine that you wanted to represent the pronunciation of a language as accurately as possible in writing. This seems like an easy task. After all, isn't that what the invention of the alphabet was all about?

Well, consider English. We write an "a" in change, ball and hand even though that "a" represents a different sound in each word. Some simple sounds are represented by two letters (like "th" in thing) while sometimes a single letter represents two sounds (like "x" in example)!

Maybe a more "phonetic" language, like Spanish, fares better? Initially, it would seem so: "a" sounds the same in avión, sola and amigas. Yet Spanish spelling masks the difference between the "g"s in gringo vs. agua vs. gente. What's more, the way a Cuban speaker leaves off the final "s" in muchos and the way a Latin American speaker pronounces "s" and "z" the same are only two of the intricacies that hide behind standard Spanish spelling. English and Spanish spelling demonstrate how languages can't even use their own alphabets to encode their sounds accurately.

To tackle the problem, we first need to understand a basic concept missing from our discussion so far: the phoneme. Phonemes are individual units of sound that can be pronounced on their own and considered "one sound". To represent the pronunciation of a language accurately, we can break words and phrases into these individual sounds, these phonemes. Then, if we assign a unique "letter" (or symbol) to every possible phoneme, we can write the pronunciation accurately, without confusion or contradiction.

Steps toward understanding the IPA

Initially, all that may seem like a trivial thought experiment, but it teaches you the thinking behind the International Phonetic Alphabet. The IPA is a tool, really just a set of many symbols, that allows you to display and read any stream of sounds in any natural language. It's a way to "hear with your eyes" and imitate pronunciation more accurately. It rivals our ability to record the human voice in its usefulness to help us understand and analyze the pronunciation of the world's languages.

In this lesson, you will begin to learn about the phonetic concepts that underpin the IPA. Specifically, you will learn about vowels and consonants and how the human mouth produces them. You will learn many of the most common IPA symbols relating to vowels and consonants. Then, you will learn how syllables and larger chunks of speech work, and how to use IPA to represent those. Along the way, you'll have opportunities to work with the pronunciation of words in a variety of languages.

Vowels - their features & IPA symbols

On the rare occasion that you talk about vowels, you probably identify them by their letter names (like the letter "e"). What if you wanted to explain how and why the sound of "e" differs from "a" or "u"? For starters, you might use the concept of a phoneme. You would mention that the phonemes represented by the letter "e" vary from word to word. In words like "scene", it sounds like the "i" in the word "sing". As is customary, let's write that phoneme between slashes: /i/ in "sing" and "scene". Then, you'll need to understand the features that make up /i/ and distinguish it from other phonemes like /a/ or /u/.

Height of your tongue or jaw

These features I mentioned have to do with the way you pronounce vowels using your tongue, mouth and jaw. First, let's consider the position of your tongue relative to the roof of your mouth when you pronounce a vowel. When you say the vowel in "saw", your tongue is further away from the palate (the roof of your mouth) than when you say "sing". This feature is known as vowel height. Vowels can be classed as high (like /i/ or /u/), mid (like /e/ or /o/) or low (like /a/) depending on the height of your tongue in your mouth.

Height may be thought of as describing your jaw instead. When you pronounce the phoneme /ɑ/ in "father", notice that your mouth is much more open than when you pronounce /u/ in "rune". From this perspective, vowels can be classed as close (again, as /i/ or /u/), mid (again, as /e/ or /o/) or open (again, like /a/). In other words, close and high are synonymous, as are open and low.

How far back you place your tongue

Vowels have another essential feature. You push your tongue forward when you say the /i/ in "sing", but pull your tongue back when pronouncing the /ɑ/ in "father". This feature of vowels is known as backness. Vowels can be labeled front (like /i/ or /e/), central (as /a/) or back (like /o/ or /u/).

Combining features to identify specific vowels

You'll be hard-pressed to name a vowel based on its backness or height alone. If we treat these features as pieces or components of vowels, we can put any two features together and arrive at a vowel. For instance, rather than talking about "the vowel in scene", we can accurately pinpoint the "close front vowel". You may think of these feature sets as two axes on a chart, with backness on the x-axis and height on the y-axis.

         front    central    back
close    i, ɪ     -          u, ʊ
mid      ɛ, e     ə          ɔ, o
open     æ        a          ɑ
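
To make the idea of combining features concrete, here is a minimal sketch in Python that treats the chart above as a lookup table. The names VOWEL_CHART, vowels_with and describe_vowel are made up for this illustration; they don't come from any standard tool.

```python
# The vowel chart above, keyed by (height, backness).
VOWEL_CHART = {
    ("close", "front"): ["i", "ɪ"],
    ("close", "back"): ["u", "ʊ"],
    ("mid", "front"): ["ɛ", "e"],
    ("mid", "central"): ["ə"],
    ("mid", "back"): ["ɔ", "o"],
    ("open", "front"): ["æ"],
    ("open", "central"): ["a"],
    ("open", "back"): ["ɑ"],
}


def vowels_with(height, backness):
    """Return the vowels listed in one cell of the chart (possibly none)."""
    return VOWEL_CHART.get((height, backness), [])


def describe_vowel(symbol):
    """Return a label like 'close front vowel', or None if the symbol is not in the chart."""
    for (height, backness), symbols in VOWEL_CHART.items():
        if symbol in symbols:
            return f"{height} {backness} vowel"
    return None
```

For instance, describe_vowel("i") gives "close front vowel", while vowels_with("close", "central") returns an empty list because the chart has no entry in that cell.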

Recognizing IPA vowels in English & other languages

We can use English words to recognize most of the above phonemes. Watch the included video above to hear the sounds pronounced clearly. Here are some of the cleanest examples easily heard within English words:

IPA symbols    example words
/ɪ/            sit
/ɛ/            let
/æ/            math
/ə/            reduction
/ɑ/            father
/ɔ/            ought
/ʊ/            books

The pure vowels /i/, /e/, /a/, /o/ and /u/ are more difficult for most English speakers. They have to be extracted from diphthongs where they're pronounced before a y-sound (after front vowels /i/ and /e/) or w-sound (after the back vowels /o/ and /u/). The phoneme /a/ is the first element of the diphthong /ai/ (in "aisle") and /au/ (as in "out") for many English speakers. It is also found in Australian English, and is the normative "a" sound in all Romance languages. We'll learn more about diphthongs later.

IPA symbols    example words    notes
/i/            seem             leave out the final "y" sound
/e/            say              leave out the final "y" sound
/a/            aisle            leave out the final "y" sound
/o/            low              leave out the final "w" sound
/u/            soon             leave out the final "w" sound

By comparison and with a good ear, you can notice the same phonemes in any language you're learning. In languages as different as Japanese and Hawaiian, "i" is pronounced /i/. What's more, you'll be able to imitate unfamiliar vowels. Spanish, French and Italian "a" all sound like /a/ (rarely heard in American English) rather than /ɑ/. French, German and Ancient Greek all have a rounded close front vowel represented by /y/ (sounds like /i/ but pronounced with your lips rounded).

Other features that differentiate vowels

We can consider secondary features. Other features include roundness (you round your lips when you pronounce /u/) and nasalization (French & Portuguese vowels before n/m + consonant are pronounced through the nose; in rapid speech some English speakers say "hand me that" with a nasal /æ/).

Keep in mind that an essential function of vowel features is to represent how speakers of a language distinguish vowels. For instance, if mid front, mid central and mid back vowels don't sound different to speakers of language X, then language X does not distinguish backness as a feature of mid vowels.

Ultimately, as you learn more about sound systems in languages, you'll see you can define vowel features more precisely by subtle variations in the quality of the sound produced than by how they are made in the mouth, as we have defined them.

Practice Exercise

1) Identify every vowel phoneme in the words below. First, list the features of each vowel, then write each vowel in IPA. Note that silent e's are silent, or zero-phonemes:

change, ball, hand, ring, after, said, look, true

2) Read these four Italian words written in IPA aloud. Pay close attention to the vowels.

/wɔmini/, /sɛtte/, /porta/, /kultura/

Consonants - their features & IPA symbols

Where you pronounce the consonant

Consonants have features, as well, but not the same features we used to distinguish vowels. First, let's consider where in your mouth you pronounce the consonant sound. If you press your lips together (like "b" in "blip"), you make a labial sound. If instead you press your tongue against your teeth (like "th" in "thin"), you make a dental sound. Press it against the gum ridge behind your teeth (as "s" in "speech"), and you produce an alveolar sound. Against the roof of your mouth (like "sh" in "ash"), and the sound is palatal. Up against the back of your mouth (like "g" in "grammar"), and you articulate a velar sound. This is called the consonant phoneme's place of articulation, in other words, where you form the phoneme. Labial, dental, alveolar, palatal and velar all describe places of articulation.

How you pronounce the consonant

Then, consider that you can produce different types of sounds at a certain place in your mouth. If, again, you press your two lips together, you can press your lips together very tightly and release a popping /p/ sound. You could, instead, keep your lips lightly together and release a steady flow of air, which sounds not quite like an English /f/ (it's the sound of the Japanese "f" in "furigana"). Both sounds are labial, which describes their place of articulation. But the sharp, popping sound is a stop (also called plosive, from Latin for "beat" or "slap"), and the less restricted, consistent flow of air makes a fricative sound (from Latin for "rub").

Now, let's change the place of articulation by pressing your tongue against the gum ridge behind your upper teeth. If you make a strong plosive/stop sound on your gum ridge, you make a /t/ sound. If you let a gentle stream of air pass between your tongue and gums, you pronounce an /s/ sound instead. This feature is known as manner of articulation, in other words, how you form the phoneme.

Whether or not you voice the consonant

The final consonant feature we'll learn about has to do with the vibration of your vocal cords. Notice that when you hum, your throat vibrates, but when you whisper quietly, it doesn't. The vocal folds in your larynx (your "voicebox") vibrate when you make voiced sounds (like /z/, /d/, and /b/), but stay still when you pronounce voiceless sounds (like /s/, /t/ and /p/). This feature is called voicing. In fact, it's safe to say that the only difference between /s/ and /z/ is the voicing of /z/ (the same holds for /t/ & /d/ or /p/ & /b/), making these phonemes voiced-voiceless pairs.

Combining the three features to identify specific consonants

As with vowels, we can't identify specific consonants based on voicing, place of articulation or manner of articulation alone. We must treat these features as building blocks of consonants. Certain combinations of the three features produce specific, identifiable consonants. For example, instead of speaking about "the consonant in the word thing", we can accurately describe the "voiceless dental fricative". You can consider the relationship between these features as a chart with three axes, with place of articulation on the x-axis, manner of articulation on the y-axis and voicing on the z-axis. In the table below, relevant sounds are given as "voiceless, voiced" pairs.

               labial    dental    alveolar    palatal    velar    glottal
nasal          m         -         n           -          ŋ        -
plosive        p, b      -         t, d        -          k, g     ʔ
fricative      f, v      θ, ð      s, z        ʃ, ʒ       x        h
affricate      -         -         -           tʃ, dʒ     -        -
approximant    -         -         r           j          w        -
lateral        -         -         l           -          -        -
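
The three-feature idea can be sketched in Python as a small lookup table. This is only an illustration: the names Consonant, CONSONANTS, describe, find and voicing_partner are hypothetical, and the voicing given to symbols that stand alone in the table (the nasals, the approximants, /l/, /x/, /h/ and /ʔ/) follows the standard IPA descriptions rather than anything shown in the table itself.

```python
from typing import NamedTuple, Optional


class Consonant(NamedTuple):
    voicing: str
    place: str
    manner: str


# Each symbol from the table above, tagged with its three features.
CONSONANTS = {
    "m": Consonant("voiced", "labial", "nasal"),
    "n": Consonant("voiced", "alveolar", "nasal"),
    "ŋ": Consonant("voiced", "velar", "nasal"),
    "p": Consonant("voiceless", "labial", "plosive"),
    "b": Consonant("voiced", "labial", "plosive"),
    "t": Consonant("voiceless", "alveolar", "plosive"),
    "d": Consonant("voiced", "alveolar", "plosive"),
    "k": Consonant("voiceless", "velar", "plosive"),
    "g": Consonant("voiced", "velar", "plosive"),
    "ʔ": Consonant("voiceless", "glottal", "plosive"),
    "f": Consonant("voiceless", "labial", "fricative"),
    "v": Consonant("voiced", "labial", "fricative"),
    "θ": Consonant("voiceless", "dental", "fricative"),
    "ð": Consonant("voiced", "dental", "fricative"),
    "s": Consonant("voiceless", "alveolar", "fricative"),
    "z": Consonant("voiced", "alveolar", "fricative"),
    "ʃ": Consonant("voiceless", "palatal", "fricative"),
    "ʒ": Consonant("voiced", "palatal", "fricative"),
    "x": Consonant("voiceless", "velar", "fricative"),
    "h": Consonant("voiceless", "glottal", "fricative"),
    "tʃ": Consonant("voiceless", "palatal", "affricate"),
    "dʒ": Consonant("voiced", "palatal", "affricate"),
    "r": Consonant("voiced", "alveolar", "approximant"),
    "j": Consonant("voiced", "palatal", "approximant"),
    "w": Consonant("voiced", "velar", "approximant"),
    "l": Consonant("voiced", "alveolar", "lateral"),
}


def describe(symbol: str) -> str:
    """Name a consonant by its three features."""
    c = CONSONANTS[symbol]
    return f"{c.voicing} {c.place} {c.manner}"


def find(voicing: str, place: str, manner: str) -> list:
    """List every symbol in the table with exactly these features."""
    target = Consonant(voicing, place, manner)
    return [s for s, c in CONSONANTS.items() if c == target]


def voicing_partner(symbol: str) -> Optional[str]:
    """Return the other half of a voiceless/voiced pair, if the table has one."""
    c = CONSONANTS[symbol]
    other = "voiced" if c.voicing == "voiceless" else "voiceless"
    matches = find(other, c.place, c.manner)
    return matches[0] if matches else None
```

Here describe("θ") returns "voiceless dental fricative", and voicing_partner("s") returns "z". A sound like /m/, which has no voiceless partner in the table, gives None.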

Recognizing IPA consonants in English & other languages

Most of the sounds in the table above are easily recognizable in everyday English words. Here are some clear examples:

IPA symbols    example words
/m/            more
/n/            none
/ŋ/            singing (no actual /g/ sound)
/θ/            thing (voiceless)
/ð/            that (voiced)
/s/            said
/z/            prize
/ʃ/            sharp
/ʒ/            pleasure
/x/            Scottish loch ("hard H" sound)
/h/            hear
/tʃ/           cheek
/dʒ/           jaw
/j/            yes

Notice that the affricates /tʃ/ and /dʒ/ are combinations of two phonemes heard as a single sound. The glottal stop is often described as a catch in the throat (the sound found in the middle of "uh-uh", the negative counterpart to "uh-huh"). You can hear this glottal stop "catch" before initial vowels in English: "every" /ʔɛvrij/.

English speakers have a tougher time with plosives. Specifically, speakers tend to pronounce the voiceless plosives with a puff of air, really an /h/ sound: "take" starts with a sequence of phonemes like /t/ + /h/ rather than just a bare /t/. That aspiration (h- or aitch-sound) is absent when you pronounce /p/ in speak, /t/ in stay and /k/ in sky, so imitate that sound to pronounce /p/ /t/ /k/ as "pure" voiceless plosives. You shouldn't have any trouble with their voiced counterparts /b/ (be), /d/ (day) and /g/ (guy).

As you further develop your comparison skills and your good ear, you can notice these consonant phonemes in foreign languages. Spanish plosives "p" and "t" sound like /p/ and /t/ (without the puff of air as I described above). Italian "c" sounds like /k/ before /a/, /o/, /u/ but /tʃ/ in front of /e/ and /i/. Japanese has the affricates /ts/ and /dz/, which aren't heard as single sounds in English but can be imitated easily by stringing together /t/ + /s/ and /d/ + /z/. The alveolar tap /ɾ/, the "r" in Spanish, Portuguese or Italian "caro", is the same sound as the "t" in American English "later" or the "tt" in "stutter".

More precise definitions and other features

Place and manner of articulation can be pinpointed more scientifically and exactly than I have done. The sounds /ʃ/ and /ʒ/, which I label as palatal, are actually "postalveolar" (slightly above and behind the alveolar ridge). You can deepen your understanding by considering which part of your tongue presses against which part of your mouth - the tip of the tongue is involved in coronal sounds (including dental and alveolar), while the body and back of the tongue articulate dorsal phonemes (including velar). Labial sounds emphasize the lips rather than the tongue (including bilabial sounds like /b/ and labiodental ones like /v/). Radical and glottal sounds are made with the base of your tongue and the back of your throat.

Practice Exercise

1) Identify every consonant phoneme in the words below. First, list the three features of each consonant, then write each consonant in IPA. Pay attention to pronunciation over spelling!

breaching, houses, of, thing, defamation, announce

2) Read these five Brazilian Portuguese words written in IPA aloud. Pay close attention to the consonants.

/tu/, /idadʒi/, /vĩtʃi/, /sɔ/, /xaɾu/


Syllables

So far, we've studied vowels and consonants separately. We've even broken them down into their component parts called "features". Yet speakers don't tend to pronounce consonant or vowel sounds in isolation, but together. We won't just jump from sounds to words and sentences. We can first organize speech sounds into beats or units. More specifically, speakers of all languages put vowels and consonants together into speech units known as syllables.

Structure and types of syllables

Syllables tend to be built around a vowel. That vowel is the heart of the syllable, called its nucleus. We may abbreviate vowel as V (V stands for any vowel).

Clearly, people don't just speak in vowels. Consonants surround the nucleus. A syllable may have consonants before the nucleus and consonants after the nucleus. As with vowels, we may abbreviate consonant as C. Consonants before the vowel are part of the syllable's onset. Consonants after the vowel form the syllable's coda (Italian for "tail").

The English syllable "dab" (which also happens to be a word) has one consonant sound before the vowel and one after: /dæb/. So, the structure of that syllable is CVC. But that's not the only syllable type:

structure      English examples    IPA          notes
V              ah!                 /ɑ/          no consonants in onset or coda
(CC)CV         nah! (for "no")     /næ/         consonants in onset, none in coda
VC(CC)         acts                /ækts/       consonants in coda, none in onset
(CC)CVC(CC)    splints             /splɪnts/    consonants in onset and coda
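
The C and V shorthand can be produced mechanically from a transcription. The Python sketch below is a simplification under a few assumptions: it only knows the vowel symbols used in this lesson, it counts every written symbol as one sound (so an affricate like /tʃ/ comes out as CC), and it treats the glides /j/ and /w/ as consonants. VOWELS, IGNORED and cv_skeleton are names invented for the example.

```python
# Vowel symbols used in this lesson; any other symbol counts as a consonant.
VOWELS = set("iɪeɛæaəɑɔoʊuy")

# Marks that are not sounds themselves: slashes, the length colon and spaces.
IGNORED = set("/: ")


def cv_skeleton(ipa):
    """Turn a transcription like '/dæb/' into a skeleton like 'CVC'."""
    skeleton = []
    for symbol in ipa:
        if symbol in IGNORED:
            continue
        skeleton.append("V" if symbol in VOWELS else "C")
    return "".join(skeleton)
```

With the examples from the table, cv_skeleton("/dæb/") returns "CVC" and cv_skeleton("/splɪnts/") returns "CCCVCCC".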

Now, other languages have different constraints on their syllables. Specifically, English can "overload" on consonants, while many other languages allow only much simpler syllables. In standard Japanese, there are only four possible syllable types:

structure    Japanese examples    IPA      notes
V            ō                    /o:/     no consonant in onset or coda
CV           do                   /do/     consonant in onset + short vowel
CV:                               /do:/    consonant in onset + long vowel
CVn          dan                  /dan/    consonant in onset, /n/ in coda
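
Because the list of syllable types is so short, it can be written down as a handful of patterns. The sketch below is a rough illustration in Python, not a full model of Japanese: it assumes the five vowels are written a, i, u, e, o as in this lesson, treats any other single symbol as a consonant, and lets a bare V be long because the table's own V example is /o:/. The names V, C, SYLLABLE_TYPES and japanese_syllable_type are hypothetical.

```python
import re

# V = one of the five vowels; C = any other single symbol (but not the length colon).
V = "[aiueo]"
C = "[^aiueo:]"

# The four syllable types from the table above, as regular expressions.
SYLLABLE_TYPES = [
    ("V", re.compile(f"{V}:?")),    # assumption: a bare vowel may be short or long
    ("CV", re.compile(f"{C}{V}")),
    ("CV:", re.compile(f"{C}{V}:")),
    ("CVn", re.compile(f"{C}{V}n")),
]


def japanese_syllable_type(ipa):
    """Return the name of the matching syllable type, or None if nothing fits."""
    sounds = ipa.strip("/")
    for name, pattern in SYLLABLE_TYPES:
        if pattern.fullmatch(sounds):
            return name
    return None
```

Here japanese_syllable_type("/dan/") gives "CVn", while an English syllable like "/splɪnts/" gives None because it fits none of the four patterns.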

Complex vowels and consonants

Up to this point, we haven't really considered more complicated vowel combinations. Diphthongs involve a vowel & vowel or vowel & glide in the nucleus of a syllable (glides include the "y-sound" /j/ and the "w-sound" /w/). What English speakers call "long vowels" also represent diphthongs under the guise of a single letter: "state" /stejt/, "time" /thajm/ and "rune" /ruwn/ versus "stat" /stæt/, "thin" /θɪn/ and "run" /rən/.

"True" long vowel phonemes, on the other hand, involve holding the nucleus' vowel sound out for a longer amount of time (/a:/ and /e:/ are roughly twice as long as /a/ and /e/, but have the same quality). English speakers may hold vowels longer in interjections like "aaah!" (/ɑ:/ or even /ɑ::/), but we don't use length to distinguish vowels. Japanese speakers, on the other hand, do: /to/ is distinct from /to:/ in Japanese.

Long consonants may be thought of as doubled or geminate (Latin for "twinned"). While English speakers tend to write double consonants in spelling, we almost always pronounce them as single, which comes across in the IPA: "bigger" /bɪgər/. In some languages, like Italian, double consonants are held out longer, and this is represented both in spelling and in IPA: bellissima /bellissima/. English speakers, in fact, may do this with certain consonants when the last sound of one word matches the first sound of the next: "thick crust" /θɪkkhrəst/.

Combining syllables

Once we've grasped the basics of syllables, the next step is to combine them, to string them together. How else can we account for long stretches of continuous speech?

It's important to keep syllable structure in mind, since syllable divisions are based on the syllable structure of a language. We may even represent syllable divisions in IPA with a low dot or period between syllables: "hand over" /hæn.dow.vər/, "astronomical" /æs.trə.nɑ.mɪ.kəl/.

Practice Exercise

1) Take a look at the list of words and phrases below. Transcribe each word or phrase into IPA in a way that represents your pronunciation. Refer to the consonant and vowel sections above as needed.

hint, language, reading, learned, he says, uh-oh, uneven

2) Represent each word or phrase above as a string of consonants (C) and vowels (V).

3) Finally, break the IPA strings into syllables the way I represented "hand over" as /hæn.dow.vər/.

Words, phrases & sentences

We've now covered the fundamental aspects of IPA you'll need to analyze and understand individual sounds and syllables. However, speech doesn't stop at the level of sounds and syllables.

Utterances and streams of sound

If you record yourself speaking with a computer program that allows you to view the amplitude and frequency of the sound waves, you'll notice that you don't speak in words or even sentences the way you write. You might pause between phrases and sometimes even in the middle of words, while your voice flows right past expected word and sentence breaks. When talking about sound and pronunciation, utterance becomes a more fundamental concept, since it stands for a single stream of speech sounds.

A few thoughts arise from this "streaming" concept of sounds. First, you will notice that speakers of all languages don't distinguish individual, distinct words in rapid speech. It follows that we don't use conventions like spaces between words, punctuation or capital letters without good reason. We may use a period to separate syllables or spaces to help distinguish words when reading the IPA, but neither of these is essential. Still, in the end, "we could write this sentence like this": /wijkhʊdrajtðɪssɛntənslajkðɪs/.

Second, the barriers between sounds must be somewhat blurred. This second point explains why sounds assimilate to (become more like) nearby sounds:

phoneme    English examples    IPA       notes
/n/        onto                ntu/      /n/ and /t/ are both alveolar
/n/        inbox               mbɑks/    /n/ turns labial before adjacent /b/
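
The pattern in the table can be written as a small rewrite rule. The Python sketch below is a deliberately narrow illustration: it only handles /n/, it applies the change every time (real speakers don't), and the function name assimilate_n and the sample strings are assumptions made for the example. It also skips over an apostrophe between the two sounds, since some transcriptions place one there as a stress mark.

```python
import re


def assimilate_n(ipa):
    """Shift /n/ to the place of articulation of the sound that follows it."""
    # Before a labial sound (p, b, m), /n/ is pronounced as m.
    ipa = re.sub(r"n(?='?[pbm])", "m", ipa)
    # Before a velar sound (k, g), /n/ is pronounced as ŋ.
    ipa = re.sub(r"n(?='?[kgɡ])", "ŋ", ipa)
    # Before alveolar sounds such as t or d, /n/ already matches and stays put.
    return ipa
```

Given an unassimilated string such as "ɪnbɑks", the function returns "ɪmbɑks", while "ɑntu" comes back unchanged because /n/ and /t/ already share a place of articulation.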

This brings up a concept I plan to tackle in the next series (introduction to phonology). In the examples above, we don't say that the /n/ has split into two new phonemes, one /n/ and the other /m/. Instead, we point to the concept of an allophone. Allophones account for the various ways the "same sound" (as heard by speakers) shows up in speech: /n/ is pronounced [n] in the word "into" but sometimes comes out as [m] in the word "input". In this way, we can propose [n] and [m] as two allophones of one underlying phoneme /n/.

Features of longer streams of sound

We haven't yet considered the kinds of features that might apply to longer speech - to words, phrases and utterances. At this point, we can only transcribe strings of individual sounds, but those would be fairly monotonous if read aloud.

One very important feature is accent. One type of accent, called stress, involves emphasizing one or more syllables by speaking them louder than surrounding syllables. In many languages, like English and German, words normally have a fixed stressed syllable (consider "ho-tel" versus "hos-tel"). In IPA, stress can be written with an apostrophe before the stressed syllable: /how.'thɛl/ versus /'hɑs.təl/. But this also holds when we look at longer utterances: /'wijkhʊd'rajtðɪs'sɛntənslajk'ðɪs/.

In Japanese, stress doesn't play this kind of role. Instead, Japanese speakers raise and lower the pitch, or tone. This is called pitch accent or intonation. "Kyōto" /kjo:to/ starts with high, falling pitch in the first syllable, and ends in a lower pitch in the second syllable. In IPA, pitch may be indicated by accent marks (rising pitch /á/, high/stable /ō/ or falling /ò/), up/down arrows, sloped arrows or hooked bars.

Sometimes features don't work on the level of words but on the level of sentences and "complete thoughts". Features at this level are collectively called prosody. English speakers don't use intonation to distinguish syllables, but to perform larger functions like distinguishing a question from an exclamation from a simple declaration: "Are you happy?" /↘'ɑr juw ↘'hæ↗phij/ versus "You are happy." /↗juw ↘'ɑr ↗'hæ↘phij/. Stress may also play a prosodic function: "I didn't learn anything" versus "I didn't learn anything".

Practice Exercise

1) Oops! As I transcribed these phrases into IPA, I forgot to assimilate certain consonants. Look at each phrase, consider how it sounds in flowing speech, and correct expected assimilations.

/'khɑnkwɛst/, /ðɪs'ʃɪp/, /'ðɪsɪz'jərz/, /ɪn'phərsən/, /bowθ'sɛts/

2) Two English speakers read the same phrase differently. I captured their pronunciation below in IPA. First, identify and place the missing features of stress accent and intonation. Second, spell out the original phrase these speakers read in standard written English. Third, based on your judgment, what kind of speaker said a)? What kind of speaker said b)?

a) /ɪntənæʃənəlfənethɪkhælfəbɛth/

b) /ɪnərnæʃənəlfənɛɾɪkhælfəbɛʔ/

3) Listen carefully to these sound files of Italian words and phrases. Transcribe them as best you can into IPA.

An extra challenge

Before you run off to brag to your friends that you've completed this course on IPA, I leave you with a continuing challenge.

On this site, I have a number of "learn to pronounce language X" or "language X pronunciation" guides with explanations and audio files. I challenge you to complete one of those pages and transcribe every word you hear into IPA as accurately as possible. If you need help, look up "X language phonology" in Wikipedia - that resource usually has all the IPA you'll need to start transcribing a specific language.

But don't stop there. I recommend that you start employing your IPA skills in any situation relevant to language learning or pronunciation. Using the foundation you have gained here coupled with curiosity and vigor for spoken language, you will master the phonetic alphabet in no time.

Answers to the practice exercises

Practice Exercise 1 (Vowels)

1) ch/e/nge (mid front), b/ɑ/ll (open back), h/æ/nd (open front), r/i/ng (close front), /æ/ft/ə/r (open front & mid central), s/ɛ/d (mid front), l/ʊ/k (close back), tr/u/ (close back)

2) /wɔmini/, /sɛtte/, /porta/, /kultura/

Practice Exercise 2 (Consonants)

1) /br/ ea /tʃ/ i /ŋ/ (voiced bilabial plosive, voiced alveolar approximant, voiceless palatal/postalveolar affricate, voiced velar nasal)

/h/ ou /z/ e /z/ (voiceless glottal fricative, voiced alveolar fricative, voiced alveolar fricative)

o /v/ (voiced labiodental fricative)

/θ/ i /ŋ/ (voiceless dental fricative, voiced velar nasal)

/d/ e /f/ a /m/ a /ʃ/ io /n/ (voiced alveolar plosive, voiceless labiodental fricative, voiced bilabial nasal, voiceless postalveolar fricative, voiced alveolar nasal)

a /n/ ou /ns/ e (voiced alveolar nasal, voiced alveolar nasal, voiceless alveolar fricative)

2) /tu/, /idadʒi/, /vĩtʃi/, /sɔ/, /xaɾu/

Practice Exercise 3 (Syllables)

1) /hɪnt(h)/, /lejŋgwədʒ/, /rijdiŋ/, Amer. /lərnd/ or Brit. /lə:nt(h)/, /hijsɛz/, /ʔəʔow/, /ʔənijvən/


3) /hɪnth/, /lejŋ.gwədʒ/, /rij.diŋ/, Amer. /lərnd/ or Brit. /lə:nt(h)/, /hij.sɛz/, /ʔə.ʔow/, /ʔə.nij.vən/

Practice Exercise 4 (Words & utterances)

1) /'khɑŋkwɛst/, /ðɪʃ'ʃɪp/, /'ðɪsɪ'ʒərz/, /ɪm'phərsən/, /bows'sɛts/. (Whether you perform every one of these assimilations in English will depend on your dialect, but the first one, /nk/ > /ŋk/, is fairly universal.)

2) First: /ɪntə↘'næʃənəlfə↗'nethɪkh↗'æl↘fəbɛth/ and /ɪnər'næʃənəlfə↗'nɛɾɪkh↘'ælfəbɛʔ/

Second: "International Phonetic Alphabet"

Third: Speaker a) sounds like a British English speaker, while speaker b) sounds like an American English speaker.

3) /pjuk'kaɾo/, /an'napoli/, /'skeɾtso/