Language relatedness

Historical Linguistics as a Way to Understand Foreign Languages

In previous lessons you learned linguistic techniques for analyzing the pronunciation and grammar of a language. Along the way, we set aside questions about the languages themselves: Where do languages come from? What's the relationship between one language and another? What's the real-world situation on the ground?

As you may know, thousands of languages are spoken around the world. Depending on how we count, including how we separate "languages" from "dialects", there are something on the order of 6,000-7,000 distinct human languages.

These languages don't differ from each other randomly. Instead, some are related to others. Need an example? Type a couple of sentences into a free online translator (like Google Translate) and translate that same text into Spanish, then Portuguese, then Galician, and finally Vietnamese. Look at the results for a moment and you'll see that some languages are more alike than others. We'll learn how and why later.

Languages also inevitably change over time, and that change defines and solidifies the relationships mentioned above. Let's look at a single language spoken in the same spot on the globe over the past 1,000 years:

Period | Text
1,000+ years ago | Urne gedæghwamlican hlaf syle us todæg.
500+ years ago | Yeue us today ure eche dayes bred.
Modern | Give us this day our daily bread.

You see, even cherished and oft-repeated phrases in a single language show evidence of that language changing over time. Again, we'll learn the why and the how below.

In this course, we look at language relationships and language change through history by studying words. Specifically, we focus on root morphemes in a variety of languages. As we do this, you will see how the historical method uses linguistic tools introduced in previous lessons to decipher the geographical and chronological story of languages and language families.

What makes a language a language?

In the introduction I told you that humans around the world speak some 6,000 to 7,000 languages. Before we consider how languages are related and how they change, what counts as a single language?

A straightforward way to divide languages might be called the "common sense" approach: ask two people whether they speak the same language. This opinion-based approach works as long as the speakers (and any third parties classifying them) agree. We might add a further qualification: speakers of the same language should be able to understand each other. This understanding among speakers is known as mutual intelligibility. Perhaps our everyday assumptions about what makes a single language rest on these twin approaches: common sense (taking speech communities at their word) and mutual intelligibility.

Imagine that we visit two distinct regions whose residents claim to speak and understand the same language. If speakers in those two regions speak differently but claim to share the same language, we say that they speak different dialects of that language. Since dialects may or may not be mutually intelligible, they lead us into a grey area when it comes to classification. What's more, when we look at individual speakers in just one of those two regions, we find that people speak differently with superiors than they do with children. Their speech patterns differ at home, at the office and on very formal occasions. These social differences in speech within the same dialect are called registers. Finally, if we break language groups down even further, we hear that individuals differ among themselves: each speaker has their own idiolect.

Ultimately, we could view any language as a collection of disparate idiolects. Some form of group identifier, perhaps as simple as a name for the language, then helps us think of the language as a whole. This tie to group identity makes languages inseparable from society and history. It is summed up in a widely repeated quip that a language is merely "a dialect with an army and a navy".

Even after much examination, many linguists mimic Socrates at the end of an inquiry - the concept of "individual languages" sounds useful, but we haven't reached a satisfying universal definition of what constitutes a single language.

How are languages related?

The idea that some languages must be related took off at the end of the 1700s, when Sir William Jones noticed that Sanskrit, the classical language of India, shared a great deal with Greek, Latin and a number of European languages. Scholars went on to conclude that these languages all belong to a single language family. In the following centuries, they mapped out numerous other language families, large and small.

A language family shows genetic relationships like a human family does. A single parent language branches into distinct languages over time through the process of language change, leaving a number of daughter languages. Those daughter languages in turn become parents, branching off into smaller families of their own. The result is a family tree with groups of closely related languages and their more distant relatives, all of which bear relation to the original parent language.

How do we know about language change? As we will learn below, we start by looking for material common to both the daughter languages and the parent language. In humans that shared genetic material is DNA, while in language it is shared words, pronunciation and grammar. Any words and structures inherited by the daughter languages from their parent are called cognates.

Cognates show both commonality and change. Change involves change from and change to, which is easiest to capture when we look at individual words. For example, the Old English phrase at the beginning of this course uses the word hlaf, meaning 'bread'. That word changed from hlaf to laf to our modern word loaf. In that instance, we have both the parent and the daughter words available in writing, so we can compare the "changed from" and the "changed to".

The Gothic word for 'bread' was hlaif, while the Slavic languages have words like hleb and khlep. Since hlaf, hlaif and hleb are related, they must all come from some earlier word. What is that word? Later in this lesson, we will learn how to determine an original word (the "changed from") even if the parent language was not preserved in writing. First, let's move from the abstract to the practical, as I show you how to compare individual words in languages in order to determine relatedness.

Comparing words & languages

Simply put, you will use the comparative method to compare languages. Specifically, you will start by comparing individual words in multiple languages to determine how to classify them: based on these words, do the languages belong to the same family? If so, do they belong to the same branch of that family, or to different branches?

Comparing basic vocabulary

You won't compare words at random. Instead, you will look for basic, frequently used words learned early on by speakers of a language. Let's call this set of words the basic core vocabulary of that language. The idea is that, because of their basic & useful meaning, these words stick around longer in a language (they're less likely to be replaced). Restricting our scrutiny to this hearty stock of words should make cognates both more likely and easier to spot, especially among closely related languages.

While exact lists vary, the basic core vocabulary includes kinship terms (mother, father...), the numbers 1-10, ambient terms (sky, sun, earth...), pronouns, body parts, and so on. One prominent example of such a list is the Swadesh list, along with the Yakhontov list, a shorter subset of it.

Let's see what we can make of a selection of basic words from five languages, labeled A, B, C, D & E. Each row of words shares the same meaning - the words differ in form from language to language. It won't matter whether we know how or even if the languages are related - that's exactly what we'll use the comparative method to discover.

A | B | C | D | E
quen | ko | chi | quem | who
dous | rua | due | dois | two
tres | toru | tre | três | three
nai | whaea | madre | mãe | mother
muller | wahine | donna | mulher | woman
(not given) | waewae | piede | (not given) | foot
cavalo | hoiho | cavallo | cavalo | horse
peixe | ika | pesce | peixe | fish
tu | koe | tu | tu | you
pai | matua | padre | pai | father
nome | ingoa | nome | nome | name
noite | (not given) | notte | noite | night

Practice Exercise 1

You will find answers to this and every activity in the answers to the exercises at the bottom of this page.

1) Copy or rewrite the list of words above.

2) Circle (or bold) any words in each row that are obviously similar.

3) Next, underline any words in each row that seem to resemble the words you circled, but whose similarities are less obvious.

4) Finally, reorder the languages from most to least similar: language A, language B, language C, language D, language E.

Why words look similar

Congratulations! You've just done your first bit of comparative analysis. You noticed a number of resemblances in the list of basic vocabulary words taken from five different languages. What do those similarities mean?

First, words can be similar by chance if the words just happen to sound similar and have similar meanings. For instance, the words in language B rarely resemble words in languages A, C, D & E. However, when it comes to the word for "three", language B has the word toru, which does resemble tres, tre, etc. Taken against the other words in the list, this resemblance is likely due to chance.

Short words and ad hoc comparisons are especially prone to chance resemblances. Later in this lesson, we'll learn a better way to determine if two words just happen to sound alike by examining sound correspondences.

Words can also be similar due to borrowing. When one language comes in contact with another, speakers may borrow words, often to express unfamiliar concepts. For example, language B hoiho and language E horse look somewhat similar. It turns out that language B borrowed the word horse from language E and rendered it as hoiho.

Borrowed words and chance resemblances can't establish language relationships. What's left? We need to look for legitimate cognates shared by languages within the same family.

Languages A and D consistently resemble each other. The relationship looks genuine, without the unevenness we'd find from borrowing or chance. All words in A have cognates in D. Languages A and D are closely related. The words in C less clearly resemble A and D, but we still see many cognates. Language C is more distantly related to A and D.

Languages B and E don't obviously resemble the other languages. Still, look at the words for "night", "name", "mother" and even "two" and "three": language E isn't too far off from the rest of the group (A, C, D). Also, the words for "foot", "fish" and "father" start with /f/ in E but with /p/ in C (and in A and D, where the list gives a word), and the E and C words have a similar shape. Could language E be a distant relative of A, C and D? We need more evidence.

Finally, language B is highly inconsistent with respect to every other language on the list. It's hard to say definitively based on so few words, but it looks like B is unrelated to any of the other languages.
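If you'd like to put a rough number on the eyeballing above, here is a minimal Python sketch. It is not the comparative method and not a standard lexicostatistical measure: it simply computes an edit distance (Levenshtein distance) between spellings, which, as the text warns, also picks up chance resemblances and borrowings. The word lists are copied from the rows above where all five languages have a word; the names `WORDS`, `edit_distance` and `distance` are illustrative.

```python
# Crude surface-similarity check for languages A-E (NOT the comparative method:
# it measures how alike spellings look, whatever the reason for the resemblance).

# Rows where all five languages have a word ("foot" and "night" are skipped).
WORDS = {
    "A": ["quen", "dous", "tres", "nai", "muller", "cavalo", "peixe", "tu", "pai", "nome"],
    "B": ["ko", "rua", "toru", "whaea", "wahine", "hoiho", "ika", "koe", "matua", "ingoa"],
    "C": ["chi", "due", "tre", "madre", "donna", "cavallo", "pesce", "tu", "padre", "nome"],
    "D": ["quem", "dois", "três", "mãe", "mulher", "cavalo", "peixe", "tu", "pai", "nome"],
    "E": ["who", "two", "three", "mother", "woman", "horse", "fish", "you", "father", "name"],
}


def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = curr
    return prev[-1]


def distance(x, y):
    """Average per-row edit distance, scaled by the longer word (0 = identical)."""
    scores = [edit_distance(a, b) / max(len(a), len(b))
              for a, b in zip(WORDS[x], WORDS[y])]
    return sum(scores) / len(scores)


for other in "BCDE":
    print(f"A-{other}: {distance('A', other):.2f}")
```

Run against this word list, D comes out closest to A and C next, with E and B much farther away, which matches the analysis above. A low score on its own never proves relatedness, though; that is the job of regular sound correspondences, introduced later in this lesson.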

So, what are these languages? Language A is Galician, spoken in the northwest of Spain. Language B is Maori, indigenous to New Zealand. Language C is Italian. D is Portuguese, spoken just south of Galician. I'm sure you recognize language E - it's English.

Practice Exercise 2

You will find answers to this and every activity in the answers to the exercises at the bottom of this page.

Japanese has two main sets of numerals: one borrowed from Chinese, which we'll call "set 1", and one of native Japanese numbers, which we'll call "set 2". On top of that, English numbers have been nativized for popular use, so we'll call those "set 3".

Sino-Japanese (set 1) | Japanese numbers (set 2) | "Japanese English" (set 3)
ichi | hitotsu | wan
ni | futatsu | tsu
san | mittsu | surii
shi | yottsu | fo

Using the terms you just learned, how can you explain the difference between set 1, set 2 and set 3?

Recovering language family trees

We worked with a list of basic vocabulary words from five different languages in order to establish plausible relationships between those languages. In the end, we posited that three of the languages are almost certainly related, and came up with the possibility that a fourth may be related, as well.

However, our understanding of those relationships is still fairly flat. Families imply history, and languages are related when they descend from some common ancestor. Languages that are more closely related share a closer common ancestor. Think of languages A (Galician) and D (Portuguese) above. They share a fairly close parent, Galaico-Portuguese, which dates to the Middle Ages.

[Figure: simple family tree of Galician and Portuguese descending from Galaico-Portuguese]

Languages that are more distant relatives share a more distant common ancestor. Language C (Italian) turned out to be a more distant relative of Galician and Portuguese, so it can't fall within the Galaico-Portuguese subfamily; it must fall outside it. Galaico-Portuguese and Italian are part of a larger family, and their common parent is popular Latin (Vulgar Latin), spoken in ancient times.

[Figure: family tree of Italian, Galician and Portuguese]

This still isn't a complete family tree: many other languages besides Portuguese, Galician and Italian, such as Spanish, French and Romanian, are also daughters of Latin.

English (our "language E" above) belongs to its own close-knit family, known as Germanic. Germanic and Latin languages, in turn, belong to an even larger family known as Indo-European.
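The nesting described in this section (Galician and Portuguese under Galaico-Portuguese, that group and Italian under Vulgar Latin, and the Latin and Germanic languages under Indo-European) can be written down as a small tree. The Python sketch below stores each language's parent and finds the closest common ancestor of two languages. It is deliberately partial: only languages named in this lesson appear, intermediate stages such as Classical Latin are skipped, and "Proto-Germanic" is used as the name of the ancestor of the Germanic family. `PARENT`, `lineage` and `closest_common_ancestor` are illustrative names.

```python
# A partial family tree: each language points to its parent. Only languages named
# in this lesson appear, and intermediate stages (e.g. Classical Latin) are skipped.
PARENT = {
    "Galician": "Galaico-Portuguese",
    "Portuguese": "Galaico-Portuguese",
    "Galaico-Portuguese": "Vulgar Latin",
    "Italian": "Vulgar Latin",
    "Vulgar Latin": "Proto-Indo-European",
    "English": "Proto-Germanic",
    "Proto-Germanic": "Proto-Indo-European",
}


def lineage(language):
    """The language itself followed by its ancestors, nearest first."""
    chain = [language]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def closest_common_ancestor(a, b):
    """The first entry in the lineage of `a` that also appears in the lineage of `b`."""
    ancestors_of_b = set(lineage(b))
    for node in lineage(a):
        if node in ancestors_of_b:
            return node
    return None  # no shared ancestor within this partial tree


print(closest_common_ancestor("Galician", "Portuguese"))  # Galaico-Portuguese
print(closest_common_ancestor("Galician", "Italian"))     # Vulgar Latin
print(closest_common_ancestor("Portuguese", "English"))   # Proto-Indo-European
```

The more distant the relationship, the further up the tree the shared ancestor sits, which is exactly the pattern described in the text.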

Recovering an unattested common ancestor

Notice that these ancestor languages are far removed from the modern languages spoken around us. In the case of Proto-Indo-European, we've reached a common ancestor that isn't written down, or attested, anywhere in the historical record.

To understand the way a lost parent language works, we have to look more closely at its daughter languages. Specifically, we will keep our eye on retentions - cognate words and structures inherited from the earliest common ancestor, and weed out innovations that developed along the way in individual daughter languages. This is easier to achieve when we look at languages closer to the parent, which have had less time to develop innovations. For example, older forms of Spanish, French and Italian are more helpful for reconstructing Vulgar Latin than modern Spanish, French and Italian. Latin, Ancient Greek and Classical Sanskrit are more useful than German, Portuguese and Hindi for reconstructing Proto-Indo-European, even though all these languages fall in the same large family.

We will learn how to compare individual sounds in daughter languages and, ultimately, reconstruct words in parent languages that have been lost to time. To do all this, you will need to stick around and learn something about sound correspondences.

Reconstruction of lost languages

We've spent time talking about the Indo-European languages, but I'd like to give you a breath of fresh air. You'll work with some non-Indo-European languages related to "language B" above (Maori) as you learn to compare sounds and reconstruct words in the parent language.

Since we're working with sounds, I'll transcribe words in IPA (which you already know how to read, right?). Take a look at this list of numbers 1-9 taken from five languages:

number | language B | language F | language G | language H | language I
1 | /tahi/ | /taha/ | /ʔekahi/ | /tasi/ | /tahi/
2 | /rua/ | /ua/ | /ʔelua/ | /lua/ | /rua/
3 | /toru/ | /tolu/ | /ʔekolu/ | /tolu/ | /toru/
4 | /fa:/ | /fa:/ | /ʔeha/ | /fa/ | /ha:/
5 | /rima/ | /nima/ | /ʔelima/ | /lima/ | /rima/
6 | /ono/ | /ono/ | /ʔeono/ | /ono/ | /hono/
7 | /fitu/ | /fitu/ | /ʔehiku/ | /fitu/ | /hitu/
8 | /waru/ | /valu/ | /ʔewalu/ | /valu/ | /varu/
9 | /iwa/ | /iva/ | /ʔeiwa/ | /hiva/ | /hiva/

Note on the row for 4: if you studied The IPA for Language Learning, you will recall that /a:/ represents a long vowel (the sound is held roughly twice as long). However, I did not introduce this long vowel symbol in the intro to the IPA lesson.

Practice Exercise 3

You will find answers to this and every activity in the answers to the exercises at the bottom of this page.

1) Circle (or bold) any phonemes that remain the same throughout every cognate in each row.

2) Next, underline any phonemes that differ between cognates in each row.

3) Finally, reorder the languages from most to least similar: language B, language F, language G, language H, language I.

Regular sound correspondences

As you look at the words above, it's not hard to see that these languages are all related. Every row contains five easy-to-spot cognates. This is unlike the last list, where relatedness was more difficult to determine.

Let's take a closer look at the phonemes in languages B, F, G, H & I. What are we looking for in these individual sounds?

First, consider the morphology of the words we are comparing. To reconstruct a word inherited from a common ancestor, you need to compare the same root morphemes as a basis for that reconstruction. In our case, languages B, F, G, H & I all have distinct morphemes for the numerals "one", "two", etc., but language G consistently adds /ʔe/ before the corresponding morpheme. Since this is unique to language G, it looks like an innovation, and we will ignore G's prefix /ʔe/ in our reconstruction. In other words, the cognate material in language G is /kahi/, /lua/, etc. rather than /ʔekahi/, /ʔelua/, etc.

Once you're sure you're comparing cognate morphemes, determine regular sound correspondences by comparing related phonemes in the set of words. The sound correspondences are regular because one phoneme in one language routinely corresponds to another phoneme in another language. For example, the /k/ in language G /kolu/ corresponds to /t/ in /tolu/ & /toru/. Further, language G /k/ also appears in /kahi/ where other languages have /t/ in /tahi/, /tasi/ & /taha/. We can establish a regular sound correspondence between /k/ and /t/.

language B | language F | language G | language H | language I
/t/ | /t/ | /k/ | /t/ | /t/

Where language G has /k/, languages B, F, H & I will have /t/. If this rule is watertight, it will apply with clockwork precision - if a word with /t/ in languages B, F, H or I has a cognate in language G, that cognate will have a /k/. We will start to check by testing our preliminary correspondence against other words:

meaning | language B | language F | language G | language H | language I
man | /taŋata/ | /taŋata/ | /kanaka/ | /taŋata/ | /taŋata/
sea | /tai/ | /tahi/ | /kai/ | /tai/ | /tai/
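The bookkeeping behind a correspondence like /t/ ~ /k/ can be sketched in a few lines of Python, here using the language B and language G forms for "one", "three", "man" and "sea". The sketch assumes each IPA symbol is a single character and that, once G's prefix /ʔe/ is removed, the roots line up sound by sound (equal length, no insertions or deletions); real cognate sets often need more careful alignment. The names `COGNATES`, `strip_g_prefix` and `correspondences` are illustrative.

```python
from collections import Counter

# (language B, language G) cognate pairs from the tables in this section.
COGNATES = [
    ("tahi", "ʔekahi"),    # one
    ("toru", "ʔekolu"),    # three
    ("taŋata", "kanaka"),  # man
    ("tai", "kai"),        # sea
]


def strip_g_prefix(form):
    """Remove language G's numeral prefix /ʔe/, treated in the text as an innovation."""
    return form[2:] if form.startswith("ʔe") else form


def correspondences(pairs):
    """Count how often each sound in B lines up with each sound in G."""
    counts = Counter()
    for b_form, g_form in pairs:
        g_root = strip_g_prefix(g_form)
        if len(b_form) != len(g_root):
            raise ValueError(f"cannot align {b_form} with {g_root} sound by sound")
        counts.update(zip(b_form, g_root))  # one (B sound, G sound) pair per position
    return counts


counts = correspondences(COGNATES)
# Everything that language B's /t/ corresponds to in language G:
t_reflexes = {pair: n for pair, n in counts.items() if pair[0] == "t"}
print(t_reflexes)  # {('t', 'k'): 5}
```

Every /t/ in B lines up with /k/ in G, which is what a regular correspondence looks like. The same tally also surfaces other correspondences worth testing against more words, such as B /r/ ~ G /l/ and B /ŋ/ ~ G /n/.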

The regular sound correspondences reflect regular sound changes that separate the daughter languages from their common parent language. So far we know how /k/ and /t/ correspond among the daughter languages, but what about the relationship between the daughter languages and their parent? A single phoneme in the parent language underwent regular sound change, showing up as /k/ in one daughter language and /t/ in the others. The evidence points strongly in one direction: four of the five languages agree on /t/, and it is simpler to posit one change (/t/ > /k/) in language G than the same change happening independently in four separate languages. So /t/ in the proto-language changed to /k/ in language G but remained /t/ elsewhere.
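One way to make "fewest changes" concrete is to count, for each candidate proto-sound, how many daughter languages would have had to change. The Python sketch below does that for a single correspondence set. It assumes a flat tree in which every language is an independent branch and every change costs the same; real reconstruction also weighs subgrouping and which changes are phonetically natural, so treat this as a first-pass heuristic only. Mergers, discussed below, are one case where simple counting is not enough. `fewest_changes` is an illustrative name.

```python
# One correspondence set: the first sound of "one" and "three" in each language.
REFLEXES = {"B": "t", "F": "t", "G": "k", "H": "t", "I": "t"}


def fewest_changes(reflexes):
    """Return the candidate proto-sound needing the fewest changes, plus each candidate's cost.

    Assumes a flat tree: each language is an independent branch, and every change
    away from the proto-sound costs exactly 1. Ties are broken arbitrarily.
    """
    candidates = set(reflexes.values())
    cost = {c: sum(1 for sound in reflexes.values() if sound != c) for c in candidates}
    best = min(cost, key=cost.get)
    return best, cost


best, cost = fewest_changes(REFLEXES)
print(best, cost)
```

Here *t needs one change (in G) while *k would need four, so *t wins.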


Not all sound changes are so straightforward. For the number "three", language B has /toru/, F has /tolu/, G /ʔekolu/, H /tolu/ and I /toru/. As a first estimate, G added the two-phoneme prefix /ʔe/ and changed /t/ to /k/ to get /ʔekolu/. That leaves two competing forms for "three": /tolu/ and /toru/. Of /r/ and /l/, which phoneme is original?

As we take another glance at our cognates, we see that languages F, G & H only have /l/ where languages B & I only have /r/:

meaning | language B | language F | language G | language H | language I
two | /rua/ | /ua/ | /ʔelua/ | /lua/ | /rua/
three | /toru/ | /tolu/ | /ʔekolu/ | /tolu/ | /toru/
eight | /waru/ | /valu/ | /ʔewalu/ | /valu/ | /varu/

When it comes to l's and r's, these languages fall into two types: languages that only have /l/, and languages that only have /r/. It seems you have to choose either /l/ or /r/ as the original phoneme, and you've reached a stalemate. That is, until we uncover the following evidence from three more related languages:

language | "two" | "three" | "eight"
Fijian | /rua/ | /tolu/ | /walu/
Malagasy | /rua/ | /tolu/ | /valu/
Watubela | /rua/ | /tolu/ | /alu/

This is a common scenario, but less straightforward than the case of /k/ versus /t/ we examined above. Some languages have only one phoneme where others have two (here, either /r/ or /l/), while other languages distinguish both phonemes in the same words: Fijian, Malagasy and Watubela have /r/ in "two" but /l/ in "three" and "eight". We will propose that the languages with one phoneme collapsed the two phonemes into a single phoneme. This process is known as a merger.

In the case of /r/ and /l/, languages that only have /l/ merged /l/ and /r/ into a single phoneme /l/. Languages that only have /r/ merged /l/ and /r/ into a single phoneme /r/. Languages that contrast /r/ and /l/ continue the tradition of the parent language, which means that we will reconstruct words with /r/ and /l/ as distinct phonemes in the parent language.

meaning | r, l distinct | r, l > r merger | r, l > l merger | possible origin
two | rua | rua | lua | __ua?
three | tolu | toru | tolu | to__u?

Since languages that underwent the r,l>r merger always have /r/, while languages that underwent the r,l>l merger always have /l/, it makes sense to listen to the "r,l distinct" languages that still contrast /r/ and /l/. The cognates point to rua as the original word for "two" and tolu as the original word for "three".
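A merger behaves like a many-to-one mapping, which is why it can't be undone by looking at the merged language alone. The Python sketch below applies an unconditional r/l merger to the forms reconstructed here (*rua, *tolu), plus *walu "eight" from the reconstruction table later in this section. It ignores the other changes in languages F, G and H (loss of /r/, /w/ > /v/, G's prefix), and `merge` is an illustrative name; real mergers can also be conditioned by their phonetic environment.

```python
# Proto-forms for "two", "three" and "eight", as reconstructed in this lesson.
PROTO = ["rua", "tolu", "walu"]
FIJIAN = ["rua", "tolu", "walu"]  # a language that still keeps /r/ and /l/ apart


def merge(form, target):
    """Unconditional merger: every /r/ and /l/ in the form becomes `target`."""
    return "".join(target if sound in "rl" else sound for sound in form)


r_type = [merge(w, "r") for w in PROTO]  # compare languages B and I
l_type = [merge(w, "l") for w in PROTO]  # compare languages F, G and H
print(r_type)  # ['rua', 'toru', 'waru']
print(l_type)  # ['lua', 'tolu', 'walu']

# A merger can't be run backwards: the real *rua and a hypothetical *lua
# would both end up as /lua/ in an l-type language.
print(merge("rua", "l") == merge("lua", "l"))  # True
print(FIJIAN == PROTO)  # True
```

The last two lines are the key point: once /r/ and /l/ have merged, /lua/ could come from either source, so only languages that keep the two sounds apart tell us which one was original.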

Notice that we've been building ancestral words out of the common features of cognates in these daughter languages. In doing so, we have engaged in the process of reconstruction. We are reconstructing the proto-language that stands behind all these modern languages. Linguists have a convention when it comes to reconstructed words: we place an asterisk (*) in front of a word to signal that it is reconstructed, not attested. We will write the reconstructed words for "two" and "three" as *rua and *tolu.

In this section, you've been working with the Polynesian language family. We've actually been able to reconstruct two words from Proto-Polynesian: *rua and *tolu! Now take a look at the reconstructed words for numbers 1-9 from the proto-language alongside language B (Maori), language F (Tongan), language G (Hawaiian), language H (Niuean) and language I (Rapa Nui / Easter Island).

Proto-Polynesian | Māori | Tongan | Hawaiian | Niuean | Rapa Nui
*taha | /tahi/ | /taha/ | /ʔekahi/ | /tasi/ | /tahi/
*rua | /rua/ | /ua/ | /ʔelua/ | /lua/ | /rua/
*tolu | /toru/ | /tolu/ | /ʔekolu/ | /tolu/ | /toru/
*faa | /fa:/ | /fa:/ | /ʔeha/ | /fa/ | /ha:/
*lima | /rima/ | /nima/ | /ʔelima/ | /lima/ | /rima/
*ono | /ono/ | /ono/ | /ʔeono/ | /ono/ | /hono/
*fitu | /fitu/ | /fitu/ | /ʔehiku/ | /fitu/ | /hitu/
*walu | /waru/ | /valu/ | /ʔewalu/ | /valu/ | /varu/
*hiwa | /iwa/ | /iva/ | /ʔeiwa/ | /hiva/ | /hiva/
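Reconstruction can also be checked by running it forward: apply a set of sound changes to the proto-forms and see whether the daughter forms come out. The Python sketch below does this for language G (Hawaiian). The four rules are a hypothetical, simplified set read off the correspondences in the table above (G has no /h/ where others have it, /h/ where others have /f/, /k/ for /t/ and /l/ for /r/), not a full history of Hawaiian. `RULES`, `ATTESTED_G` and `derive_g` are illustrative names.

```python
# Hypothetical, simplified sound changes for language G (Hawaiian), read off the
# correspondences in this lesson's tables. They are applied in the order listed.
RULES = [
    ("h", ""),   # 1. *h is lost:         *hiwa -> iwa
    ("f", "h"),  # 2. *f becomes h:       *fitu -> hitu
    ("t", "k"),  # 3. *t becomes k:       *tolu -> kolu
    ("r", "l"),  # 4. *r merges with *l:  *rua  -> lua
]

# Language G forms from the table above, for the numerals these rules cover.
ATTESTED_G = {
    "*rua": "ʔelua", "*tolu": "ʔekolu", "*lima": "ʔelima", "*ono": "ʔeono",
    "*fitu": "ʔehiku", "*walu": "ʔewalu", "*hiwa": "ʔeiwa",
}


def derive_g(proto, rules=RULES):
    """Apply each rule to the whole word in turn, then add G's numeral prefix /ʔe/."""
    form = proto.lstrip("*")
    for old, new in rules:
        form = form.replace(old, new)
    return "ʔe" + form


for proto, attested in ATTESTED_G.items():
    print(proto, derive_g(proto), derive_g(proto) == attested)
```

Every comparison prints True. Two things are worth noticing. First, order matters: if rule 2 applied before rule 1, the /h/ created from *f in *fitu would then be deleted, giving /ʔeiku/ instead of /ʔehiku/. Second, *taha and *faa are left out on purpose: these simple rules would give /ʔekaa/ and /ʔehaa/, not the table's /ʔekahi/ and /ʔeha/, a sign that more is going on in those words than four rules can capture.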

Practice Exercise 4

You will find answers to this and every activity in the answers to the exercises at the bottom of this page.

Use your comparative skills to examine the cognates in the list below, and your reconstruction skills to answer the questions that follow. You'll see the symbol /tʃ/ in Romanian and Ladin; in both languages it represents a single affricate sound (like the "ch" in English "church"), not a sequence of two sounds.

Galician | Occitan | Italian | Ladin | Romanian | translation
/sete/ | /set/ | /sette/ | /set/ | /ʃapte/ | seven
/kan/ | /kan/ | /kane/ | /tʃan/ | /kɨne/ | dog
/un/ | /yn/ | /uno/ | /un/ | /un/ | one
/katro/ | /katre/ | /kwattro/ | /kater/ | /patru/ | four
/aver/ | /abe/ | /avere/ | /ave/ | /avere/ | to have
/tres/ | /tres/ | /tre/ | /trei/ | /trei/ | three
/tu/ | /ty/ | /tu/ | /tu/ | /tu/ | you
/ben/ | /ben/ | /bene/ | /ben/ | /bine/ | well/good
/ke/ | /ke/ | /ke/ | /ke/ | /tʃe/ | what?

1) List every consonant correspondence you can find between the languages.

2) List every vowel correspondence you can find between the languages.

3) Use the cognates in each row, along with your breakdown of consonant & vowel correspondences, to reconstruct the original words in the proto-language to the best of your ability.

4) Over time, sounds may erode away as a cognate travels from its parent language to its daughter languages. This process is known as deletion, and involves the complete loss of one or more phonemes. The process often wreaks its havoc at the end of words. Can you find any examples of phoneme deletion in the above list of cognates? Which words, and in which languages?

5) When a consonant is pronounced more like a nearby sound, it comes to share one or more phonetic features of that nearby sound. This very common type of sound change is known as assimilation. Can you find any instances of assimilation? Which sounds, in which words, and in which languages?

6) As you just learned, a merger is the result of two or more distinct phonemes that end up being reduced to a single phoneme in a language. Can you find any examples of mergers in the list of cognates above? In which languages? Which phonemes in the proto-language merge into which single phoneme?

7) The opposite of a merger is a split (a single phoneme turns into two or more contrastive phonemes in a language). Like mergers, you can spot splits by comparing cognates in related languages. Are there any examples of splits in the above data? In which languages? Which single phoneme in the proto-language splits into which phonemes in the daughter languages?

8) Take another look at your reconstructions (answers to #3 above). Based on your understanding of assimilation, deletion, mergers & splits, correct any inconsistencies.

Answers to the practice exercises

Practice Exercise 1 (looking for cognate words in five mystery languages)

1) (Simply copy the list of cognates.)

2) Bold words in each row that are obviously similar:

language A | language B | language C | language D | language E
quen | ko | chi | quem | who
dous | rua | due | dois | two
tres | toru | tre | três | three
nai? | whaea | madre | mãe? | mother
muller | wahine | donna | mulher | woman
(not given) | waewae | piede | (not given) | foot
cavalo | hoiho | cavallo | cavalo | horse
peixe | ika | pesce | peixe | fish
tu | koe | tu | tu | you
pai | matua | padre | pai | father
nome | ingoa | nome | nome | name
noite | (not given) | notte | noite | night

3) Underline words that resemble the words from #2, but whose similarities are less obvious:

language A | language B | language C | language D | language E
quen | ko | chi | quem | who
dous | rua | due | dois | two
tres | toru | tre | três | three
nai | whaea | madre | mãe | mother?
muller | wahine | donna | mulher | woman
(not given) | waewae | piede | (not given) | foot?
cavalo | hoiho | cavallo | cavalo | horse
peixe | ika | pesce | peixe | fish?
tu | koe | tu | tu | you
pai | matua | padre | pai | father?
nome | ingoa | nome | nome | name
noite | (not given) | notte | noite | night

4) Based on the words in the list above, the languages resemble each other in this order (most to least similar): language A, language D, language C, language E, language B.

Practice Exercise 2 (comparing Japanese numerals)


Set 1 consists of words borrowed at some point from Chinese, an example of borrowing. Set 3 also contains borrowed words, this time from English, so it is another instance of borrowing. Set 2 contains native Japanese words inherited directly from an earlier stage of the language. Any resemblances between the three sets (like the -tsu in Japanese futatsu and "Japanese English" tsu) are likely due to chance.


Practice Exercise 3 (sound correspondences between cognates in five mystery languages)

1) Bold phonemes that remain the same throughout the cognates in each row:

number | language B | language F | language G | language H | language I
1 | /tahi/ | /taha/ | /ʔekahi/ | /tasi/ | /tahi/
2 | /rua/ | /ua/ | /ʔelua/ | /lua/ | /rua/
3 | /toru/ | /tolu/ | /ʔekolu/ | /tolu/ | /toru/
4 | /fa:/ | /fa:/ | /ʔeha/ | /fa/ | /ha:/
5 | /rima/ | /nima/ | /ʔelima/ | /lima/ | /rima/
6 | /ono/ | /ono/ | /ʔeono/ | /ono/ | /hono/
7 | /fitu/ | /fitu/ | /ʔehiku/ | /fitu/ | /hitu/
8 | /waru/ | /valu/ | /ʔewalu/ | /valu/ | /varu/
9 | /iwa/ | /iva/ | /ʔeiwa/ | /hiva/ | /hiva/

2) Next, underline any phonemes that differ between cognates in each row:

number | language B | language F | language G | language H | language I
1 | /tahi/ | /taha/ | /ʔekahi/ | /tasi/ | /tahi/
2 | /rua/ | /ua/ | /ʔelua/ | /lua/ | /rua/
3 | /toru/ | /tolu/ | /ʔekolu/ | /tolu/ | /toru/
4 | /fa:/ | /fa:/ | /ʔeha/ | /fa/ | /ha:/
5 | /rima/ | /nima/ | /ʔelima/ | /lima/ | /rima/
6 | /ono/ | /ono/ | /ʔeono/ | /ono/ | /hono/
7 | /fitu/ | /fitu/ | /ʔehiku/ | /fitu/ | /hitu/
8 | /waru/ | /valu/ | /ʔewalu/ | /valu/ | /varu/
9 | /iwa/ | /iva/ | /ʔeiwa/ | /hiva/ | /hiva/

3) It's a judgment call to establish which languages are closest to which based on the resemblances above. For now I'll endorse this lineup: language G, language I, language B, language F, language H. (Read this as degrees of separation: "Language G resembles language I, which resembles language...") A more mature approach would be to group G, I & B in their own subgroup, or subfamily, and H & F in another. This classification isn't foolproof, and it prompts us to ask more questions and gather more evidence. For example, if G belongs to a subfamily G-I-B, why does it have the characteristic /l/ of H & F instead of /r/ like I & B?

Practice Exercise 4 (cognates from Galician, Occitan, Italian, Ladin & Romanian)

1) Every consonant correspondence I find (as I mentioned, /tʃ/ represents an affricate, so I treat it as a single sound):

word | Galician | Occitan | Italian | Ladin | Romanian
seven | /s/, 0, /t/ | /s/, 0, /t/ | /s/, /t/, /t/ | /s/, 0, /t/ | /ʃ/, /p/, /t/
dog | /k/, /n/ | /k/, /n/ | /k/, /n/ | /tʃ/, /n/ | /k/, /n/
one | /n/ | /n/ | /n/ | /n/ | /n/
four | /k/, 0, /t/, /r/ | /k/, 0, /t/, /r/ | /kw/, /t/, /t/, /r/ | /k/, 0, /t/, /r/ | /p/, 0, /t/, /r/
to have | /v/, /r/ | /b/, 0 | /v/, /r/ | /v/, 0 | /v/, /r/
three | /t/, /r/, /s/ | /t/, /r/, /s/ | /t/, /r/, 0 | /t/, /r/, 0 | /t/, /r/, 0
you | /t/ | /t/ | /t/ | /t/ | /t/
well/good | /b/, /n/ | /b/, /n/ | /b/, /n/ | /b/, /n/ | /b/, /n/
what? | /k/ | /k/ | /k/ | /k/ | /tʃ/
(0 = no corresponding consonant in that language.)

2) Every vowel correspondence I find:

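word | Galician | Occitan | Italian | Ladin | Romanian
seven | /e/, /e/ | /e/, 0 | /e/, /e/ | /e/, 0 | /a/, /e/
dog | /a/, 0 | /a/, 0 | /a/, /e/ | /a/, 0 | /ɨ/, /e/
one | /u/, 0 | /y/, 0 | /u/, /o/ | /u/, 0 | /u/, 0
four | /a/, 0, /o/ | /a/, 0, /e/ | /a/, 0, /o/ | /a/, /e/, 0 | /a/, 0, /u/
to have | /a/, /e/, 0 | /a/, /e/, 0 | /a/, /e/, /e/ | /a/, /e/, 0 | /a/, /e/, /e/
three | /e/ | /e/ | /e/ | /ei/ | /ei/
you | /u/ | /y/ | /u/ | /u/ | /u/
well/good | /e/, 0 | /e/, 0 | /e/, /e/ | /e/, 0 | /i/, /e/
what? | /e/ | /e/ | /e/ | /e/ | /e/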

3) Use the cognates in each row to reconstruct the corresponding word in the proto-language to the best of your ability.

Galician | Occitan | Italian | Ladin | Romanian | Proto-language
/sete/ | /set/ | /sette/ | /set/ | /ʃapte/ | *sept(e)/*sett(e)
/kan/ | /kan/ | /kane/ | /tʃan/ | /kɨne/ | *kan(e)
/un/ | /yn/ | /uno/ | /un/ | /un/ | *un(o)
/katro/ | /katre/ | /kwattro/ | /kater/ | /patru/ | *kwat(t)(e)ro
/aver/ | /abe/ | /avere/ | /ave/ | /avere/ | *ave(r)(e)
/tres/ | /tres/ | /tre/ | /trei/ | /trei/ | *tres/*trei
/tu/ | /ty/ | /tu/ | /tu/ | /tu/ | *tu
/ben/ | /ben/ | /bene/ | /ben/ | /bine/ | *bene
/ke/ | /ke/ | /ke/ | /ke/ | /tʃe/ | *ke

4) It looks like these phonemes have been subject to deletion:
Occitan & Ladin deleted final /e/ to form /set/ (still found in Galician, Italian & Romanian /sete/ ~ /sette/ ~ /ʃapte/)
Galician, Occitan & Ladin lost final /e/ from /kan/ ~ /tʃan/ (still found in Italian & Romanian /kane/ ~ /kɨne/)
Galician, Occitan, Ladin & Romanian eroded /o/ from the end of /un/ ~ /yn/ (still present in Italian /uno/)
Galician eroded final /e/ from /aver/; Occitan & Ladin deleted final /r/ and /e/ to get /abe/ ~ /ave/ (still found in Italian & Romanian /avere/)
Italian /tre/ & Romanian/Ladin /trei/ lost a final /s/ (still found in Galician & Occitan /tres/)
Galician, Occitan & Ladin deleted final /e/ to form /ben/ (still found in Italian & Romanian /bene/ ~ /bine/)

5) There is one clear case of assimilation: /p/ assimilated to /t/ to form the /tt/ of Italian /sette/, and Galician, Occitan & Ladin then simplified /tt/ to /t/ (/sete/, /set/). Only Romanian retains the unassimilated consonant cluster /pt/. The voiceless bilabial plosive /p/ assimilated to the following voiceless dental plosive /t/ by taking on its place of articulation [+dental].

6) Looking back at my list of consonant phoneme correspondences, I can see that Occitan has /b/ where other languages have /b/ (in /ben/), but also /b/ where other languages have /v/ (in /abe/ instead of /ave/). If this correspondence holds, Occitan merged the proto-language phonemes /v/ and /b/ into a single phoneme /b/.

7) Taking another look at my list of consonant phoneme correspondences, I see that Romanian has /tʃ/ in /tʃe/ where other languages only have /k/. Likewise, Ladin has /tʃ/ in /tʃan/ where other languages unanimously have /k/. Since the evidence favors the proto-language having /k/ in these cases, Romanian & Ladin both split the proto-language phoneme /k/ into two separate phonemes: /k/ and /tʃ/.

8) I can now correct and firm up my reconstructions of the original proto-language, called Proto-Romance, following this logic:

Proto-Romance | explanation
*septe | /pt/ assimilated to /tt/ > /t/; deletion of final /e/
*kane | Ladin splits /k/ > /tʃ/ & /k/ (uses /tʃ/ here); deletion of final /e/; Romanian raises height of /a/ > /ɨ/
*uno | deletion of final /o/; Occitan fronts /u/ > /y/
*kwattro/*kwattor | /tt/ > /t/; /kw/ > /k/; Romanian /kw/ > /p/; Ladin or other languages switch /r/ & /o/
*avere | Occitan /v/ > /b/; final /e/ or even final /re/ deletion
*tres | Romanian, Ladin & Italian deletion of final /s/
*tu | Occitan fronting of /u/ > /y/
*bene | Romanian raising /e/ > /i/; final /e/ deletion
*ke | Romanian splits /k/ > /tʃ/ & /k/ (uses /tʃ/ here)
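As a check on these reconstructions, you can run them forward: apply the changes named in the explanation column, in order, and see whether the daughter forms come out. The Python sketch below does this for Occitan. The rule set is an assumption pieced together from the table above (with deletion of a final /e/ or /o/ restricted to words of more than one syllable, so that *ke and *tu survive), not a full history of Occitan. *kwattro is left out because these rules don't produce Occitan /katre/. `ATTESTED_OCCITAN`, `syllable_count` and `to_occitan` are illustrative names.

```python
# Hypothetical, simplified sound changes from Proto-Romance to Occitan, pieced
# together from the explanation table above. They are applied in the order written.
VOWELS = set("aeiouyɨ")

# Proto-Romance reconstruction -> Occitan form from the exercise data.
ATTESTED_OCCITAN = {
    "*septe": "set", "*kane": "kan", "*uno": "yn", "*avere": "abe",
    "*tres": "tres", "*tu": "ty", "*bene": "ben", "*ke": "ke",
}


def syllable_count(form):
    """Rough syllable count: one per vowel symbol (there are no diphthongs in these words)."""
    return sum(1 for sound in form if sound in VOWELS)


def to_occitan(proto):
    form = proto.lstrip("*")
    form = form.replace("pt", "tt")  # assimilation: /pt/ > /tt/
    form = form.replace("tt", "t")   # /tt/ simplified to /t/
    if syllable_count(form) > 1 and form[-1] in "eo":
        form = form[:-1]             # deletion of final /e/ or /o/ (not in one-syllable words)
    if form.endswith("r"):
        form = form[:-1]             # deletion of a newly final /r/ (here, only in *avere)
    form = form.replace("v", "b")    # merger of /v/ and /b/ as /b/
    form = form.replace("u", "y")    # fronting of /u/ to /y/
    return form


for proto, attested in ATTESTED_OCCITAN.items():
    print(proto, to_occitan(proto), to_occitan(proto) == attested)
```

Every comparison prints True. The ordering is doing real work here: the final /r/ of *avere only becomes deletable after its final /e/ is gone, and /pt/ has to become /tt/ before /tt/ can be simplified.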