Chinese, Most Difficult Language in the World (2)
Written by Julen Madariaga on November 23rd, 2009
Last Friday I wrote a very long post where I ended up including too many ideas. The main point got a bit obscured as a result, but it was simply this: that vocabulary plays an essential role in learning a language, and that because of this Chinese is not only extremely difficult at an advanced level, but also growing more difficult with time.
I don’t suppose this is groundbreaking research, but it is interesting because most people are not aware of it, and also because of its implications for the boundary between language and politics, two fields we like to cultivate in this blog. Here is the full argument with its conclusions; for examples and details, see the previous post and its comments:
- Learning a new language requires knowledge in three main areas: grammar, phonetics and vocabulary. Grammar and phonetics differ essentially from vocabulary in that the first two are rules that apply to an unlimited number of cases, whereas the last is raw data. We can call them the Code and the Data elements of the language. The Code elements are finite and do not grow. The Data element is practically infinite and keeps growing, to the point that not even native speakers master it completely.
- When studying a language, the Code elements play an essential role at the basic and intermediate levels, but at the advanced level the real obstacle to communication, and therefore to progress, is Data. For example, advanced students of German may sometimes use the wrong declension, and advanced students of Spanish may fail to distinguish the “rr” and “r” sounds. These slips tend not to hamper communication, because human languages are highly redundant: I would never misunderstand a learner who says “perro” (dog) when they mean “pero” (but). Ultimately, imperfections in the Code elements amount to the same thing as having an accent: most of the time they matter only as metadata.
- But while Code above a certain level is highly redundant, Data remains essential at every level. To borrow an example from this great article: the sentence “Jacuzzi is found effective in treating Phlebitis” is meaningless if you don’t know one or both of the nouns. A single missing word can often obscure the meaning of a whole paragraph or article.
- The number of words we use passively in real life far exceeds the standard vocabulary lists for each language level. This is because semi-specialized words, such as ionic, jacuzzi or matrix, are left out of vocabulary lists as too rare. Each of these words is certainly rare, but there are so many of them that, taken together, they come up very often. This Data element is so large that it cannot be memorized in a classroom, and the only way to acquire it is through many years of immersion.
- Most language learners never notice this problem because they are “cheating”. In most languages of the world, this high-level vocabulary is practically identical, so it doesn’t need to be learned. Each language has a threshold above which most modern words are international and the Data is no longer specific to that language.
- This convergence threshold differs from language to language, but it depends less on language family or geographical origin than on the size and development of the community of speakers. That is why even non-Indo-European languages like Basque are extremely easy above the intermediate level: the community is not big enough to sustain its own complex terms, so all higher-level Data is borrowed from international words. Most people misunderstand the concept of language families and attach too much importance to it, and they come up with absurd lists like this one.
- The internationalization of vocabulary is growing with advances in telecommunications and with globalization, especially since English has become the only language of scientific research. There is little point in inventing new Swedish scientific terms, for example, when the whole scientific community reads and writes its papers in English. Often, in spite of political efforts to promote a local vocabulary, the economics of language pull the higher-level Data back toward Internationalese.
- There is only one language in the world that, for historical, political and demographic reasons, has remained an exception to this trend: Chinese (Mandarin, Cantonese or any other variety; the difference is irrelevant here). It constitutes a parallel system of high-level Data that has very few words in common with the rest of the world. Japanese and Korean are partial exceptions in that they draw on both the Chinese and the international systems, but their modern words are increasingly international, and these languages are converging with the rest.
- In addition, Chinese has a ridiculously difficult writing system, unique in lacking a functional phonetic script. This compounds the vocabulary problem: not only are there more words to learn than in any other language, but each word carries much more information, because it must be associated with its corresponding characters.
- Moreover, since there is no standardized way to transcribe foreign proper nouns, even the names of places and people tend to be “translated” into Chinese, sometimes departing completely from the original pronunciation and becoming Chinese names in their own right. This adds to the already massive Data element of the Chinese language.
All this leads to the conclusion: Chinese is the most difficult language to learn at a high level, regardless of the student’s native language.
This is particularly interesting because, until now, the only correct answer to this question was: “it depends on your mother tongue”. With the possible exception of Japanese and Korean students, this argument shows that Chinese is actually the hardest language for everyone else. Conversely, it is also very difficult for Chinese speakers to learn other languages, although this is mitigated by the fact that other languages do have functional phonetic scripts.
Another interesting conclusion: Chinese is not only difficult, it is actually growing in difficulty.
As the world grows more interconnected and technology occupies a larger part of our lives, new semi-specialized vocabulary takes up a growing share of everyday language. Expressions that refer to international concepts, such as “spam” or “plasma TV”, increasingly take the place of expressions that refer to local cultural heritage. In this sense, we can say that all the languages of the world are converging, while Chinese is an island diverging from all the rest.
Then there are the political conclusions we can draw from all this, but I am committed to writing shorter posts, so we will leave those for another day. Comments on and corrections to my arguments above are welcome.