Language Thursdays: Parsing Chinese 1.0
Written by Julen Madariaga on May 7th, 2010
I was flying back from Chongqing recently when I was reminded of the very frustrating problem of reading Chinese. There was a movie on the cabin TV and it had a particularity: it carried subtitles in Chinese and English in parallel, in two lines of comparable font at the bottom of the screen.
As I watched I kept forcing my eyes to stick to the Chinese subtitles in order to exercise my reading (the sound was off) but it was pointless. Every single time, before I had finished reading the Chinese I already knew the meaning of the line anyway. The words in English just seemed to transmit their meaning even if I was not looking at them.
We already spoke last year about the problem of Reading Chinese functionally. It is very important for advanced students of Chinese, because progress beyond a certain level depends largely on this ability. Many foreigners are able to read slowly and even do good translations of Chinese texts with the help of a cursor dictionary. But to read functionally, in my definition, is a completely different thing. It means to be able to read all sorts of general texts as quickly and reliably as an average native.
I have observed that this reading fluency is extremely difficult to attain for readers who were not educated in the Chinese system. And I know from personal experience that this is not a common problem of learning foreign languages; with practice, reading fluency comes parallel to speaking in languages with alphabetic script. This problem is unique to Chinese characters, and I have the impression that it has been largely ignored by educators.
The Reading Test
I want to distinguish reading skill very clearly from the acquisition of new vocabulary and characters. Obviously, when you need to look up words in the dictionary, reading is slower, but that is not what we want to measure. We can define a test to measure reading speed:
The index is the time you take to read a 500 character text divided by the time taken to read a similar text (the next 500 character section in the same book) in your native language, with the premise that you are familiar beforehand with all the characters/words/expressions contained in the text, and no preparation prior to reading is allowed. The test is easily performed with a bilingual book, although it takes some trial and error until you find a section where there are no unknowns.
Since there is no vocabulary or missing character issue, the indexed difference in speed is mostly due to the difficulty in parsing the message, what I call the pure reading skills.
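To make the index concrete, here is a minimal sketch in Python of the calculation described above. The function name and the timings are hypothetical, chosen only so that the result matches the score of about 3 mentioned in this post; they are not measured data.

```python
def reading_index(seconds_chinese, seconds_native):
    """Ratio of the time needed for the 500-character Chinese text
    to the time needed for the matching native-language text."""
    if seconds_chinese <= 0 or seconds_native <= 0:
        raise ValueError("reading times must be positive")
    return seconds_chinese / seconds_native


# Hypothetical timings: 9 minutes for the Chinese section,
# 3 minutes for the parallel section in the native language.
index = reading_index(seconds_chinese=540, seconds_native=180)
print(index)  # 3.0
```

An index of 1 would mean reading Chinese as fast as the native language; the higher the number, the larger the gap.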
I am beginning to suspect that this index is very difficult to decrease, even with long periods of daily reading. I scored about a 3 in the test last year, and almost one year later (pending careful testing) I am afraid I am not far from where I was. Actually, I might be cheating slightly because I was using an Obama book that I had already read before.
Let’s see all the possible reasons why it is so difficult to parse the message when we know all the elements inside it. As far as I can think, there are 3 main complex processes that we do when we read: 1 - Recognizing the characters, 2 - Parsing them into words, 3 - Parsing the words into sentences.
Step 1 – Recognizing the characters
It is understood that when we read English, we normally don’t read letter by letter to make out a sound, but rather we recognize whole words or even chunks of them at a glance. This allows us to read very fast, and I am sure the same kind of phenomenon happens when Chinese read their language. They see a 中央政治局常务委员会 in one beat of the eye.
I see here the first big obstacle to our reading. We have not developed the skills to make out these complex shapes automatically, and we are forced to consciously recognize each character before we move on. Even for the very basic characters in the previous paragraph, I still cannot take in all of it as immediately as I take “Politburo Standing Committee”.
What do you think? This is Step 1.
Step 2 – Parsing the Words from Characters
It is one thing to recognize a chunk of characters at a glance, and a different thing to identify the words that they form. This step is extremely easy in Western languages, because the words are clearly separated by spaces, and proper nouns have capital letters. But written Chinese doesn’t offer this help, so there is an added parsing step in figuring out where your units of meaning are.
See for example the expression 发展中国家: I can tell you in no time that it means “developing country”. But now check out this random section of text I just copied from the internet:
Is it 在发展-中国-家粮食? Or is it 在-发展中国家-粮食? Obviously it is the second one, but if we read character by character and follow the statistically economic approach, our first tentative parsing would be the first one. A native reader sees the whole 5-character chunk at once and detects the word, but due to the difficulty of the characters, most foreign readers take in small chunks of 2 characters, which forces them into a process of trial and error.
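The effect of the chunk size can be illustrated with a small Python sketch. It uses greedy forward maximum matching, a simple textbook segmentation technique that I am swapping in for illustration; it is not a model of how people actually read. The tiny vocabulary is a made-up assumption, and the 2-character window stands in for the foreign reader while the 5-character window stands in for the native one.

```python
# Hypothetical mini-vocabulary, just enough for this one example.
VOCABULARY = {"在", "发展", "中国", "家", "粮食", "发展中国家"}


def segment(text, vocabulary, max_len):
    """Greedy forward maximum matching with a window of max_len characters."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a known word fits.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


TEXT = "在发展中国家粮食"
print(segment(TEXT, VOCABULARY, max_len=2))  # ['在', '发展', '中国', '家', '粮食']
print(segment(TEXT, VOCABULARY, max_len=5))  # ['在', '发展中国家', '粮食']
```

With the narrow window the matcher commits to 发展 and 中国 and is left with a stray 家, which is the kind of false start that then has to be undone by trial and error. With the wide window it finds 发展中国家 in one step.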
In fact, the example given above is very elementary, but consider introducing longer words, fixed phrases and foreign names into a text, like this one: 圣文森特和格林纳丁斯 (special prize of the jury to the foreigner who gets this). It is easy to see that the Chinese are adding a whole step of parsing that is practically nonexistent in our languages.
In case you are skeptical, it is easy to do a realistic simulation of what that added step would mean if we had it in English. Just see how long you take to read this text, taken from this article:
karunanidhiwaslividthatdayanidhiandbrotherkalanidhihadbecometooambitious holdingpopularitycontestsagainstalagiriintheirnewspaper,whoseofficewas burntdown.rajadidnottakechargeofthetelecomministryalone.kanimozhiwas toremainhis”guide”.hewasfocused.hisallegedundersellingofthe2Gspectrum(a designatedpartoftheairwavesforusebymobilephoneoperators),whichcaused alossofRs22,466croreaspertheCBI’sestimate,surfaced.
Good luck! It is almost hard to believe that Chinese readers actually read their language at normal speeds (and believe me, they do).
Of course, there is not an exact equivalence, because Chinese characters combine in different ways from English letters/words. But it gives a good feel of this tricky parsing step that is unique to Chinese. Native minds have developed since childhood to accomplish this in an instant, but this step involves some process that is quite different from what we are trained to do. Is it possible to acquire that ability? This is what I mean by Step 2.
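If you want to run the English simulation on a text of your own, a few lines of Python are enough. This is only a sketch: the function name is my own, and the choice to lowercase everything is an assumption (the sample above keeps a few capitals such as CBI).

```python
def remove_word_spaces(text, lowercase=True):
    """Delete all whitespace so that word boundaries disappear.

    With lowercase=True the capital letters, which also help
    to spot proper nouns, are removed as well.
    """
    squeezed = "".join(text.split())
    return squeezed.lower() if lowercase else squeezed


sample = "It is almost difficult to believe."
print(remove_word_spaces(sample))  # itisalmostdifficulttobelieve.
```

Punctuation is left in place, as in the sample above, since written Chinese keeps its punctuation too.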
Step 3 – Parsing the Words into Sentences
The parsing of sentences once we have the words is overall similar to what we do in Western languages. In fact, Chinese grammar is not all that different from English grammar at the level of the sentence structure. Those tricky long sentences usually have a similar order, and the clauses are marked with commas (ideally) in a similar way to English. This step is much easier, in my experience, than parsing long sentences in agglutinative languages like Basque, where a good part of the grammar information is only given at the end of the sentence in the form of a verb inflection.
This is only a 1.0 issue and it will be improved/completed in further posts. I wanted to share these points and get some feedback and ideas before I continue.
This subject is important because it can help us understand how the Chinese reading process works, and perhaps also develop a method to help all those students who are stuck in the advanced (but not functional) level. As more people decide to learn Chinese seriously, the number of students stumbling on this block will increase – it is already large even today.
For the moment, it seems clear that Steps 1 and 2 described above are the main obstacles to fluent reading, but I want to find more ways to quantify this. In particular, I have the following ideas that we could try if someone is interested:
- Do a larger scale test of the reading speed.
- Test the reading speed of natives in their own language and in English.
- Do a test to quantify Step 2 (by comparing word-spaced character reading speed with normal reading speed of a similar text).
- Answer the question: is it actually possible to improve in Steps 1 and 2, or is it some automated process you need to learn as a child?
- Think of possible exercises to improve Steps 1 and 2.
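As a starting point for the Step 2 test in the list above, here is a rough Python sketch of how the comparison could be scored. The function names and the timings are hypothetical; the idea is simply to compare characters per minute on a word-spaced text with characters per minute on a similar text written normally.

```python
def chars_per_minute(text, seconds):
    """Reading speed, counting only non-whitespace characters
    so that the added spaces do not inflate the result."""
    if seconds <= 0:
        raise ValueError("reading time must be positive")
    count = sum(1 for ch in text if not ch.isspace())
    return count * 60 / seconds


def step2_ratio(normal_text, normal_seconds, spaced_text, spaced_seconds):
    """How many times faster the word-spaced text was read."""
    spaced_speed = chars_per_minute(spaced_text, spaced_seconds)
    normal_speed = chars_per_minute(normal_text, normal_seconds)
    return spaced_speed / normal_speed


# Hypothetical timings for a very short sample.
ratio = step2_ratio("在发展中国家粮食", 4.0, "在 发展中国家 粮食", 3.0)
print(round(ratio, 2))  # 1.33
```

A ratio close to 1 would suggest that finding the word boundaries costs the reader very little; a ratio well above 1 would point to Step 2 as a real bottleneck.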
Any ideas on these points will be welcome, and any links to previous research as well. Nothing of what I say here is written in stone, and I would very much appreciate other suggestions.