Language Thursdays: Parsing Chinese 1.0

Written by Julen Madariaga on May 7th, 2010


I was flying back from Chongqing recently when I was reminded of the very frustrating problem of reading Chinese. There was a movie on the cabin TV, and it had one peculiarity: it carried Chinese and English subtitles in parallel, in two lines of similar font size at the bottom of the screen.

As I watched I kept forcing my eyes to stick to the Chinese subtitles in order to exercise my reading (the sound was off) but it was pointless. Every single time, before I had finished reading the Chinese I already knew the meaning of the line anyway. The words in English just seemed to transmit their meaning even if I was not looking at them.

Reading Chinese

We already discussed the problem of reading Chinese functionally last year. It is very important for advanced students of Chinese, because progress beyond a certain level depends largely on this ability. Many foreigners are able to read slowly, and even to produce good translations of Chinese texts with the help of a cursor dictionary. But reading functionally, in my definition, is a completely different thing. It means being able to read all sorts of general texts as quickly and reliably as an average native speaker.

I have observed that this reading fluency is extremely difficult to attain for readers who were not educated in the Chinese system. And I know from personal experience that this is not a common problem in learning foreign languages: in languages with an alphabetic script, reading fluency develops with practice, in parallel with speaking. This problem is unique to Chinese characters, and I have the impression that it has been largely ignored by educators.

The Reading Test

I want to clearly separate reading skills from the acquisition of new vocabulary and characters. Obviously, reading is slower when you need to look up words in the dictionary, but that is not what we want to measure. We can define a test to measure reading speed:

The index is the time you take to read a 500-character text divided by the time you take to read a similar text in your native language (the next 500-character section of the same book). The premise is that you already know all the characters, words and expressions in the text, and no preparation before reading is allowed. The test is easy to run with a bilingual book, although it takes some trial and error to find a section with no unknowns.

Since there are no vocabulary or unknown-character issues, the difference in speed captured by the index is mostly due to the difficulty of parsing the message, which is what I call pure reading skill.
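For anyone who wants to run the test, the index is simply a ratio of two timings. Here is a minimal Python sketch; the function name `reading_index` and the timings are hypothetical, chosen only to show what a score of 3 means (the Chinese passage takes three times as long as the native-language one):

```python
def reading_index(seconds_chinese: float, seconds_native: float) -> float:
    """Time to read a 500-character Chinese passage divided by the time to read
    the following passage of the same bilingual book in your native language.
    1.0 would mean native-like speed; higher numbers mean slower reading."""
    if seconds_chinese <= 0 or seconds_native <= 0:
        raise ValueError("reading times must be positive")
    return seconds_chinese / seconds_native


# Hypothetical timings, not real data: 6 minutes for the Chinese passage,
# 2 minutes for the native-language passage.
print(reading_index(6 * 60, 2 * 60))  # 3.0
```

A stopwatch reading for each passage, converted to seconds, is all the input it needs.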

Parsing Chinese

I am beginning to suspect that this index is very difficult to bring down, even with long periods of daily reading. I scored about 3 on the test last year, and almost a year later (pending careful testing) I am afraid I am not far from where I was. Actually, I might be cheating slightly, because I was using an Obama book that I had already read.

Let’s go through the possible reasons why it is so difficult to parse the message when we know all of its elements. As far as I can tell, there are three main complex processes involved in reading: (1) recognizing the characters, (2) parsing them into words, and (3) parsing the words into sentences.

Step 1 – Recognizing the characters

It is understood that when we read English, we normally don’t read letter by letter to sound out a word; instead we recognize whole words, or even chunks of them, at a glance. This allows us to read very fast, and I am sure the same kind of phenomenon happens when Chinese people read their own language. They take in 中央政治局常务委员会 at a single glance.

Here I see the first big obstacle to our reading. We have not developed the skill to make out these complex shapes automatically, so we are forced to consciously recognize each character before moving on. Even with the very basic characters in the previous paragraph, I still cannot take in the whole string as immediately as I take in “Politburo Standing Committee”.

This is what I mean by Step 1. What do you think?

Step 2 – Parsing the Words from Characters

Recognizing a chunk of characters at a glance is one thing; identifying the words they form is another. This step is extremely easy in Western languages, because words are clearly separated by spaces and proper nouns have capital letters. Written Chinese offers neither of these aids, so there is an extra parsing step: figuring out where your units of meaning are.

Take, for example, the expression 发展中国家: I can tell you in no time that it means “developing country”. But now check out this random section of text I just copied from the internet:


Is it 在发展-中国-家粮食? Or is it 在-发展中国家-粮食? Obviously it is the second, but if we read character by character and follow the statistically most economical approach of pairing characters two by two, our first tentative parsing would be the first one. A native reader sees the whole five-character chunk at once and detects the word, but because the characters are so demanding to recognize, most foreign readers take in text in small chunks of two characters, which forces them into a process of trial and error.
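The trial and error can be made concrete with forward maximum matching, a standard dictionary-based heuristic for segmenting Chinese (I am not claiming this is how readers actually process text). It scans left to right and always takes the longest dictionary word that fits in its window, so the window size plays the role of how many characters a reader takes in at once. The function name and the seven-word toy lexicon below are assumptions made for illustration:

```python
from __future__ import annotations


def forward_max_match(text: str, lexicon: set[str], max_len: int) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon word of at most max_len characters, falling back to a single
    character when nothing longer matches."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon for illustration only; a real one would hold tens of thousands of words.
LEXICON = {"在", "发展", "中国", "国家", "家", "粮食", "发展中国家"}
TEXT = "在发展中国家粮食"

# A two-character window commits to 发展 and 中国 before it can see 发展中国家.
print("-".join(forward_max_match(TEXT, LEXICON, max_len=2)))  # 在-发展-中国-家-粮食
# A five-character window finds the whole word.
print("-".join(forward_max_match(TEXT, LEXICON, max_len=5)))  # 在-发展中国家-粮食
```

With the narrow window the segmenter falls into the same trap as the two-characters-at-a-time reader; the wider window behaves more like the native reader who sees the whole chunk.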

In fact, the example above is very elementary. Now imagine a text with longer words, fixed phrases and foreign names like this one: 圣文森特和格林纳丁斯 (special jury prize to any foreigner who gets this). It is easy to see that Chinese readers deal with a whole extra parsing step that is practically nonexistent in our languages.

In case you are skeptical, it is easy to do a realistic simulation of what that extra step would mean if we had it in English. Just see how long it takes you to read this text, taken from this article:

karunanidhiwaslividthatdayanidhiandbrotherkalanidhihadbecometooambitiousholdingpopularitycontestsagainstalagiriintheirnewspaper,whoseofficewasburntdown.rajadidnottakechargeofthetelecomministryalone.kanimozhiwastoremainhis“guide”.hewasfocused.hisallegedundersellingofthe2Gspectrum(adesignatedpartoftheairwavesforusebymobilephoneoperators),whichcausedalossofRs22,466croreaspertheCBI’sestimate,surfaced.

Good luck! It is almost difficult to believe that Chinese actually read their language at normal speeds (and believe me, they do).

Of course, the equivalence is not exact, because Chinese characters combine differently from English letters and words. But it gives a good feel for this tricky parsing step that is unique to Chinese. Native minds develop from childhood to accomplish it in an instant, but it involves a process quite different from what we are trained to do. Is it possible to acquire that ability? This is what I mean by Step 2.
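The English simulation is easy to reproduce, and a few lines of Python also show why unspaced text is hard: once the spaces are gone, a string can have several valid segmentations. This is a minimal sketch with hypothetical function names (`despace`, `segmentations`) and a five-word toy lexicon; a real experiment would need a full dictionary:

```python
from __future__ import annotations

from functools import lru_cache


def despace(text: str) -> str:
    """Simulate an unspaced script: lowercase the text and drop all whitespace."""
    return "".join(text.lower().split())


def segmentations(text: str, lexicon: frozenset[str]) -> list[list[str]]:
    """Return every way to split text into lexicon words (memoized exhaustive search)."""

    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple[tuple[str, ...], ...]:
        # All segmentations of text[i:]; the empty suffix has exactly one.
        if i == len(text):
            return ((),)
        results = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in lexicon:
                for rest in split_from(j):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(words) for words in split_from(0)]


# Toy lexicon, for illustration only.
ENGLISH_WORDS = frozenset({"no", "now", "here", "where", "nowhere"})

unspaced = despace("Now here")
print(unspaced)  # nowhere
for words in segmentations(unspaced, ENGLISH_WORDS):
    print(" ".join(words))
# no where
# now here
# nowhere
```

Even this tiny lexicon gives three segmentations of one short string; context has to pick the right one, and that choice is the work that spaces normally do for us.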

Step 3 – Parsing the Words into Sentences

Once we have the words, parsing sentences is broadly similar to what we do in Western languages. In fact, at the level of sentence structure, Chinese grammar is not all that different from English grammar. Long, tricky sentences usually follow a similar order, and clauses are marked with commas (ideally) much as in English. In my experience this step is much easier than parsing long sentences in agglutinative languages like Basque, where a good part of the grammatical information only arrives at the end of the sentence, in the form of verb conjugation.


This is only version 1.0, and I will improve and complete it in future posts. I wanted to share these points and get some feedback and ideas before continuing.

This subject is important because it can help us understand how the Chinese reading process works, and perhaps also develop a method to help all those students who are stuck in the advanced (but not functional) level. As more people decide to learn Chinese seriously, the number of students stumbling on this block will increase – it is already large even today.

For the moment, it seems clear to me that Steps 1 and 2 described above are the main obstacles to fluent reading, but I want to find more ways to quantify this. In particular, here are some ideas we could try, if anyone is interested:

  • Do a larger-scale test of reading speed.
  • Test the reading speed of native speakers in their own language and in English.
  • Do a test to quantify Step 2 (by comparing the reading speed of character text with spaces between words against the normal reading speed of a similar text).
  • Answer the question: is it actually possible to improve at Steps 1 and 2, or are they automated processes that you need to learn as a child?
  • Think of possible exercises to improve Steps 1 and 2.
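For the Step 2 test in the list above, the two conditions are best compared as reading speeds rather than raw times, since the spaced passage and the normal passage are different (if similar) texts. A minimal Python sketch, assuming someone has already split the spaced passage into words by hand; the function names, the toy passages and the timings are all hypothetical, not measured data:

```python
from __future__ import annotations


def spaced_version(words: list[str]) -> str:
    """Print a passage that was split into words by hand with a space between words."""
    return " ".join(words)


def chars_per_minute(text: str, seconds: float) -> float:
    """Reading speed in characters per minute; spaces are not counted as characters."""
    return len(text.replace(" ", "")) * 60 / seconds


def step2_gain(normal_text: str, normal_seconds: float,
               spaced_text: str, spaced_seconds: float) -> float:
    """Speed on the word-spaced passage divided by speed on the normal passage."""
    return (chars_per_minute(spaced_text, spaced_seconds)
            / chars_per_minute(normal_text, normal_seconds))


# Hypothetical toy passages and timings (not measured data).
normal = "发展中国家的粮食问题"                            # 10 characters, printed normally
spaced = spaced_version(["在", "发展中国家", "粮食"])   # 8 characters, with word spaces

print(spaced)                                # 在 发展中国家 粮食
print(chars_per_minute(normal, 5.0))         # 120.0
print(chars_per_minute(spaced, 3.2))         # 150.0
print(step2_gain(normal, 5.0, spaced, 3.2))  # 1.25
```

A ratio near 1.0 would suggest that word segmentation costs the reader little; a ratio well above 1.0 would suggest that Step 2 accounts for a real share of reading time. In a real test the passages would be around 500 characters, as in the reading index above.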

Any ideas on these points are welcome, as are links to previous research. Nothing I say here is written in stone, and I would very much appreciate other suggestions.

Comments so far ↓

  1. May

    Saint Vincent and the Grenadines … My Chinese map study finally came in handy!

    Julen Madariaga Reply:

    This, sir, is very impressive.

    I am looking forward to a new St-Vincent-and-Grenadines-Parsing Machine from Brad’s Lab :)

  2. May

    Very interesting indeed. Similar cases (vice versa, of course) happened to me when I was watching pirated English-speaking movies. Then the situation became less clear: I don’t know which one reached my head first. In this case (sound off), though, I haven’t experienced much. My bet is still that Chinese characters get ahead of English letters, even though I read English almost as fast. I believe this is a very common experience among students of English: no matter how adept you are at reading English, your native ideographic bits always flow down a bit faster.

    I have yet to meet someone with a different story. Good luck with yours! BTW, have you read the book I mentioned a few days ago?

    Julen Madariaga Reply:

    @safarinew, I think you mean this one, right? I checked it out, but I decided not to get it because I am already too busy worrying about foreigners learning Chinese to go worry about Chinese learning English… :)

    A related text that I recommend everyone check out is this one. I found it 其母的 (“damn”) hilarious.

  3. May

    Fantastic post, Julen. Intuitively, based on my own experience, I agree there’s probably an enormous gap here. I don’t doubt that there are second-language learners who do read at native speeds, but I think they are few and very far between. I am certainly not one of them, and I am not remotely, in your terms, “functional.”

    Maybe I didn’t read carefully enough but I don’t think you said explicitly what you seem to imply: that you expect the Julen Index (JI) to be higher for non-native speakers learning Chinese than for, say, Chinese learning English as non-native speakers, even when both groups are at comparable educational and speaking levels.

    Is that right? Just to be painfully clear, my JI might be 5 for Chinese while my Chinese counterpart (similar background) would be better, say JI of 3.

    I’m not utterly convinced it would turn out that way. For example my Beijinger wife, who did two graduate degrees and 10+ years of work in the US, still reads much slower in English than I do. Still, I think the gap is likely to exist for exactly the parsing reasons you describe.

    Julen Madariaga Reply:

    you expect the Julen Index (JI) to be higher for non-native speakers learning Chinese than for, say, Chinese learning English as non-native speakers, even when both groups are at comparable educational and speaking levels.

    I didn’t say that because the article already contained enough unproven assertions. But yes, I do think it is true. Not only because there are 26 letters and thousands of characters (I am sure this difficulty can be overcome through dedicated study of Chinese), but also because alphabetic scripts have one step fewer in the parsing process.

    who did two graduate degrees and 10+ years of work in the US, still reads much slower in English than I do

    Yes, but the point is to compare her reading English with you reading Chinese. If she has this experience in the US I would bet my shirt she beats you at that.

  4. May

    I think the medium also makes a difference: I find it easier to read subtitles in films than to read from books.

  5. May

    Ah, now I know why you were complaining about Chinese punctuation :-) I guess it helps to be a native speaker to read, because when I read the string of Chinese without punctuation, I find myself actually punctuating along the way. Maybe the Chinese use an invisible built in punctuation system.

    Also, I suspect people use different parts of the brain to read English and Chinese. Perhaps it’s like accent, once you reach a certain age, it’s harder to train the brain to read fluently? Then again, I’ve known many Chinese people who are not native English speakers and who read English just as fast as native speakers; sometimes their English is even better than native speakers’! But then, the public education system in the US kinda sucks.

  6. May
    Julen Madariaga

    Maybe the Chinese use an invisible built in punctuation system.

    Yes, you could say that. This is what I mean by parsing Step 2: they seem to have a built-in system to detect words in unspaced lines. They “add invisible punctuation”, if you want to call it that, in the form of spaces.

    Perhaps it’s like accent, once you reach a certain age, it’s harder to train the brain to read fluently

    And that is precisely what I am afraid of. If it is true, it would mean that reading fluency is extremely difficult to reach. Not only would it require learning thousands of extremely difficult characters and words (an aspect I ignored here but dealt with in previous posts), it would also require training a mental ability that our brains were not prepared for.

    This is a big “might”, of course. But it is a conclusion I fear more and more as I continue to read Chinese without ever approaching the fluency I experienced in previous languages.

    Cathy Reply:

    Aww, don’t be discouraged. Chinese is very hard. But as long as you are determined to learn and you practice hard, I am sure you will achieve fluency. The human brain is often set in its ways, but when we put our heart into something, it is also capable of incredible feats.

  7. May
    Julen Madariaga

    PS. Anyone following this should also check out the discussion going on at Sinoglot.

    They have an interesting link there to a piece of research suggesting that Chinese readers indeed don’t read any faster when spaces are added between words.

    It looks indeed like their brain already “adds those spaces”, as Cathy says, and they gain nothing from the writer doing this for them.

  8. May

    I almost forgot to mention one of my favorite phrases to parse incorrectly: 保持共产党员先进性教育活动. Advanced Sex Education Activities for Communists?

  9. May

    I still have a pinyin transcription I once made of a book (160 pages, I think) and reread parts of it yesterday. The interesting thing was that reading it felt much more cumbersome than reading something similar without tone marks. I’m mentioning this because I believe it’s not just a Chinese character thing but a question of how much extra information is handled compared to the usual 26 letters.

    Interesting test, btw. I love tests - will try it tonight! I have some bilingual books in the bookshelf, so it shouldn’t be too hard.

  10. May

    Great blog post.
    As a native speaker of Chinese, I remember I used to play a game with my younger brother when we were bored. We kept staring a Chinese character we picked up randomly until it looks unfamiliar, weird and even unrecognizable. It takes an average of about 2-3 minutes for us to feel that way. But I don’t have the same experience with English letters. I am guessing that’s probably due to the much more complex construction of Chinese characters than English letters.

    I think the reason you haven’t progressed to functional proficiency is that you haven’t reached the point where you can immediately tell whether a combination of two Chinese characters is correct. In addition, you haven’t lived in a Chinese context long enough to fully grasp the subtleties of Chinese.

    It is true that some Chinese people can read English as fast as native speakers, because we don’t have to learn to recognize the English letters, which we have long been using in our pinyin system.

    保持共产党员先进性教育活动: yeah, you can understand it that way. Actually, we often joke about some of the phrases used in Chinese politics. BTW, do you know the translation of it? There are so many meanings popping up in my mind that I don’t know which one to choose.

    I can read up to 600 to 800 characters per minute, but the speed varies depending on the contents of course.

    One more interesting thing: if you are a big fan of Chinese TV dramas, which I guess you are not, you will be surprised to find that many of them have Chinese subtitles while American ones don’t. You may wonder what the heck the subtitles are for when you can hear and understand all the Chinese dialogue. But the thing is, it is easier for me to read the Chinese subtitles than to listen to them.

    Jason Reply:

    (Very interesting post! Looks like I’m a bit late to the discussion)
    “We kept staring a Chinese character we picked up randomly until it looks unfamiliar, weird and even unrecognizable. It takes an average of about 2-3 minutes for us to feel that way. But I don’t have the same experience with English letters. I am guessing that’s probably due to the much more complex construction of Chinese characters than English letters.”
    Interesting! I’m a native English speaker, and as a kid I played a similar game (and I think many other kids have too). We, however, would say a word over and over until it became unrecognizable and ‘strange’.
    Do you think you could do the same with a spoken Chinese word? I’m also curious if I could do this with a written English word…

  11. May
  12. May
    Julen Madariaga

    Interesting link. In terms of characters, I estimate my reading speed at between 100 and 200 cpm, closer to 200 when the text is easy and I know all the words. The two Chinese people I tested were both above 500 cpm.

  13. May

    The only “comfort” is that Chinese people sometimes also get lost in their own language when they “pre-pair” characters the wrong way, especially with combinations that are not so common in daily language.

  14. May
    Julen Madariaga

    What do you mean? Any example?

  15. May

    I don’t have any in mind right now, but I have seen it happen when they read texts with very unusual words. Of course, once they have read it, they remember this pairing the next time.

  16. May

    I now remember that in Taiwan, in many books, proper names (mostly foreign ones, or very specific technical words) were usually underlined or put in 「 」. I have also seen this in some older books from mainland China (maybe it was the convention in the Republican era). It helps the reader link characters that do not usually appear together, as in names like 洛杉矶 (Los Angeles), 旧金山 (San Francisco), 马德里 (Madrid), 欧阳开泰 (a personal name), etc.

  17. Jun

    I am a Chinese student, and it’s also hard for me to read English fluently while fully understanding the words.

  18. Aug

    I see I am very late to this discussion, but seeing as you haven’t posted recently, I guess it’s okay.
    I think you make some great points here, especially about the very different ways of approaching a written sentence in character-based versus alphabetic scripts.
    One thing I’d like to bring up is your question about whether or not it is possible to increase your reading speed and built-in ability to parse, as you put it.
    Fortunately the answer is yes, though you can’t do it just by reading a lot. It actually requires a rather structured approach. For native speakers, the ability is built up through the well-established practice of reading aloud and memorizing certain texts from various periods. Texts such as the San Zi Jing and whatnot. What happens here is that the students build up a mental database of sentence and phrase structures that actually becomes an intuitive (almost subconscious) awareness of the rhythm of the language. When a passage is well-written, it will adhere more or less to this rhythm or structure, allowing the parsing to be done automatically.
    I have heard about certain sets of exercises in use at advanced language training facilities such as the Oberlin immersion program that try to replicate this and other aspects of these built-in reading habits. I haven’t tried it myself, but teachers who have been using this method say that it is highly effective, and doesn’t require years of recitations from Chinese primary school texts. When you’re posting again, maybe you can look into this and give it a try.

  19. Nov
    Michael A. Robson

    “Good luck! It is almost difficult to believe that Chinese actually read their language at normal speeds (and believe me, they do).”

    Yeah, well, “normal” is a pretty meaningless word here. I still hear tons of slip-ups on the radio, where the girl is obviously reading from a teleprompter or page, trips over her words, stops and starts again. Parsing words is huge, especially since there are tons of examples where how you parse the words depends on the context; it’s just more work for the brain and eye. Is Chinese the only language that doesn’t do it?

  20. Sep

    Hello, very interesting blog entries!

    About invisible spaces in Chinese punctuation: maybe it helps to imagine them as a rhythm. In Germanic languages, new words are often created by joining several existing words that point to their meaning, which presents a similar parsing problem. These languages also have very different sentence structures, where one waits until the very end to find out what action has taken place!

    For instance: medewerkerstevredenheidsonderzoek

    Non-Dutch speakers could be trying to cut it a number of ways:

    Dutch speakers, however, know it should be read as: medewerkers-tevredenheids-onderzoek
    Meaning: employee satisfaction survey
    And is made of smaller components: ‘mede-werker’ > ‘together work’ = employee!
    and ‘onder-zoek’ > under-search = survey/research!

    So apart from familiarity with words, for me rhythm’s the intuition for knowing how to parse words. And with long difficult words like that in long European style sentences, it also helps to add flow so one can have the patience to get to the end and find out what the action was! Which also explains why natives are able to be grammatically correct without being able to explain to others what the rules are!
