Why it’s Good that Google.cn Leaves + SEM (2)

Friday, January 22nd, 2010

Back on the job. On re-reading, I have the feeling that I might have been too optimistic yesterday. Sure, the style of Google’s announcement betrayed personal involvement, and once at the negotiation table it is to be expected that a more businesslike atmosphere will prevail. But even if G shuts up, it is not certain that the CCP will let them get away with it. Depending on who they have at the table, the outcome will be anything between the two extremes we have considered.

But let’s leave our bipolar guesswork aside for a while, so we can concentrate on a more interesting issue. Namely, that it’s great that Google.cn is going to disappear, and that whatever happens to the rest of the Gs, the Chinese internet will be a better place when Google.cn is gone. Let’s start with some crude survey work:

Baidu, Google.cn or Google.com?

I improvised a little survey today in the office, where I asked three of my young Shanghai colleagues which search engines they like to use. Interestingly, the answers were very similar, and all included some form of the following statements:

  • Baidu.com is better for local information and Chinese culture.
  • Google.cn we use sometimes for international information.
  • Google.com? Nah, that’s for foreigners.

These results are surprising, because as we saw yesterday, Google.com and Google.cn are exactly the same engine.  It doesn’t make any sense to search on Google.cn, where anything as innocent as 胡锦涛 (HuJintao) is obviously SEM manipulated. For the first experiment of the day we can see how, using this slightly conflictive term, results start to differ between G.com and G.cn. Try the links, see where there’s a Wikipedia article missing?
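For anyone who wants to repeat this comparison a bit more systematically, here is a minimal sketch in Python. It assumes you have copied the first-page result URLs from each engine into two lists by hand; the function name `missing_domains` and the sample URLs are hypothetical placeholders, not real search output.

```python
from urllib.parse import urlparse


def missing_domains(reference_urls, filtered_urls):
    """Return the domains found in reference_urls but absent from filtered_urls."""
    def domains(urls):
        return {urlparse(u).netloc.lower() for u in urls}
    return sorted(domains(reference_urls) - domains(filtered_urls))


# Hypothetical, hand-copied first-page results (placeholders, not real data).
com_results = [
    "http://en.wikipedia.org/wiki/Example",
    "http://news.example.org/story",
    "http://www.xinhuanet.com/page",
]
cn_results = [
    "http://www.xinhuanet.com/page",
    "http://news.example.org/other-story",
]

print(missing_domains(com_results, cn_results))
```

Comparing by domain rather than by full URL is a deliberate simplification: it flags a source that has vanished entirely, which is the kind of gap the experiment is looking for.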

But best of all is the answer my sample colleagues gave when I pressed them on why they use Google.cn: Oh well, the browsers here direct you to Google.cn by default. That is probably the main reason why G.cn is ranked 3rd on Alexa for China, while G.com is only ranked 6th.

Hey, wait a second. Are you telling me that all it takes to get an identical, non SEM-ed Google Search in China is to type a “.com”, and 300 million netizens haven’t noticed in the last 4 years? Well, yeah. Kind of. Let me introduce you to:

The Chinese censorship and its peculiar victims

This is one of the most misunderstood aspects of Chinese censorship in the West. I realized this with the crazy Wang post, the one that was linked in an article 3 days straight on the Most Read list of the New York Times. I got lots of hits, and also lots of mail from creative Americans proposing ideas to help “free the Chinese” from the claws of the GFW.

But listen, the sad reality is, the CCP’s systems of censorship are so effective not because they are diabolically sophisticated, but because… because the Chinese netizens can’t give a damn if they are being censored by their government or not.

You don’t believe me? Then perhaps you have a better theory to explain why nobody uses the widely available, free web proxies to surf the internet. Or why the majority of Chinese netizens still use Google.cn when they have an identical search engine that is not manipulated on Google.com.

Shocking, right? But not so much. The truth is that, in spite of popular funny memes and the occasional juvenile rant, the majority of Chinese who are rich enough to use the internet are happy with the status quo. They do find it mildly annoying to be treated like children by the CCP, but as long as the bills are paid, they don’t think so much of it.

And this is also why, if someone wants to create a device against the GFW, user-activated systems like proxies or Tor are not effective: people simply don’t use them. The idea of a Server Side Proxy, or Unblockable Host, that would unblock a site WITHOUT action by the end user was discussed here, and I concluded it was not feasible.

This is also the reason why initiatives like Chrter 08 never make it in China: it is not about users trying to get access to dissident sites, it is about dissidents unable to market their ideas to a general population that is unreceptive.

Advanced SEM for Dummies (Search Engine Manipulation)

The most amusing thing in the Google crisis is all the commentators crying about the loss of Google.cn and its negative consequences for the freedom of the Chinese. In fact, I maintain that Google.cn is the most evil product to ever have existed in the Chinese internet, and the World will be a better place without it.

That is because, unlike the Chinese official sites that practice censorship, what the search engines do is manipulation. Why? Because Google.cn is not a content site in itself, it is a gateway to the internet. When people type in a keyword into the search field, they are actually trusting it to return a fair picture of what is on the net.

When you type a “sensitive” term and G.cn removes all the results except the People’s Daily and Xinhua, Google’s responsibility is double: not only does it support those often objectionable views on the first page, but it also implicitly states that theirs is the ONLY opinion existing in the World.

And the worst is, the Chinese who believed that would be right to do so, because Google’s well-known principles clearly specify their commitment to give all the information available in a democratic way. The little warning message that is displayed on Google.cn SEM searches is meant to avoid this situation, but it is tiny and often placed right at the bottom of the page, so most Chinese users just ignore it.

In the case of Google.cn, SEM is not about “good” or “evil”. It is about breaking the very principles that give the Google company its meaning, and it is understandable that Google has never been comfortable with it.

TEST               TRANSLATION                GOOGLE.COM       GOOGLE.CN
Neutral Word       Shoe                       Normal Results   Normal Results
                   Shanghai Pudong            Normal Results   Normal Results
Sensitive Term     Hu Jing Tao                Normal Results   SEM Results
                   TNM massaccre              Normal Results   SEM Results
RC trigger string  chinayouren-free.com/eng   RC Block         Normal Results
                   Fallunggong                RC Block         SEM Results

All tests in Chinese; the English spelling is on purpose. The anomaly in the chinayouren string proves that in some rare cases G.cn does give better results than G.com, as SEM does not apply to petty disharmony.

Instructions to deal with the GFW

Wednesday, July 8th, 2009

I have written a lot recently about the Great Firewall of China (GFW). I had my site blocked for two weeks and this inspired some frustrated posts until eventually I worked my way through the Wall. The good news is I learnt a lot in the process, and now I can write some tips to help others with the same problem. Anyone who has a website hosted outside China can use these instructions to try to keep it accessible here. Here is the index, follow the links for details.

Prevention – Try to stay out of trouble

From the moment you set up your website, there is a series of measures you can take to reduce the probability of getting blocked and/or make your life easier if it happens. If you follow these points, hopefully you will never need to get to the next section.

  • Be careful with what you publish. >>>
  • Try to avoid writing GFW keywords. >>>
  • Choose where you want to be hosted. >>>
  • Choose a good, flexible hosting service. >>>
  • Host your blog/site on a subdomain. >>>

Action – When trouble is at your door

Then one day you realize that your Chinese readership has fallen to zero, and you wonder why you can’t open your website from China. If this happens to you, these are the simple steps to follow:

  • Make sure it is really the GFW. >>>
  • Check if there is an IP block. >>>
  • Find out if the target is really you. >>>
  • Check if there is a URL block. >>>
  • Move to a new IP address. >>>
  • Change your URL and Redirect. >>>
  • Check that you don’t have links. >>>
  • Try to eliminate the keywords. >>>
  • Take it easy, and send feedback. >>>
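The detailed instructions live behind the links above and are not reproduced here. As a rough illustration of the first few checks only, the following Python sketch compares a plain TCP probe run from inside China with the same probe run from abroad. The function names `tcp_probe` and `diagnose`, the result labels, and the decision rules are all assumptions made for this example, not the procedure from the linked posts.

```python
import socket


def tcp_probe(host, port=80, timeout=5.0):
    """Try a plain TCP connection and return 'ok', 'timeout', 'reset' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timeout"
    except ConnectionResetError:
        return "reset"
    except OSError:
        return "error"


def diagnose(from_china, from_abroad):
    """Interpret two tcp_probe results for the same host (heuristic only)."""
    if from_abroad != "ok":
        # The site is unreachable even outside China, so do not blame the GFW yet.
        return "down-everywhere"
    if from_china == "ok":
        # The IP answers from China; any remaining problem is more likely
        # URL or keyword filtering than an IP block.
        return "no-ip-block"
    if from_china in ("timeout", "reset"):
        return "possible-ip-block"
    return "inconclusive"
```

You would run `tcp_probe("your-site.example")` once from a machine in China and once from a machine abroad, then pass both results to `diagnose`. A single probe proves little, so repeat it a few times before drawing conclusions.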

Notes and Disclaimers

  • Don’t forget to read the party of the first part >>>


Firefox 3.5 Finally

Thursday, July 2nd, 2009

It was about time Mozilla issued their new revision. Ever since Firefox emerged as the big challenger to Explorer, many of us have switched to this swift browser with the unlimited add-ons.

As time passed, we grew so used to all the fox capabilities that it became normal for an internet browser to perform the most varied functions: Firefox was my Chinese dictionary, my anti-GFW proxy, my image editor, my wikipedia link, my financial consultant, my bookmark and my fluffy nail brush. It was Jesus, and it was perfect.

And then, suddenly, the fox got old. The add-ons started to weigh on its worn-out bones, and one day we found ourselves waiting 10 seconds for the browser to open. Never again. But what could we do, we were addicted to the add-ons, and way too proud to fall back on Explorer.

For months (years) on end there was no solution forthcoming, and no amount of reinstalling helped to solve the issues. Then the shrewd guys at Google jumped at the chance. They launched the new Chrome, a browser that actually does nothing at all except - guess what - browse at the speed of light. The first day I downloaded it “just to give it a try”, and after that test flight I never opened the Fox again. Instead, I got myself a dictionary.

So now the Fox is back. Its 3.5 iteration certainly feels much faster than the previous one, and it supports, if not all, at least the most important of my add-ons. The influence of Chrome is very obvious, with new functions like “private” surfing, one-click bookmarking and that little button on the right-hand side of the tabs that comes in so handy to open a new one.

As for the speed, I seriously doubt it can beat Chrome (look at the announcement, Mozilla compares it with Explorer, but avoids mentioning Google). But as long as it is reasonably fast and it continues to brush my nails, I think it deserves another chance. Let’s see how it goes.

I am back with the fox. I hope it rocks.

A little Study of the Internet Censorship in China

Wednesday, January 14th, 2009

Last Sunday I did a post on internet censorship in China where I mixed in various different ideas and I’m afraid the final result regarding Search Engine Censorship didn’t come out as clear as I would have liked. I think it is an important subject, so here are the complete results:

We will be looking at Google.cn, Google.com and Baidu.com, and we will try in each of them 3 different kinds of search terms.

A- Chrter 08: In all its combinations, which are 08宪章 and 零八宪章
B- Political Terms: Tiananmen incidents (天安门六四事件), FLG.
C- Vulgar words: Sex. I will employ the “blog job” and the “chicken bar”.

It is understood that in all cases the search terms are in Simplified Chinese. The browser is Firefox 3.0.5. and the connection is a normal home DSL by China Telecom. The possible results are:

  • Free Search - Results look consistent and realistic, like the ones obtained in the West.
  • Reset Connection (RC) - This can only be seen in Mainland China. The result is an image like the one below, and the search engine cannot be opened again for a while (I estimate 30 seconds). The RC is not done directly by the Search Engine. Wikipedia’s internal search also gives RCs for B Terms.
  • Forbidden Message (FM) - This is the message that, with slight variations, looks like the one shown below. It says something along the lines of: “Some results are not displayed according to the local laws, regulations and policies”.
  • Manipulated Results (MR) - This is the case where the results are obviously manipulated, for example in the search for 天安门六四事件 (Tiananmen incident) on Baidu, where all the results are official newspapers such as People’s Daily, etc. Sometimes the page can also carry an FM at the top.
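The four categories above can be sketched as a small classifier in Python. This is only an illustration: the inputs (`connection_reset`, `notice_shown`, `result_domains`, `official_domains`) are hypothetical names for things observed by hand, and "obviously manipulated" is approximated here by the assumption that every result comes from an official outlet.

```python
from enum import Enum


class Outcome(Enum):
    FREE_SEARCH = "Free Search"
    RESET_CONNECTION = "Reset Connection (RC)"
    FORBIDDEN_MESSAGE = "Forbidden Message (FM)"
    MANIPULATED_RESULTS = "Manipulated Results (MR)"


def classify(connection_reset, notice_shown, result_domains, official_domains):
    """Map one observed search to the categories above.

    FM and MR can appear together, so a set of outcomes is returned.
    """
    if connection_reset:
        # No page is returned at all, so nothing else can be observed.
        return {Outcome.RESET_CONNECTION}
    outcomes = set()
    if notice_shown:
        outcomes.add(Outcome.FORBIDDEN_MESSAGE)
    # Assumption: results are "manipulated" when all of them are official outlets.
    if result_domains and all(d in official_domains for d in result_domains):
        outcomes.add(Outcome.MANIPULATED_RESULTS)
    return outcomes or {Outcome.FREE_SEARCH}
```

Returning a set rather than a single label reflects the note above that a manipulated page sometimes also carries the Forbidden Message.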

Google.com
A- Free Search (but clicking some individual results gives RC).
B- Reset Connection.
C- Manipulated Results.

Google.cn
A- Forbidden Message and (sometimes *) Manipulated Results.
B- Reset Connection.
C- Forbidden Message. With the term in quotes (“”), it gives Manipulated Results.

Baidu.com
A- Manipulated Results. With the term in quotes (“”), it gives Forbidden Message.
B- Forbidden Message and Manipulated Results.
C- Forbidden Message and Manipulated Results.
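The grid above can also be written down as data, which makes it easy to ask which engines ever produce a given outcome. The sketch below is a simplified transcription in Python: the short labels ("Free", "RC", "FM", "MR") and the helper `engines_with` are choices made for this example, and the quoted-search variants are folded into the same cell as the unquoted ones.

```python
# Outcomes observed per engine and term class (A, B, C), transcribed from the
# lists above. Quoted and unquoted searches are merged into one set per cell.
# The RC seen when clicking individual Google.com results is not included,
# because it happens after the search itself.
RESULTS = {
    "Google.com": {"A": {"Free"}, "B": {"RC"}, "C": {"MR"}},
    "Google.cn": {"A": {"FM", "MR"}, "B": {"RC"}, "C": {"FM", "MR"}},
    "Baidu.com": {"A": {"MR", "FM"}, "B": {"FM", "MR"}, "C": {"FM", "MR"}},
}


def engines_with(outcome):
    """Return the engines that showed the given outcome for at least one class."""
    return [
        engine
        for engine, by_class in RESULTS.items()
        if any(outcome in seen for seen in by_class.values())
    ]


print(engines_with("RC"))
print([e for e in RESULTS if e not in engines_with("RC")])
```

With only two or three terms tested per class, this is a snapshot of one afternoon of testing rather than a stable description of each engine.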

Conclusions

1- The results are somewhat erratic and it is difficult to see a pattern: it all looks like a series of patches on top of each other rather than a systematic implementation. Also, things change over time, as in *, where the Manipulated Result I saw on Sunday can no longer be seen.

2- Baidu has a different system from Google: it has no Reset Connections. This is very advantageous for Baidu and I understand it is unfair competition, as an RC is one of the worst experiences while surfing.

3- This might be due to server location: the involvement of the Search Engines in the RC is unclear (even Wikipedia has RCs!), whereas Manipulated Results obviously require their action and can more easily attract attention from Advocacy Groups. Of course, in the case of sexual terms (C), this is not a problem as the Manipulated Results can just be called “Safe Search”.

4- The Chrter 08 gets different treatment from other political terms, but it might just be because it was banned urgently and suddenly, so it is only a quick fix added to the existing structure. It does not provoke RC in any case. It looks like they have decided to leave it alone on Google.com to avoid attention from Western advocacy groups, but in exchange Google has had to give up Google.cn and apply the infamous “porn block” to it, which is active censorship by the Search Engine. Why the FM and not RC? Who knows, I am guessing perhaps RC is more complicated to implement.

5- In any case, and however negative, I understand it is always better to show FM than Manipulated Results, because the former is openly admitting censorship, whereas the latter is a lie and a distortion of reality. Forbidden Message does increase transparency, yet does not justify involvement in political censorship. From this perspective, Google is closer to the truth than Baidu. Baidu seems indeed a more active participant in the government’s information control schemes, and Chinese users of Baidu are clearly the most exposed to Search Engine brainwash.

UPDATE: Following corrections by international expert Nart Villeneuve below: I have introduced a few changes of my own (in blue). In any case, this post is just a very basic review of the SE Censorship system from the perspective of a normal user. If you really want to understand how the GFW works, you should read proper research papers like this one, or this one.


IMAGES:

1- FORBIDDEN MESSAGE (FM)

2- RESET CONNECTION (RC)

NOTE: If someone is interested in this or has some more information to share please put it in comments. Unfortunately my time is very limited so I only ran 2 or 3 terms for each of the classes A, B and C above. There might be things I overlooked and I would be grateful if you can point them out.