Chris Y
2006-04-20 00:16:00 UTC
I am using IE and IIS.
I am having problems with non-English characters, so I ran a test with a simple FORM with ENCTYPE='application/x-www-form-urlencoded' and a single <INPUT NAME='name'> box, and then wrote out the data that was received. My results are:
a. If I input just Chinese characters: 毛泽东, the data is received and written out as: ëÃó¶«. I captured the HTTP POST stream and the data passed was: name=%C3%AB%D4%F3%B6%AB.
b. If I put another character, the copyright symbol (Unicode U+00A9), before my Chinese characters: ©毛泽东, then the data is received correctly, i.e. exactly as I entered it. The data transmitted was: name=%A9%26%2327611%3B%26%2327901%3B%26%2319996%3B. I understand that this is the same as: ©&#27611;&#27901;&#19996;.
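For anyone who wants to reproduce the two captures, here is a minimal Python sketch that decodes both POST bodies. It assumes (this is not confirmed anywhere above) that the bytes in case a are GB2312 and that the bytes in case b are Windows-1252 with numeric character references; the variable names are only illustrative.

```python
import html
from urllib.parse import unquote_to_bytes

# Case a: the raw bytes behind name=%C3%AB%D4%F3%B6%AB
body_a = unquote_to_bytes("%C3%AB%D4%F3%B6%AB")
as_gb2312 = body_a.decode("gb2312")   # assumption: the browser sent GB2312
as_cp1252 = body_a.decode("cp1252")   # same bytes read as Western European text

# Case b: the raw bytes behind the second capture
body_b = unquote_to_bytes("%A9%26%2327611%3B%26%2327901%3B%26%2319996%3B")
text_b = body_b.decode("cp1252")      # assumption: single-byte Western encoding
resolved_b = html.unescape(text_b)    # resolve the &#NNNNN; references

print(as_gb2312)    # 毛泽东
print(as_cp1252)    # Ã«Ôó¶«
print(text_b)       # ©&#27611;&#27901;&#19996;
print(resolved_b)   # ©毛泽东
```

If these assumptions hold, the two requests differ only in the character encoding used for the form data, not in the characters themselves.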
I am at a loss as to what is going on. Why would the browser encode the data differently in the two cases? How can I force it to stick to one method (the second one)?
I am not totally familiar with Unicode. In Character Map, why do some characters have two codes? For example, 毛 is shown as U+6BDB (0xC3AB). What is C3AB? It is the value causing the problem in my first case above.
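One possible reading, offered as an assumption rather than a confirmed answer, is that U+6BDB is the Unicode code point while 0xC3AB is the same character's two-byte code in GB2312. A short Python check of that reading:

```python
ch = "\u6bdb"                       # the character 毛

code_point = ord(ch)                # Unicode code point
gb_bytes = ch.encode("gb2312")      # assumption: the second code is GB2312
utf8_bytes = ch.encode("utf-8")     # shown for comparison only

print(hex(code_point))              # 0x6bdb
print(gb_bytes.hex().upper())       # C3AB
print(utf8_bytes.hex().upper())     # E6AF9B
```

Under that reading, the C3 AB pair in the first capture is simply the GB2312 byte sequence for 毛.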
Thanks in advance.
js