Problems with UTF-8 encoding in PHP

The characters I get from the URL, e.g. www.mydomain.com/?name=john, were fine as long as they weren't in Russian.

If they were in Russian, I got "".

So I added $name = iconv("cp1251", "utf-8", $name); and now it works great for Russian and English characters, but it breaks other languages. :)))

For example, "Jānis" (Latvian), which worked fine before iconv, now becomes "jДЃnis".

Any idea if there is some kind of universal converter that will work with Cyrillic languages and won't mess up other languages?


3 answers


It really boils down to how the URL is encoded. If you click a link on a given page, the browser will use that page's encoding to send the request. If you type the URL directly into the browser's address bar, however, the behavior is effectively undefined, because there is no standardized encoding for that case (Firefox provides a switch in about:config for sending URLs encoded as UTF-8).

Short of some kind of encoding detection, there is no way to find out which encoding was used for the URL in a given request.
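One possible detection heuristic, as a minimal sketch: treat the value as UTF-8 when it is valid UTF-8, and only convert it otherwise. The helper name to_utf8 and the Windows-1251 fallback are assumptions for illustration, and the mbstring extension is assumed to be available. This also shows why the unconditional iconv call breaks Latvian names: the two UTF-8 bytes of "ā" (C4 81) get read as two separate cp1251 characters.

```php
<?php
// Hypothetical helper (not from the original post): normalizes a request
// value to UTF-8. The fallback encoding is an assumption; pick whatever
// legacy encoding your visitors are most likely to send.
function to_utf8(string $value, string $fallback = 'Windows-1251'): string
{
    // Valid UTF-8 (which includes plain ASCII) is passed through untouched.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Anything else is assumed to be in the fallback encoding.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

$name = to_utf8($_GET['name'] ?? '');
```

This is only a heuristic. Some legacy byte sequences happen to be valid UTF-8, and invalid input could just as well be ISO-8859-1 as cp1251, so it cannot replace knowing the encoding up front.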

EDIT:

Just to back up the above, I wrote a small test script that shows the default behavior of the five major browsers (on Mac OS X in my case, with IE running on Windows Vista via Parallels):



// fall back to an empty string when the parameter is missing
$p = $_GET['p'] ?? '';
for ($i = 0; $i < strlen($p); $i++) {
    // print each raw byte received via the URL as a hex value
    echo dechex(ord($p[$i])) . ' ';
}

      

The call http://path/to/script.php?p=äöü

results in

  • Safari (4.0.5): c3 a4 c3 b6 c3 bc

  • Firefox (3.6.3): c3 a4 c3 b6 c3 bc

  • Google Chrome (5.0.375.38): c3 a4 c3 b6 c3 bc

  • Opera (10.10): e4 f6 fc

  • Internet Explorer (8.0.6001.18904): e4 f6 fc

So the first three evidently send URL-encoded UTF-8, while Opera and IE send ISO-8859-1 or one of its variants. Conclusion: you cannot be sure which encoding is used for text data sent via the URL.
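To see where the two byte patterns come from, here is a small sketch that encodes the same three characters both ways and dumps the bytes. The helper name hex_bytes is made up for this example, and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: hex dump of a string's bytes, separated by spaces.
function hex_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// "äöü" written with Unicode escapes, so this file's own encoding does not matter.
$utf8 = "\u{E4}\u{F6}\u{FC}";
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo hex_bytes($utf8), "\n";   // c3 a4 c3 b6 c3 bc
echo hex_bytes($latin1), "\n"; // e4 f6 fc
```

The first line matches what Safari, Firefox and Chrome sent, the second what Opera and IE sent.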


Why don't you just use UTF-8 with all files and processes?




It looks like the problem is with the file encoding. You should always use UTF-8 without BOM as the preferred encoding for your .php files; code editors like Intype make it easy to specify this (UTF-8 Plain).
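If you are not sure whether an editor has saved a BOM, a quick check could look like the sketch below. Both function names are made up for this example. A BOM at the top of a .php file matters because those three bytes are sent as output before your code runs, which can prevent later header() calls from working.

```php
<?php
// Hypothetical helper: true when the data starts with the UTF-8 BOM (EF BB BF).
function has_utf8_bom(string $data): bool
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: returns the data without a leading UTF-8 BOM.
function strip_utf8_bom(string $data): string
{
    return has_utf8_bom($data) ? substr($data, 3) : $data;
}

// Example: check the current script file itself.
var_dump(has_utf8_bom(file_get_contents(__FILE__)));
```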


Also, add the following code to your files before any output:

header('Content-Type: text/html; charset=utf-8');

      


You should also read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky.
