Unicode in PHP

I previously read Spolsky's article on character encoding, and also this from Dive Into Python 3. I know PHP is getting Unicode at some point, but I am having a hard time understanding why this is so important.

If PHP-CLI is being used, OK, it makes sense. However, in the world of the web server, isn't it up to the browser to take that integer and turn it into a character (based on the character encoding)?

What am I not getting?

+1




4 answers


Well, for one thing, you need to somehow generate the strings displayed by the browser :-)



0




PHP "supports" UTF-8; look at the mbstring extension. Most of the problems come from PHP developers who do not use the mb_* functions when working with UTF-8 data.

UTF-8 characters are often more than one byte long, so you need to use functions that take this into account, such as mb_strpos, not strpos.
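To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and PHP 7 or later (for the `\u{...}` escapes); the sample string is purely illustrative.

```php
<?php
// Illustrative sample: "naïve café". The "ï" and "é" each take 2 bytes in UTF-8.
$text = "na\u{00EF}ve caf\u{00E9}";
$needle = "caf\u{00E9}";

// strpos() and strlen() count bytes, so the 2-byte "ï" shifts the offset by one.
$bytePos = strpos($text, $needle);
$byteLen = strlen($text);

// mb_strpos() and mb_strlen() count characters when given the encoding.
$charPos = mb_strpos($text, $needle, 0, "UTF-8");
$charLen = mb_strlen($text, "UTF-8");

echo "strpos: $bytePos, mb_strpos: $charPos\n"; // strpos: 7, mb_strpos: 6
echo "strlen: $byteLen, mb_strlen: $charLen\n"; // strlen: 12, mb_strlen: 10
```

The two offsets are not interchangeable: passing a byte offset to an mb_* function (or a character offset to substr) will point at the wrong place as soon as the text contains a non-ASCII character.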



It works fine if you get UTF-8 from the browser -> insert it into the database -> read it back -> display it to the user. If you are doing something more involved with UTF-8 data (or even some heavy text processing), you should probably consider an alternative language.

+4




PHP string functions often treat strings as sequences of 8-bit characters. I had all sorts of problems with Chinese text flowing through string functions. substr(), for example, can chop a multibyte character in half, which causes all sorts of problems for XML parsers.
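A small sketch of that failure mode, assuming mbstring is available and PHP 7 or later; the Chinese sample text and variable names are made up for illustration.

```php
<?php
// Illustrative sample: four Chinese characters, 3 bytes each in UTF-8.
$text = "\u{4E2D}\u{6587}\u{6587}\u{672C}";

// substr() works on bytes: 4 bytes is one whole character plus a stray lead byte.
$broken = substr($text, 0, 4);

// mb_substr() works on characters, so the result stays valid UTF-8.
$safe = mb_substr($text, 0, 2, "UTF-8");

var_dump(mb_check_encoding($broken, "UTF-8")); // bool(false)
var_dump(mb_check_encoding($safe, "UTF-8"));   // bool(true)
```

A string like `$broken` is exactly the kind of input a strict XML parser would be expected to reject, since it is no longer well-formed UTF-8.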

+1




There is an awesome FAQ section on Unicode and the web here. See if it answers some of your questions.

0








