MySQL and encoding
You need UTF-8 end to end to handle smart quotes and dashes (“ ” –) and other non-ASCII characters reliably:
(1) Make sure the browser is sending you UTF-8 encoded characters. Do this by declaring the page that contains the form as UTF-8:
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
...
(Ignore <form accept-charset>, which doesn't work properly in IE.)
(2) PHP deals with raw bytes and doesn't care what encoding they are in, but the database does, so you have to tell it which encoding the bytes coming from PHP are in. This is what SET NAMES does, although mysql_set_charset may be preferable.
(3) Once the correct characters reach the database, they need to be stored in a Unicode encoding so that all of them fit. Each column can have a different encoding, but you can use DEFAULT CHARACTER SET utf8 when you CREATE a table if you want all text columns in it to use UTF-8. You can also set the default character set for the database or the entire server to utf8 if you like.
If you have already CREATEd tables and they don't use a UTF-8 collation, you will have to recreate or alter the tables. You can check the current collation with SHOW FULL COLUMNS FROM sometable;
(4) Make sure that you output text to HTML from PHP using htmlspecialchars(), and not htmlentities(), which will mess up non-ASCII characters by default (before PHP 5.4 it assumed ISO-8859-1 unless told otherwise).
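Steps (2) to (4) could look roughly like this in PHP. This is only a minimal sketch under some assumptions: it uses the mysqli extension (its set_charset() method plays the role of mysql_set_charset), and the function names, the connection parameters and the articles table are hypothetical, made up for illustration. utf8 is kept to match the steps above; on MySQL 5.5.3 and later, utf8mb4 also covers characters outside the Basic Multilingual Plane.

```php
<?php
// All names below (function names, parameters, the `articles` table) are
// hypothetical placeholders, not something taken from the answer itself.

// Step (2): open a connection and declare that the bytes PHP sends are UTF-8.
function open_utf8_connection(string $host, string $user, string $pass, string $dbname): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbname);
    if ($db->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $db->connect_error);
    }
    // Has the effect of SET NAMES utf8, and also tells the client library,
    // which matters for real_escape_string().
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }
    return $db;
}

// Step (3): a table whose text columns all default to UTF-8.
function create_articles_table_sql(): string
{
    return 'CREATE TABLE articles ('
         . ' id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,'
         . ' body TEXT NOT NULL'
         . ') DEFAULT CHARACTER SET utf8';
}

// Step (4): escape only the HTML-special characters and state the encoding
// explicitly, so non-ASCII characters pass through untouched.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

With a connection from open_utf8_connection(), you would run $db->query(create_articles_table_sql()) once, and then wrap every value echoed into HTML in h().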
[As an alternative to (2) and (3), you can just use the default Latin-1 encoding for the connection and for table storage, but put UTF-8 bytes in it all the same. The disadvantage of this approach is that the data will look wrong to any other tool that looks at the database, and lower/upper case characters won't compare equal in the case-insensitive manner you would expect.]
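To see why the Latin-1 shortcut looks wrong to other tools, here is a small PHP sketch. It needs the mbstring extension and only imitates, on the PHP side, what a Latin-1-minded tool or collation would do with UTF-8 bytes; it is an analogy, not MySQL's actual comparison code.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// A tool that believes the column is Latin-1 decodes each byte on its own,
// so one character turns into two ("Ã©").
$seenByOtherTools = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Case folding works when the bytes are treated as UTF-8 ...
$upper = "\xC3\x89"; // "É" encoded as UTF-8
$foldedUtf8 = mb_strtolower($upper, 'UTF-8');

// ... but folding the same bytes as if they were Latin-1 does not give "é".
$foldedLatin1 = mb_strtolower($upper, 'ISO-8859-1');

var_dump($foldedUtf8 === $utf8);   // the UTF-8 view matches "é"
var_dump($foldedLatin1 === $utf8); // the Latin-1 view does not
```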
I am assuming you are pasting from some kind of text editor that converts " to a curly smart quote and converts your - to an mdash, which then makes it display as ?.
While you have set the database to accept UTF-8 characters, you probably haven't set your web server / PHP to accept those characters. Try playing around with the mbstring functions, but make sure you aren't using curly quotes or dashes.
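As a starting point for experimenting with mbstring, something like the following could tell you whether submitted text is valid UTF-8 and whether it contains curly quotes or dashes. It is just a sketch: describe_input() is a made-up helper name, and the set of characters it looks for (U+2018, U+2019, U+201C, U+201D, U+2013, U+2014) is my own choice.

```php
<?php
// Hypothetical helper: reports basic encoding facts about a submitted string.
function describe_input(string $input): array
{
    return [
        // True only if the bytes form well-formed UTF-8.
        'valid_utf8' => mb_check_encoding($input, 'UTF-8'),
        // True if the text contains a curly quote, en dash or em dash.
        // preg_match() returns false on invalid UTF-8 when /u is used,
        // so this entry is false in that case as well.
        'has_typographic' => preg_match(
            '/[\x{2018}\x{2019}\x{201C}\x{201D}\x{2013}\x{2014}]/u',
            $input
        ) === 1,
    ];
}
```

For example, describe_input($_POST['body']) on the receiving page shows whether the browser really sent UTF-8 (the 'body' field name is only an example).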