Saturday, 15 May 2010

PHP - UTF-8 all the way through




I'm setting up a new server, and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1.

Where exactly do I need to set the encoding/charsets? I'm aware that I need to configure Apache, MySQL and PHP to do this - is there some standard checklist I can follow, or perhaps troubleshoot where the mismatches occur?

This is for a new Linux server, running MySQL 5, PHP 5 and Apache 2.

Data Storage:

Specify the utf8mb4 character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use the utf8mb4 encoding if a utf8mb4_* collation is specified (without any explicit character set).

In older versions of MySQL (< 5.5.3), you'll unfortunately be forced to use plain utf8, which only supports a subset of Unicode characters. I wish I were kidding.
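As a rough sketch of the storage step, the table definition below declares utf8mb4 at the table level so every text column inherits it. The table name, column names and the utf8mb4_unicode_ci collation are my own picks for illustration, not something the setup above requires.

```php
<?php
// Hypothetical table: "articles" and its columns are illustrative names only.
$createTable = <<<'SQL'
CREATE TABLE articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    body  TEXT NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
SQL;

// With an open database handle in $dbh (not created here), this would run it:
// $dbh->exec($createTable);
```

On MySQL older than 5.5.3 the same statement would have to say utf8 instead of utf8mb4.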

Data Access:

In your application code (e.g. PHP), in whatever DB access method you use, you'll need to set the connection charset to utf8mb4. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application and vice versa.

Some drivers provide their own mechanism for configuring the connection character set, which both updates the driver's own internal state and informs MySQL of the encoding to be used on the connection; this is usually the preferred approach. In PHP:

If you're using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify the charset in the DSN:

$dbh = new PDO('mysql:charset=utf8mb4');

If you're using mysqli, you can call set_charset():

$mysqli->set_charset('utf8mb4');       // object oriented style
mysqli_set_charset($link, 'utf8mb4');  // procedural style

If you're stuck with plain mysql but happen to be running PHP ≥ 5.2.3, you can call mysql_set_charset.

If the driver does not provide its own mechanism for setting the connection character set, you may have to issue a query to tell MySQL how your application expects data on the connection to be encoded: SET NAMES 'utf8mb4'.

The same consideration regarding utf8mb4/utf8 applies as above.
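To make the PDO route a bit more concrete, here is a minimal sketch of a connection helper. The function names buildUtf8Dsn() and connectUtf8(), along with the host, database and credentials in the usage comment, are placeholders I made up; only the charset=utf8mb4 DSN parameter comes from the advice above.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that requests utf8mb4.
function buildUtf8Dsn($host, $dbname)
{
    // The charset parameter in the DSN is honoured from PHP 5.3.6 onwards.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Hypothetical helper: opens the connection and makes errors throw.
function connectUtf8($host, $dbname, $user, $password)
{
    $dbh = new PDO(buildUtf8Dsn($host, $dbname), $user, $password);
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    return $dbh;
}

// Usage (needs a reachable MySQL server, so it is left commented out):
// $dbh = connectUtf8('localhost', 'example_db', 'example_user', 'secret');
// $row = $dbh->query("SHOW VARIABLES LIKE 'character_set_connection'")->fetch();
```

Querying character_set_connection afterwards is one way to troubleshoot a mismatch: if it does not report utf8mb4, the connection charset was not applied.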

Output:

If your application transmits text to other systems, they will also need to be informed of the character encoding. With web applications, the browser must be informed of the encoding in which data is sent (through HTTP response headers or HTML metadata).

In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type MIME header yourself, which is more work but has the same effect.
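Both output options can be sketched in a few lines. This assumes an HTML response; for other content types the MIME type in the header would differ, and in practice you would pick one of the two options.

```php
<?php
// Option 1: set default_charset at runtime (same setting as in php.ini).
ini_set('default_charset', 'UTF-8');

// Option 2: send the Content-Type header yourself, before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
```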

Input:

Unfortunately, you should verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.
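A small wrapper makes the check harder to forget. The helper name requireValidUtf8() is my own invention, and returning null for bad input is just one possible policy; rejecting the whole request would be equally reasonable.

```php
<?php
// Hypothetical helper: returns the string if it is valid UTF-8, null otherwise.
function requireValidUtf8($value)
{
    if (!is_string($value) || !mb_check_encoding($value, 'UTF-8')) {
        return null;
    }
    return $value;
}

// Usage sketch with a hypothetical form field called "name":
// $name = requireValidUtf8(isset($_POST['name']) ? $_POST['name'] : null);
```

The is_string() guard is there because request parameters can also arrive as arrays, which would need to be checked element by element.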

From my reading of the current HTML spec, the following points are no longer necessary, or even valid, for modern HTML. My understanding is that browsers will work with and submit data in the character set specified for the document. However, if you're targeting older versions of HTML (XHTML, HTML4, etc.), these points may still be useful:

For HTML before HTML5 only: you want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to reliably do this is to add the accept-charset attribute to all your <form> tags: <form ... accept-charset="UTF-8">.

For HTML before HTML5 only: note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.

Other Code Considerations:

Obviously enough, all files you'll be serving (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.

You need to make sure that every time you process a UTF-8 string, you do so safely. This is, unfortunately, the hard part. You'll probably want to make extensive use of PHP's mbstring extension.

PHP's built-in string operations are not UTF-8 safe by default. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.
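The difference is easiest to see on a string with one multi-byte character. The sketch below spells out the UTF-8 bytes explicitly so it does not depend on how this file itself is encoded; the variable names are arbitrary.

```php
<?php
$word = "na\xC3\xAFve"; // "naïve": the ï takes two bytes in UTF-8

$bytes = strlen($word);              // 6, because strlen() counts bytes
$chars = mb_strlen($word, 'UTF-8');  // 5, because mb_strlen() counts characters

$cut  = substr($word, 0, 3);             // "na" plus half of ï: no longer valid UTF-8
$safe = mb_substr($word, 0, 3, 'UTF-8'); // "naï", cut on a character boundary
```

Passing the encoding explicitly to each mbstring call avoids relying on whatever the internal encoding happens to be set to.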

To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.

