UTF-8 character encoding issues
Posted: Sat 29. Oct 2005, 10:21
I want to use PHPWCMS for several websites in different languages. To make things as easy and compatible as possible, I want to use UTF-8 as my encoding. So off I go...
I have a local setup of PHP 4.3.3 and MySQL 4.1.14, both standard installs. I install PHPWCMS 1.2.5 and choose UTF-8 as the character encoding. Easy.
I make a small sample site and put in some text using FCKeditor 2. To prevent the editor from writing HTML entities, I have switched entity conversion off.
I put in Russian, Thai, Czech and a whole bunch of accented characters and even a ü! And (to my surprise) EVERYTHING WORKS. Great job.
So I transfer the whole lot to a staging server, and then problems arise: most characters (80-90%) are displayed correctly, but some are not, and this happens in every language. It seems as if part of the Unicode 'set' is not properly encoded.
What can that be?
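One general way that only some characters in every script can break is a conversion step that treats UTF-8 bytes as a single-byte charset. The Python sketch below is a hypothetical illustration, not a diagnosis of this setup: it uses Python's Windows-1252 (`cp1252`) codec, which leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, and reports which characters would not survive being misread and converted back. Whether the staging server performs any such conversion, and with which charset, is an assumption; the function names are made up for the example.

```python
# Hypothetical illustration only: assumes UTF-8 bytes are misread as a
# single-byte charset (Python's cp1252 codec) and then converted back.
# Nothing here is taken from PHPWCMS, MySQL or PHP.


def survives_cp1252_roundtrip(ch: str) -> bool:
    """Return True if ch's UTF-8 bytes survive being misread as cp1252."""
    raw = ch.encode("utf-8")
    try:
        # Each UTF-8 byte is treated as one cp1252 character.
        misread = raw.decode("cp1252")
    except UnicodeDecodeError:
        # At least one byte (0x81, 0x8D, 0x8F, 0x90 or 0x9D) has no cp1252
        # mapping, so the original bytes cannot be recovered.
        return False
    # Undo the misread: back to the original bytes, then decode as UTF-8.
    return misread.encode("cp1252").decode("utf-8") == ch


def damaged_chars(text: str) -> list:
    """List the characters in text that would not survive the round trip."""
    return [ch for ch in text if not survives_cp1252_roundtrip(ch)]


if __name__ == "__main__":
    samples = {
        "Russian": "Спасибо",
        "Thai": "กิน",
        "Czech": "čeština",
        "German": "über",
    }
    for language, text in samples.items():
        print(language, text, "->", damaged_chars(text))
```

In this sketch most letters pass and a few in each script fail, because failure depends only on whether a character's UTF-8 bytes happen to include one of the undefined values.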
The staging server setup is almost identical to the local one: PHP 4.3.3 and MySQL 4.1.10. The only difference is that the staging server has the PHP mbstring extension switched off. That 'can' be a problem in phpMyAdmin (I'm using 2.6.4-pl1 on both), but PHPWCMS doesn't use this extension, does it?
As I investigate further I find some strange things:
- When you install PHPWCMS, the SQL tables are created BEFORE you choose the character encoding for the site, and no collation preferences are set. Is that the best way? Is the default collation (latin1_swedish_ci) going to work well with UTF-8? I experimented with setting the collation to utf8_bin and utf8_unicode_ci, but it made no difference.
- The 'foreign' characters are stored encoded in the MySQL database itself, where I was expecting to see the actual characters (much easier to work with). Is that correct? On my local setup they are displayed correctly both in the generated HTML and inside the PHPWCMS admin, so it works, but it's not ideal.
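If "encoded" here means HTML character references (for example `&#252;` for ü), then the stored text and the displayed text are two spellings of the same characters, which would explain why the pages still render correctly. The post does not say which form is actually stored, so this Python sketch simply assumes numeric or named HTML references. It uses only the standard-library `html.unescape` and the `xmlcharrefreplace` error handler; the two function names are hypothetical.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric reference like &#252;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_char_refs(stored: str) -> str:
    """Turn numeric (&#252;) and named (&uuml;) references back into characters."""
    return html.unescape(stored)


if __name__ == "__main__":
    original = "über čeština Спасибо"
    stored = to_char_refs(original)
    print(stored)                  # what an entity-encoded column would contain
    print(from_char_refs(stored))  # the actual characters again
    assert from_char_refs(stored) == original
```

Keeping raw characters in the database instead would only be safe once the table and connection character sets are really UTF-8, which is the collation question again.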
Sorry for writing such a long story, but this really does my head in. If I can get it to work locally, I should be able to get it to work on another site. And apart from mbstring, which I don't think is the problem, I can't see what the cause is!
Any help or ideas much appreciated. Thanks!