Hi!
The easiest way to debug charset issues with the database is to start as low-level as possible.
Write a "utf8test.php" script:
Code: Select all
<?php
// Adjust credentials, database name and table prefix to your setup.
$c = mysqli_connect('localhost', 'root', 'supersecretpassword');
mysqli_select_db($c, 'serendipity');
header('Content-Type: text/plain; charset=UTF-8');

echo "Output without setting a charset:\n";
$r = mysqli_query($c, "SELECT title FROM serendipity_entries LIMIT 15");
while ($row = mysqli_fetch_array($r, MYSQLI_ASSOC)) {
    echo "* " . $row['title'] . "\n";
}

echo "\nOutput with setting a charset:\n";
mysqli_query($c, "SET NAMES utf8");
$r = mysqli_query($c, "SELECT title FROM serendipity_entries LIMIT 15");
while ($row = mysqli_fetch_array($r, MYSQLI_ASSOC)) {
    echo "* " . $row['title'] . "\n";
}
Adapt that code to match your database credentials, database name and Serendipity table prefix, then run it in your browser.
In an ideal world, where the MySQL default charset is UTF-8, you should see proper UTF-8 characters in both outputs (given that at least one of your 15 entries contains an umlaut in the title; raise the LIMIT if none does).
In a bad world you might get broken UTF-8 output in the first variant, but proper output in the second. If that happens, your MySQL server uses ISO-8859-1 as its default charset.
In a very bad world, both will yield double UTF-8 encodings. The most likely reason is the following: when you installed Serendipity, you might have used a MySQL server that had ISO-8859-1 as its default charset, so the tables were created with ISO-8859-1 collations. But then you configured s9y to use UTF-8, and all the data in the tables got saved as UTF-8.
That probably worked out just fine, simply because you saved raw UTF-8 and MySQL stored it as-is, without performing any charset conversion. But at the point where MySQL started to use UTF-8 as its default, you might have gotten "?" characters in the output; that would have been fixed with a "SET NAMES utf8" command, which s9y enables with the "Set dbNames" option you also found.
Now, what probably happened is that some automatic upgrade script tried to be smarter than you: it saw that you had tables with latin1/ISO-8859-1 collations and thought "Hey, that's bad, we want UTF-8". It then used MySQL's character set conversion to transform latin1 to UTF-8 and "fix" the collation of the tables.
Where it failed is that the tables already stored UTF-8! So the content got double-encoded, and is now stored like that.
The real way to perform a UTF-8 migration in that case is to take the table in its latin1 state, save an SQL dump to a file, and NOT change the content of that file: simply add a "SET NAMES utf8" statement (if not already present) as the first line of the dump, change all latin1 collations to "utf8_general_ci", and then reimport the dump. This way only the table metadata gets updated, and the actual content remains proper UTF-8 as it is. I think this should be documented in the UTF-8 migration FAQ on our documentation site.
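To illustrate the metadata-only rewrite, here is a minimal sketch. It assumes the dump uses MySQL's usual latin1 defaults ("DEFAULT CHARSET=latin1", "COLLATE=latin1_swedish_ci"); the function name and file names are made up for this example, and you should check your dump for other latin1 collation names before relying on it:

Code: Select all
<?php
// Rewrite only the table metadata of a latin1 dump so that MySQL
// reimports the (already UTF-8) row data unchanged.
function fix_dump_charset(string $dump): string
{
    // Make sure the import session treats the bytes as UTF-8.
    if (stripos($dump, 'SET NAMES') === false) {
        $dump = "SET NAMES utf8;\n" . $dump;
    }
    // Swap the latin1 table definitions for utf8 ones.
    $dump = str_replace('DEFAULT CHARSET=latin1', 'DEFAULT CHARSET=utf8', $dump);
    $dump = str_replace('COLLATE=latin1_swedish_ci', 'COLLATE=utf8_general_ci', $dump);
    return $dump;
}

// Example usage (file names are placeholders):
// file_put_contents('dump.fixed.sql', fix_dump_charset(file_get_contents('dump.sql')));

Note that only CREATE TABLE clauses are touched; no byte of the INSERT data is altered, which is the whole point of this migration path.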
In your case, where you no longer have the old table structure with latin1 collations, you will need to resort to scripts that try to re-encode double UTF-8 back to "single" UTF-8. This can basically work as described in
https://stackoverflow.com/questions/114 ... tf-8-table and needs to be done for every field, in every table.
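The per-column repair from that kind of answer boils down to one UPDATE statement: interpret the stored value as latin1 again (undoing one encoding layer) and re-read the resulting bytes as UTF-8. As a sketch, a small helper like the following (the function name is made up, and table/column names must be trusted, so don't feed it user input) can generate that statement; back up the table before running it:

Code: Select all
<?php
// Build the repair query for one double-encoded column:
// CONVERT(col USING latin1) recovers the original UTF-8 bytes,
// BINARY strips the charset label, CONVERT(... USING utf8)
// then reinterprets those bytes as UTF-8.
function build_repair_query(string $table, string $column): string
{
    return sprintf(
        "UPDATE %s SET %s = CONVERT(BINARY CONVERT(%s USING latin1) USING utf8)",
        $table, $column, $column
    );
}

// e.g. build_repair_query('serendipity_entries', 'title')

You would then run the generated query once per affected column, e.g. via mysqli_query().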
The other way I sometimes resort to is a simple PHP script like this, which operates on an SQL dump you have created with the "--skip-set-charset" option:
Code: Select all
<?php
// Reads a dump created with: mysqldump --skip-set-charset ...
$dump = file_get_contents('dump.sql');
$fp = fopen('dump.fixed.sql', 'wb');
// utf8_decode() strips one layer of UTF-8 encoding from the dump.
fwrite($fp, "SET NAMES utf8;\n" . utf8_decode($dump));
fclose($fp);
This removes "one" layer of the double UTF-8 encoding, but in some edge cases it can really fail. The best way is to work with the dump in the first place, before an automatic upgrade screwed with it.
Best regards,
Garvin