[bug] garbled characters in some plugins

deminy
Regular
 
Posts: 28
Joined: Mon Oct 10, 2005 6:17 am

Postby deminy » Fri Jun 23, 2006 12:23 am

Although s9y supports many languages, including East Asian ones, there are still some minor bugs in its Asian-language support. When you use s9y to build a blog in a multibyte language (such as Chinese or Japanese), garbled characters sometimes show up in the sidebar.

Garbled characters show up in the sidebar at least when using the internal plugin "serendipity_archives_plugin" and the sidebar plugin "serendipity_plugin_comments".

There are two possible reasons for the garbled characters: 1. the web server's PHP does not have the mbstring extension; 2. PHP functions such as "wordwrap" are not multibyte-aware.
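To make reason 2 concrete, here is a small standalone sketch (not s9y code; it assumes the mbstring extension is installed). wordwrap() counts bytes, so a hard cut lands in the middle of a 3-byte UTF-8 character. The hypothetical helper example_mb_hard_wrap() cuts between characters instead, but unlike wordwrap() it ignores spaces entirely.

```php
<?php
// Sketch: why byte-oriented functions such as wordwrap() garble multibyte
// text. Not s9y code; requires the mbstring extension.

// Hypothetical helper (not part of PHP or s9y): hard-wraps $text every
// $width *characters* instead of every $width bytes. Unlike wordwrap(),
// it ignores spaces and only performs the hard cut.
function example_mb_hard_wrap(string $text, int $width, string $break, string $encoding): string
{
    if ($width < 1) {
        throw new InvalidArgumentException('width must be at least 1');
    }
    $chunks = [];
    $length = mb_strlen($text, $encoding);
    for ($i = 0; $i < $length; $i += $width) {
        $chunks[] = mb_substr($text, $i, $width, $encoding);
    }
    return implode($break, $chunks);
}

$title = "中文博客"; // 4 characters, 12 bytes in UTF-8

// wordwrap() counts bytes, so cutting every 4 bytes lands inside characters.
$byteWrapped = wordwrap($title, 4, "\n", true);
// The character-based helper cuts between characters instead.
$charWrapped = example_mb_hard_wrap($title, 2, "\n", 'UTF-8');

var_dump(mb_check_encoding($byteWrapped, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charWrapped, 'UTF-8')); // bool(true)
echo $charWrapped, "\n";
```

The invalid byte sequences in $byteWrapped are what a browser renders as garbled characters.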

Here is the solution I found. It works for s9y v0.8.x through v1.0.

1. For the internal plugin "serendipity_archives_plugin":

In the file "./include/lang.inc.php", look at line 63 (in the function serendipity_mb()). The original code is:

return mb_strtoupper(mb_substr($args[1], 0, 1)) . mb_substr($args[1], 1);


Modify it to:

$enc = mb_detect_encoding($args[1]); // detect the encoding once and reuse it
return mb_strtoupper(mb_substr($args[1], 0, 1, $enc), $enc) . mb_substr($args[1], 1, mb_strlen($args[1], $enc), $enc);
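The same idea as a standalone function, for anyone who wants to try it outside s9y. This is only a sketch: the name example_mb_ucfirst() is made up, and the 'UTF-8' fallback is an assumption added because mb_detect_encoding() can return false.

```php
<?php
// Sketch of a multibyte-safe "ucfirst" in the spirit of the patched
// serendipity_mb() line. example_mb_ucfirst() is a hypothetical name.
function example_mb_ucfirst(string $text, string $fallback = 'UTF-8'): string
{
    // Detect once and reuse the result. mb_detect_encoding() can return
    // false, so fall back to an assumed default in that case.
    $encoding = mb_detect_encoding($text) ?: $fallback;

    $first = mb_substr($text, 0, 1, $encoding);
    $rest  = mb_substr($text, 1, mb_strlen($text, $encoding), $encoding);

    return mb_strtoupper($first, $encoding) . $rest;
}

echo example_mb_ucfirst("élan"), "\n";    // Élan
echo example_mb_ucfirst("中文博客"), "\n"; // 中文博客 (Chinese has no letter case)
```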


2. For the sidebar plugin "serendipity_plugin_comments":

In the file "...../serendipity_plugin_comments.php", lines 153 to 202 (in the function generate_content(&$title)), make the following changes:

2.1. Replace "$serendipity['lang'] == "ja"" with:

($serendipity['lang'] == "ja" || $serendipity['lang'] == "cn" || $serendipity['lang'] == "zh" || $serendipity['lang'] == "ko" || $serendipity['lang'] == "tw" || $serendipity['lang'] == "tn")
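If the list of languages keeps growing, the condition can be kept in one array. A minimal sketch: example_is_multibyte_lang() is a hypothetical helper, not an s9y function, and it uses strict comparison, which for these language codes behaves like the original ==.

```php
<?php
// Sketch: the language test from step 2.1 kept in one array.
// example_is_multibyte_lang() is hypothetical; the codes are those listed above.
function example_is_multibyte_lang(string $lang): bool
{
    $multibyteLangs = ['ja', 'cn', 'zh', 'ko', 'tw', 'tn'];
    // Strict comparison; for these language codes it behaves like ==.
    return in_array($lang, $multibyteLangs, true);
}

$serendipity = ['lang' => 'zh']; // stand-in for s9y's global configuration array
var_dump(example_is_multibyte_lang($serendipity['lang'])); // bool(true)
var_dump(example_is_multibyte_lang('en'));                  // bool(false)
```

In the plugin, the condition from step 2.1 would then read example_is_multibyte_lang($serendipity['lang']).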


2.2. For multibyte functions such as mb_strimwidth() and mb_strlen(), pass the optional last parameter that selects the encoding.

For example:

Original source code:
mb_strlen( $comment)


After modification:
mb_strlen( $comment, mb_detect_encoding($comment))
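For mb_strimwidth() the same pattern applies: detect the encoding once and pass it as the last argument. A sketch with a hypothetical example_comment_preview() function and an assumed 'UTF-8' fallback. Note that mb_strimwidth() works in display width, where most CJK characters count as 2 columns, and the trim marker counts toward the limit.

```php
<?php
// Sketch: a comment preview that passes the encoding explicitly, as step 2.2
// suggests. example_comment_preview() and the 'UTF-8' fallback are assumptions.
function example_comment_preview(string $comment, int $width): string
{
    // Detect once; mb_detect_encoding() can return false.
    $encoding = mb_detect_encoding($comment) ?: 'UTF-8';

    // mb_strimwidth() measures display width (most CJK characters are 2
    // columns wide) and counts the trim marker toward the limit. Strings that
    // already fit are returned unchanged.
    return mb_strimwidth($comment, 0, $width, '...', $encoding);
}

$comment = "这是一条很长的中文评论";
echo example_comment_preview($comment, 11), "\n";
echo example_comment_preview("short", 11), "\n"; // short
```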


I wrote a post (in Chinese) on my blog discussing this problem:
http://www.deminy.net/blog/archives/4214-y.html


Postby deminy » Fri Jun 23, 2006 3:06 am

I forgot to mention one thing.

S9y does have an "mb_internal_encoding()" call in the file "include/lang.inc.php", but it seems to have no effect on the multibyte functions that are called later in s9y.

For example, as mentioned above, when calling the PHP function "mb_strlen" in the file "...../serendipity_plugin_comments.php", you might think that, since the internal encoding has already been set, you could write the code like this:
mb_strlen( $comment)


Here you expect "mb_strlen" to use the default encoding that was set earlier by calling "mb_internal_encoding".

BUT, in practice, to avoid garbled characters in East Asian languages, you MUST write it like this:
mb_strlen( $comment, mb_detect_encoding($comment))


These changes won't affect performance much, especially when you are using a language with a single-byte encoding, such as English.
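A standalone sketch of one way this symptom can arise. This is an assumption about the mechanism, not a diagnosis of what s9y actually does: mb_strlen() without an encoding argument uses whatever mb_internal_encoding() returns at the moment of the call, so if other code changes it after lang.inc.php has run, the short form silently miscounts, while an explicit encoding is unaffected.

```php
<?php
// Sketch (an assumption about the mechanism, not a diagnosis of s9y):
// mb_strlen() without an encoding argument uses the internal encoding that is
// in effect at call time, not the one set earlier.
$comment = "中文评论"; // 4 characters, 12 bytes in UTF-8

mb_internal_encoding('UTF-8');
$lenUtf8Default = mb_strlen($comment); // 4

// Simulate some other code changing the internal encoding later on.
mb_internal_encoding('ISO-8859-1');
$lenLatin1Default = mb_strlen($comment); // 12: every byte counts as one character

// Passing the encoding explicitly is unaffected by the internal setting.
$detected    = mb_detect_encoding($comment, ['ASCII', 'UTF-8'], true);
$lenExplicit = mb_strlen($comment, $detected); // 4

mb_internal_encoding('UTF-8'); // restore
echo "$lenUtf8Default $lenLatin1Default $lenExplicit\n"; // 4 12 4
```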

garvinhicking
Core Developer
 
Posts: 26675
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Postby garvinhicking » Fri Jun 23, 2006 10:48 am

Hi!

Thanks a lot for helping to improve that situation! I committed your changes to SVN trunk.

However, to help me understand: do you know why mb_internal_encoding() does not work? I would have thought that setting it to LANG_CHARSET (which should be 'UTF-8' in your case) would do the trick.

I believe you if you say it doesn't work, but isn't this an unsatisfying situation? :)

Best regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/


Postby deminy » Fri Jun 23, 2006 4:02 pm

Detailed debugging and testing would cost me a lot of time, which I cannot afford right now.

I did run a few simple tests, but still couldn't determine the exact reason. Setting it to LANG_CHARSET might work for very simple code, but not for s9y.

Let me make it clear.

According to PHP's definition of these functions, the sample code above,

mb_strlen( $comment)


SHOULD have the same effect as the following code:

mb_strlen( $comment, mb_detect_encoding($comment))


But in s9y, the first causes garbled characters while the second does not.

Possible reasons are a bug in PHP's multibyte functions, or some other unknown bug or setting in s9y or in my test. (My guess is that this is not a bug in s9y itself, but I don't think I did anything wrong in the test either.)

It might be an unknown bug in PHP (though it doesn't seem so). For more information, see the user comments on the function mb_strtolower at PHP.net:
http://ca.php.net/mb_strtolower
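To narrow down which of the possible reasons above applies, one could log both sides of the comparison at the call site. mb_strlen() without an encoding falls back to mb_internal_encoding(), so the two calls agree only when the internal encoding matches what mb_detect_encoding() finds. A hypothetical debugging helper (not part of s9y or PHP):

```php
<?php
// Hypothetical debugging helper (not part of s9y): collects what is needed to
// see whether mb_strlen($text) and mb_strlen($text, mb_detect_encoding($text))
// disagree for a given string.
function example_mb_diagnose(string $text): array
{
    $internal    = mb_internal_encoding();
    $detected    = mb_detect_encoding($text); // can be false if detection fails
    $lenDefault  = mb_strlen($text);          // uses the current internal encoding
    $lenDetected = $detected === false ? null : mb_strlen($text, $detected);

    return [
        'internal'     => $internal,
        'detected'     => $detected,
        'len_default'  => $lenDefault,
        'len_detected' => $lenDetected,
        'agree'        => $lenDefault === $lenDetected,
    ];
}

mb_internal_encoding('UTF-8');
print_r(example_mb_diagnose("中文评论"));
```

Inside s9y, one would write the array to the PHP error log with error_log(print_r(example_mb_diagnose($comment), true)) just before the truncation in generate_content() and compare the values whenever garbled output appears.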


Postby garvinhicking » Sat Jun 24, 2006 12:28 am

Hi!

Oh, okay, I understand. I'll try to run some tests, but my Chinese is a bit rusty. ;))

Are you using a UTF-8 environment? With which serendipity language file?

Regards,
Garvin


Postby deminy » Sat Jun 24, 2006 6:41 am

I am using Simplified Chinese (UTF-8).

In my experience, there isn't much difference whether you choose Simplified Chinese (UTF-8) or Simplified Chinese (GB2312).

Both charsets use exactly the same language files in s9y.
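A related detail for GB2312 users: the same text is stored as different bytes in the two charsets, and mb_detect_encoding() only considers the encodings in its detection order (by default typically ASCII and UTF-8), so GB2312 text may not be recognized unless it is listed. A sketch, assuming mbstring's 'EUC-CN' encoding (its name for GB2312) and an explicit candidate list chosen for illustration:

```php
<?php
// Sketch: the same two characters in UTF-8 and in GB2312 (mbstring: EUC-CN).
$utf8 = "中文"; // this file is saved as UTF-8
$gb   = mb_convert_encoding($utf8, 'EUC-CN', 'UTF-8');

echo bin2hex($utf8), "\n"; // e4b8ade69687
echo bin2hex($gb), "\n";   // d6d0cec4

// Strict detection against an explicit candidate list that includes EUC-CN.
$candidates = ['UTF-8', 'EUC-CN'];
var_dump(mb_detect_encoding($utf8, $candidates, true)); // string(5) "UTF-8"
var_dump(mb_detect_encoding($gb, $candidates, true));   // string(6) "EUC-CN"

// With the matching encoding, both byte strings are two characters long.
echo mb_strlen($utf8, 'UTF-8'), ' ', mb_strlen($gb, 'EUC-CN'), "\n"; // 2 2
```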

Good luck


