garbled characters in templates


Post by deminy »

This bug has existed for more than a year and has never been fixed.

I didn't report it before because it exists only in templates, not in the main source code. It also exists in many different themes, so fixing it might be a "huge" job for developers and contributors.

In the last several months, some Chinese users have asked me how to fix it, and I replied to them in my guestbook. After reading Garvin's post "Serendipity 1.1 release cycle", I finally decided to report it.

Well, the bug is simple:

In the template file "templates/..../entries.tpl", the following line generates garbled characters (for East Asian languages) on web pages:

alert('{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:htmlall}');

To fix it, just change it to:

alert('{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:html}');

The difference between the two outputs is shown below:

Image

Image

Post by carl_galloway »

I will happily update all of my templates to include your solution if one of the developers can confirm this won't cause problems elsewhere. Thanks for letting us know this is a problem for East Asian languages.

Post by judebert »

I can't claim it won't cause *any* problems, but it looks pretty good. The only difference is whether ALL html entities get escaped, or only &"'<>*.

Looks like one of the other HTML entities (like what?) is getting used as part of a Chinese UTF-8 character that Smarty doesn't recognize.
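
One plausible mechanism, sketched in Python rather than PHP: if the escaping routine assumes ISO-8859-1 (the default charset of PHP's htmlentities before PHP 5.4), each byte of a UTF-8 multi-byte character is treated as a separate Latin-1 character and replaced by that character's named entity. The function name below is made up for illustration, and the code only imitates that behaviour; it is not Smarty's or PHP's actual implementation.

```python
from html.entities import codepoint2name


def entities_assuming_latin1(text: str) -> str:
    """Imitate entity-encoding UTF-8 bytes as if each byte were a Latin-1 character."""
    # Misread the UTF-8 bytes one byte at a time, as an ISO-8859-1 routine would.
    misread = text.encode("utf-8").decode("latin-1")
    pieces = []
    for ch in misread:
        name = codepoint2name.get(ord(ch))
        # Characters with a named HTML entity are replaced; everything else passes through.
        pieces.append(f"&{name};" if name else ch)
    return "".join(pieces)


# The three UTF-8 bytes of one Chinese character become three unrelated entities.
garbled = entities_assuming_latin1("中")
```

Plain ASCII text passes through this function unchanged, which would explain why the problem only shows up for languages that need multi-byte characters.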

deminy, could you try instead:

alert('{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:htmlall:UTF-8}');

Of course, substitute the encoding your blog actually uses; you can find a list of supported encodings at http://us3.php.net/htmlentities if you're interested.

And naturally, if someone can actually find how to substitute the blog encoding there, it would just foolproof the whole thing.
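
The idea of substituting the blog's encoding can be illustrated with a small Python sketch. It assumes the encoding is available as a setting (BLOG_CHARSET is a hypothetical name, not a Serendipity or Smarty variable) and shows that decoding with the right charset before entity-encoding keeps multi-byte characters intact.

```python
from html.entities import codepoint2name

# Hypothetical blog setting; a real blog would read this from its configuration.
BLOG_CHARSET = "utf-8"


def entities_with_charset(raw: bytes, charset: str) -> str:
    """Decode with the declared charset, then use named entities where HTML 4 defines one."""
    text = raw.decode(charset)
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name else ch
        for ch in text
    )


# With the correct charset, the accented letter gets its entity and the
# Chinese character, which has no named entity, is left untouched.
sample = "é中".encode(BLOG_CHARSET)
escaped = entities_with_charset(sample, BLOG_CHARSET)
```

The same function gives sensible results for an ISO-8859-1 blog as long as the charset passed in matches the bytes it receives.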
Judebert

Post by deminy »

Hi, Garvin,

Your suggestion is great. The following Smarty variable/modifier prints proper (Chinese) characters on the web page:

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"htmlall":"UTF-8"}

But in this template file (entries.tpl), the "escape" Smarty modifier is used inside an "a href" HTML tag, so we cannot use quotation marks there again. Without quotation marks, the "escape" modifier won't work properly (it causes a fatal PHP error).

Post by garvinhicking »

Hi Guys!

(Deminy, note that the UTF-8 hint was from the awesome Judebert, not me *g*)

I'm a bit confused as to which solution is better. I've never used the second "UTF-8" parameter to escaping. Might that cause trouble with "native" charsets, when people use ISO or even koi8r or other charsets?

So currently I'm thinking that replacing 'htmlall' with 'html' could do the job better for all users? As long as it escapes quotation marks, we should be fine.

Deminy, I sadly don't understand what you mean by this:
But, in this template file (entries.tpl), since the "escape" Smarty modifier is used in an "a href" HTML tag, we can not use quotation marks here again. Without quotation marks, the "escape" modifier won't work properly (a fatal PHP error will be created).
The goal is that the JS call will not use ", but &quot; instead...?

Best regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/

Post by deminy »

I think there are several solutions for this problem, but none of them can completely solve the problem.

1. For multi-language support, officially use UTF-8 format only: All language packages should be written in UTF-8 format. As we can see, most PHP functions could handle UTF-8 strings correctly.

People can also use other character sets (without too many problems), but this is not officially recommended.

2. When using function "htmlentities" to translate a string, convert the string to UTF-8 first, and then call function "htmlentities" to translate the UTF-8 string.

The problem with this solution is that we might not be able to convert the UTF-8 string back to the original character encoding.

A similar idea was mentioned by Cameron on php.net. He provided a solution showing how to simulate function "htmlentities" for multi-byte strings without causing any problems.


3. Use

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"html"}

instead of

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"htmlall"}

(without specifying the 2nd parameter for "escape")

Here, parameter "html" is better than parameter "htmlall" because "html" uses function htmlspecialchars, while "htmlall" uses function htmlentities. As far as I can see, function "htmlspecialchars" is much safer when handling multi-byte strings.

This solution is not perfect, but it might be the best currently available solution.

4. A more proper way to use the modifier "escape" is

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"html":$_charset}

rather than

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"html"}

However, passing the charset could cause serious problems, since some character sets are not supported by the modifier "escape" (which uses function "htmlspecialchars" here).
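
Why option 3 is comparatively safe can be sketched in Python, assuming the text is UTF-8: escaping only the five special ASCII characters never touches the bytes of a multi-byte character, because every byte of a UTF-8 multi-byte sequence is 0x80 or higher. The function below is an illustration written for this thread, not the code of htmlspecialchars itself, and it makes no claim about other multi-byte encodings.

```python
# Order matters: "&" must be replaced first so later entities are not double-escaped.
REPLACEMENTS = [
    (b"&", b"&amp;"),
    (b'"', b"&quot;"),
    (b"'", b"&#039;"),
    (b"<", b"&lt;"),
    (b">", b"&gt;"),
]


def escape_specials(raw: bytes) -> bytes:
    """Escape only the five special ASCII characters, byte by byte."""
    for plain, entity in REPLACEMENTS:
        raw = raw.replace(plain, entity)
    return raw


# The quotes are escaped (which is what the alert('...') call needs),
# while the Chinese characters come through byte-for-byte unchanged.
message = "点击'这里' & <b>".encode("utf-8")
escaped = escape_specials(message).decode("utf-8")
```

Because no charset is needed for this byte-level replacement, the result does not depend on whether the escaping routine knows the blog's encoding.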

Post by garvinhicking »

Hi!
1. For multi-language support, officially use UTF-8 format only: All language packages should be written in UTF-8 format. As we can see, most PHP functions could handle UTF-8 strings correctly.

This is not an option for us. We definitely want to preserve the ISO functionality, because existing blogs cannot easily be updated to UTF-8.

2. When using function "htmlentities" to translate a string, convert the string to UTF-8 first, and then call function "htmlentities" to translate the UTF-8 string.

The problem for this solution is, we might not be able to translate the UTF-8 string back to the original character encoding.

Exactly, once the string is UTF-8 it would be hard to reconvert it. Plus, the performance impact would be huge!

3. Use

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"html"}

instead of

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"htmlall"}

(without specifying the 2nd parameter for "escape")

Here, parameter "html" is better than parameter "htmlall" because "html" uses function htmlspecialchars, while "htmlall" uses function htmlentities. As I can see, function "htmlspecialchars" is much safer when handling multi-byte strings.

This solution is not perfect, but it might be the best current available solution.

Okay, I think I now understand. I'm also very much in favor of this option.

If no one disagrees, I'd patch all internal templates to use the 'html' modifier then?

Regards,
Garvin

Post by carl_galloway »

So should all templates now start to use

alert('{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:html}');

If so, Garvin, do you want template designers to update their zipfiles?

Post by garvinhicking »

Hi Carl!

That's okay, I will just search and replace all occurrences in both the bundled files and all Spartacus files. If people like you also offer downloads on their own servers, though, it would be nice to get those in sync :)

Regards,
Garvin

Post by carl_galloway »

OK, I'll go and update my template zipfiles over the next few days. I wonder if it would be worthwhile to make an announcement in the themes forum in case other designers aren't following this thread?

Carl

Post by garvinhicking »

Hi Carl!

Just committed the changes. If you want to make an announcement, please go ahead!

Best regards,
Garvin