[2.0] Use Language Library?

Mark threads with "[2.0]" for discussions about longer-term features and with "[1.6]" for short-term ones. This is not the place for general discussions or for plugin or template requests. Only features that the core team has approved should be listed here, to keep things structured.
Locked
onli
Regular
Posts: 2231
Joined: Tue Sep 09, 2008 10:04 pm

[2.0] Use Language Library?

Post by onli » Sat Jul 06, 2013 1:15 am

I'd like to discuss which technical system 2.0 should use for internationalization. At the moment I see three real options; maybe someone knows a better one?

I'll briefly describe the available systems and what, in my eyes, speaks for and against each of them:

1. The current system (constants)
The current system is simple: every string that might be needed is loaded as a constant, which is then globally available.

Pro:
  • Easy to understand
  • Plain text, easy to edit
  • Tools are in place, and the code wouldn't need to change
  • As fast as a constant-lookup in PHP
Contra:
  • The code that loads them is a bit cumbersome; for example, include/lang.inc.php is included twice in serendipity_config to load the language
  • A mix of custom charsets and UTF-8 (in my eyes, it's time for a UTF-8 solution)
  • On every load of Serendipity, all those constants are read from the file and loaded into memory. According to my test, this costs about 1.25 MB (checked with memory_get_usage(true) directly before and after the second include; if someone knows a better way to measure that, please speak up). Depending on the perspective, that is either a lot (for mere language constants) or negligible (compared to the RAM available on modern systems)
  • Only as fast as a constant lookup in PHP
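
To make the memory measurement above concrete, the same before/after idea can be sketched outside PHP. This is a minimal Python sketch that uses tracemalloc as a stand-in for memory_get_usage(true); the KEY_n strings are hypothetical placeholders for a language file, and the byte counts will not match what PHP reports for real constants.

```python
import tracemalloc


def measure_string_table(count=2000):
    """Return (bytes allocated, entries) for building `count` translation strings."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    # Hypothetical stand-in for a language file: KEY_n -> translated text
    table = {f"KEY_{i}": f"Translated text for entry {i}" for i in range(count)}
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before, len(table)


delta, entries = measure_string_table()
print(f"{entries} strings: about {delta / 1024:.0f} KiB")
```

Measuring directly around the load, with the table still referenced, is the same approach as measuring around the second include.
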
2. php-i18n (https://github.com/Philipp15b/php-i18n)
That looks like a pretty simple system. Translations are stored in .ini files; it includes automatic language detection (which can be overridden), and caching is integrated as well. Glancing over the code, it seems to work by parsing the .ini file with parse_ini_file into an associative array, caching that step, and creating an object containing the translated strings.

Pro:
  • Plain text, easy to edit
  • Having the translations enclosed in their own array is prettier than having them as constants
  • Could be a bit faster
Contra:
  • I don't think it would save any memory compared to the current system
  • Lots of changes necessary
  • Speed difference probably minimal
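
Based only on the description above (parse the .ini into an associative array, cache that step), the mechanism can be sketched like this. It is a minimal Python sketch, not php-i18n's code: configparser stands in for parse_ini_file, a JSON file serves as the cache, and the load_translations name and the section_key naming scheme are assumptions for illustration.

```python
import configparser
import json
import os


def load_translations(ini_path, cache_path):
    """Parse an .ini translation file into a flat dict, caching the result as JSON."""
    # Use the cache if it is at least as new as the .ini file.
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= os.path.getmtime(ini_path):
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)

    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (configparser lowercases keys by default)
    with open(ini_path, encoding="utf-8") as fh:
        parser.read_file(fh)

    # Assumed naming scheme: "<section>_<key>" for every entry.
    strings = {
        f"{section}_{key}": value
        for section in parser.sections()
        for key, value in parser.items(section)
    }
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(strings, fh, ensure_ascii=False)
    return strings
```

Usage, with hypothetical paths: t = load_translations("lang/de.ini", "cache/de.json"), then t["general_greeting"]. The lookup is an array access instead of a constant, which is why the speed difference is probably minimal.
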
3. Gettext
Gettext is the professional, standard way to do something like this. That said, I personally don't like that system very much, though I try not to let that affect my judgement too much. It compiles .po files into binary .mo files, which could be the fastest option if done right.

Pro:
  • Probably fast
  • Proven, widely used system
Contra:
  • Compilation step necessary
  • Lots of changes necessary
  • Is the gettext extension available on every target system? (I guess it should be, and there is a fallback: https://launchpad.net/php-gettext/)
4. intl
A newer PHP API for internationalization, probably also usable for our purposes somehow, but I understand exactly nil of the documentation. If someone understands that system, maybe it is an option as well.
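
For comparison, a basic gettext lookup with a fallback looks like this. It is a minimal sketch using Python's standard gettext module rather than PHP's gettext extension; the "serendipity" domain and the "locale" directory in the usage line are hypothetical. The .mo catalogs would come from the compilation step listed above, for example GNU msgfmt turning a .po file into a .mo file.

```python
import gettext


def make_translator(domain, localedir, language):
    """Return a lookup function for `language`; unknown strings come back unchanged."""
    # Looks for <localedir>/<language>/LC_MESSAGES/<domain>.mo. With fallback=True,
    # a missing catalog yields NullTranslations instead of raising an error.
    catalog = gettext.translation(domain, localedir=localedir, languages=[language], fallback=True)
    return catalog.gettext


_ = make_translator("serendipity", "locale", "de")
print(_("Save entry"))  # the German text if a catalog exists, otherwise "Save entry"
```

The fallback means a missing or uncompiled catalog degrades to the English source strings rather than breaking the page.
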

garvinhicking
Core Developer
Posts: 30020
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: [2.0] Use Language Library?

Post by garvinhicking » Mon Jul 08, 2013 9:28 am

Hi!

Yeah, this has been bugging me as well since the early days. I'm not yet sure that a different tool like php-i18n has enough advantages over the current, established scheme to be worth the hassle of adjusting to it. Also, I find gettext quite ugly to work with.

Having said that, I'm also not standing in the way of exchanging the system if others favor a new one...

Note that, AFAIK, UTF-8 does not work for some of our native translations (they would require UTF-16), so we might not be able to use a UTF-8-only solution. Also, ideally all current translations should be migrated when switching to a new tool...

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/

Timbalu
Regular
Posts: 4598
Joined: Sun May 02, 2004 3:04 pm

Re: [2.0] Use Language Library?

Post by Timbalu » Mon Jul 08, 2013 10:23 am

Not the Gettext way, please!!!
What's left? Stick to the current system for the moment.
Regards,
Ian

Serendipity Styx Edition and additional_plugins @ https://ophian.github.io/ @ https://github.com/ophian

onli
Regular
Posts: 2231
Joined: Tue Sep 09, 2008 10:04 pm

Re: [2.0] Use Language Library?

Post by onli » Mon Jul 08, 2013 10:55 am

Hi
garvinhicking wrote: Note that, AFAIK, UTF-8 does not work for some of our native translations (they would require UTF-16), so we might not be able to use a UTF-8-only solution.
I tried to check that. UTF-16 and UTF-8 can represent exactly the same set of characters (all Unicode code points); they just encode them differently (see http://stackoverflow.com/questions/3864 ... in-my-html). I don't think that is really an issue, or am I missing something?
Timbalu wrote: What's left? Stick to the current system for the moment.
Not so fast, please. There may very well be other systems I haven't found yet.

But sure, with the current set of options, a slightly reworked version of the current system (UTF-8 only, at least in the files, and better initialization code) is my favorite.
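
A quick way to check that both encodings cover the same characters is to encode a few strings from different scripts both ways and decode them again. This Python sketch only shows that the round trip works and how the byte sizes differ (CJK text is shorter in UTF-16, Latin text shorter in UTF-8); it says nothing about what a particular database or browser setup supports.

```python
def encoded_sizes(text):
    """Return (UTF-8 bytes, UTF-16 bytes) for `text`, checking that both decode back to it."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    if utf8.decode("utf-8") != text or utf16.decode("utf-16-le") != text:
        raise ValueError("round trip failed")
    return len(utf8), len(utf16)


for sample in ["Grüße", "日本語", "مرحبا", "😀"]:
    print(sample, encoded_sizes(sample))
# Grüße (7, 10)
# 日本語 (9, 6)
# مرحبا (10, 10)
# 😀 (4, 4)
```
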

garvinhicking
Core Developer
Posts: 30020
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: [2.0] Use Language Library?

Post by garvinhicking » Tue Jul 09, 2013 12:19 pm

Hi!

I only remember that the people who maintain the native-charset translations always told me that native charsets are still very common in their countries, and that UTF-8 often would not work for them. That is why I always liked that s9y lets you use either a native charset or UTF-8 and works properly with both.

For people who upgrade, it will also be very painful to migrate from a native charset to UTF-8: depending on which SQL database is used, the conversion cannot be done easily. So I think forcing UTF-8 would be bad for users.

So I vote against mandatory UTF-8, but of course I'm happy to support anything that has advantages, and also support improving the initialization.

Changing from constants to something else will also mean rewriting every existing plugin and template, and those would then be hard to run alongside s9y < 2.0 installations. The more I write about it, the more problematic I find this change.

Regards,
Garvin

onli
Regular
Posts: 2231
Joined: Tue Sep 09, 2008 10:04 pm

Re: [2.0] Use Language Library?

Post by onli » Tue Jul 09, 2013 12:56 pm

Hi Garvin
I've been turning that over in my head for a while now, especially the upgrade issue.

Even if we used only UTF-8 for the translation files, we could still convert those strings into a native charset on output. A very straightforward way would be, on initialization, to take the file with the constants, convert it into the native charset (the result could be cached), and only then read it in. I think this would be a fairly minor change, and it would already reduce the number of files in the core and the chance of errors when one of the lang files gets checked in with the wrong charset.
garvinhicking wrote: Changing from constants to something else will also mean rewriting every existing plugin and template
I don't think that this would really be necessary. Sure, it would be nice to use one uniform system. But I see no technical reason why a plugin shouldn't keep working flawlessly with the old system regardless of which system the core uses. The only touch point is the function the plugin uses to read in its language file, and that would be the perfect place to maintain backwards compatibility.

Edit: Oh, I do see one problem now: plugins that use constants defined by the core, which would then be missing. Hm…
garvinhicking wrote: I only remember that the people who maintain the native-charset translations always told me that native charsets are still very common in their countries, and that UTF-8 often would not work for them.
That was surely the case back then, but in modern browsers and operating systems it is no longer an issue. It is possible that old technology is still widely used in some countries. If that is the case, and if those are users we want to reach (let us try to evaluate the current situation), we could keep the option to serve files in a native charset, as described above.
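
A minimal sketch of the convert-on-initialization idea, assuming the lang files are stored as UTF-8 only and the converted copy is cached next to them. It is written in Python for illustration (bytes.decode/encode standing in for PHP's iconv); the function name, the cache file naming and the mtime-based cache check are hypothetical choices, and a real implementation would then include the cached PHP file instead of returning its bytes.

```python
import os


def load_native_lang_file(utf8_path, charset, cache_dir):
    """Return the bytes of a UTF-8 lang file converted to `charset`, using a cached copy."""
    # Hypothetical cache naming: <cache_dir>/<original name>.<charset>
    cache_path = os.path.join(cache_dir, f"{os.path.basename(utf8_path)}.{charset}")

    # Reuse the cached copy unless the UTF-8 source is newer.
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= os.path.getmtime(utf8_path):
        with open(cache_path, "rb") as fh:
            return fh.read()

    with open(utf8_path, "rb") as fh:
        raw = fh.read()
    # Convert in memory first: an unrepresentable character raises UnicodeEncodeError
    # here, before any (partial) cache file is written.
    converted = raw.decode("utf-8").encode(charset)
    with open(cache_path, "wb") as fh:
        fh.write(converted)
    return converted
```

Converting before writing means a character the target charset cannot represent fails loudly instead of leaving a half-written cache file behind.
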

garvinhicking
Core Developer
Posts: 30020
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: [2.0] Use Language Library?

Post by garvinhicking » Wed Jul 10, 2013 12:06 pm

Hi!

Would you use iconv for that? It sounds worth trying out with a proof-of-concept implementation.
onli wrote: That was surely the case back then, but in modern browsers and operating systems it is no longer an issue. It is possible that old technology is still widely used in some countries. If that is the case, and if those are users we want to reach (let us try to evaluate the current situation), we could keep the option to serve files in a native charset, as described above.
We should definitely ask some Chinese, Arabic and Romanic users to find out. I wouldn't want to impose my European/Western view of things on those people; we should take care of them as well...

Regards,
Garvin

onli
Regular
Posts: 2231
Joined: Tue Sep 09, 2008 10:04 pm

Re: [2.0] Use Language Library?

Post by onli » Wed Jul 10, 2013 1:04 pm

garvinhicking wrote: Would you use iconv for that? It sounds worth trying out with a proof-of-concept implementation.
Yes, and agreed. I'll wait a bit; maybe someone comes up with a perfect alternative.

garvinhicking wrote: We should definitely ask some Chinese, Arabic and Romanic users to find out. I wouldn't want to impose my European/Western view of things on those people; we should take care of them as well...
Lazybadger? Or we find a reputable source stating that it is no longer an issue, but I haven't found one yet.
