add some more chars to $from $to char conversion

Discussion corner for Developers of Serendipity.
Timbalu
Regular
Posts: 4598
Joined: Sun May 02, 2004 3:04 pm


Post by Timbalu » Wed Oct 07, 2015 10:11 am

serendipity_makeFilename() uses the $from/$to arrays with str_replace() to convert non-ASCII characters in filenames.
I noticed that we do not support the main Scandinavian (Danish/Norwegian) characters yet, like
æ, Æ, å, Å, ø, Ø (we do have the lowercase å, though).
Shouldn't we add them and convert them to ae, Ae, a, A, oe, Oe? (Would those be the right replacements?)
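A minimal Python sketch of the proposed additions, assuming the conversion works like PHP's str_replace() with two parallel arrays (each $from entry is replaced by the $to entry at the same index, one pair after another). The FROM/TO lists and convert_chars() are hypothetical illustrations, not the actual Serendipity arrays:

```python
# Hypothetical parallel lists modelled on the $from/$to idea. They contain only
# the Danish/Norwegian characters proposed in this post, not Serendipity's real arrays.
FROM = ["æ", "Æ", "å", "Å", "ø", "Ø"]
TO = ["ae", "Ae", "a", "A", "oe", "Oe"]


def convert_chars(name: str) -> str:
    """Replace FROM[i] with TO[i], one pair after another, like str_replace() with arrays."""
    for src, dst in zip(FROM, TO):
        name = name.replace(src, dst)
    return name


print(convert_chars("Ærø_Ålborg"))  # Aeroe_Alborg
```

Since every TO value is plain ASCII, the order of the pairs does not matter here; it would matter if a replacement contained a character that a later pair converts again.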

Edit:
And Ä Ö Ü are converted to AE OE UE. This is not what we want, is it?! :)
Regards,
Ian

Serendipity Styx Edition and additional_plugins @ https://ophian.github.io/ @ https://github.com/ophian

garvinhicking
Core Developer
Posts: 30020
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: add some more chars to $from $to char conversion

Post by garvinhicking » Wed Oct 07, 2015 3:39 pm

Hi!

I'd say they might be uncommon for a German or English blog. Adding many characters always means a bit more work on every permalink URL generation call. I'd say, better to add those characters to the language they are specific to?
Timbalu wrote: And Ä Ö Ü are converted to AE OE UE. This is not what we want, is it?! :)
"It depends". Something like "ÄRGERLICH!" would then become "AeRGERLICH!", which looks ugly. Deciding whether to use AE or Ae would mean inspecting the following character, which is very costly to look up.

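To illustrate the lookahead being discussed, here is a minimal Python sketch that picks AE when the next character is an uppercase letter and Ae otherwise. The mapping tables and convert_with_lookahead() are hypothetical and cover only Ä/Ö/Ü and ä/ö/ü. It needs one extra character check per uppercase umlaut; whether that overhead is acceptable on every permalink call is the open question here.

```python
# Hypothetical tables: each uppercase umlaut maps to (all-caps form, title-case form).
UPPER_UMLAUTS = {"Ä": ("AE", "Ae"), "Ö": ("OE", "Oe"), "Ü": ("UE", "Ue")}
LOWER_UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def convert_with_lookahead(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch in UPPER_UMLAUTS:
            caps, title = UPPER_UMLAUTS[ch]
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # Use the all-caps form only when the following character is uppercase.
            out.append(caps if nxt.isupper() else title)
        else:
            # Lowercase umlauts get a fixed replacement; everything else is kept.
            out.append(LOWER_UMLAUTS.get(ch, ch))
    return "".join(out)


print(convert_with_lookahead("ÄRGERLICH!"))  # AERGERLICH!
print(convert_with_lookahead("Ärgerlich"))   # Aergerlich
```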
Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/

Timbalu

Re: add some more chars to $from $to char conversion

Post by Timbalu » Wed Oct 07, 2015 5:09 pm

That is Ärgerlich indeed! ;-)
But since this is serendipity_makeFilename(), I assume a normal "Ärgerlich" will be more common than "ÄRGERLICH" (and the same goes for other all-uppercase words starting with an umlaut).
garvinhicking wrote: I'd say, better to add those characters to the language they are specific to?
If you use the Norwegian or Danish language, both use these serendipity_makeFilename() conversion arrays too. There is no extra conversion array in the language set. Or did I misunderstand you? (And I only mentioned the most common ones, not ð or Þ.)
Regards,
Ian


garvinhicking
Core Developer
Posts: 30020
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany
Contact:

Re: add some more chars to $from $to char conversion

Post by garvinhicking » Thu Oct 08, 2015 1:14 pm

Hi!

If we changed that, all existing URLs would suddenly change. URLs may be treated case-sensitively, so the old and new spellings could lead to duplicate content being indexed by bots. I don't really think we should change this AE/Ae handling... but if the majority thinks otherwise, I'm open.
Timbalu wrote: If you use the Norwegian or Danish language, both use these serendipity_makeFilename() conversion arrays too. There is no extra conversion array in the language set. Or did I misunderstand you? (And I only mentioned the most common ones, not ð or Þ.)
You can set the i18n_blabla strings inside a language file; they override the global default. See the Russian language files, some of them do this...
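A minimal Python sketch of such an override, assuming a language file that defines its own arrays simply replaces the global default. DEFAULT_FROM/DEFAULT_TO, LANGUAGE_OVERRIDES and make_filename() are hypothetical stand-ins (not the i18n strings themselves), and the lowercase default entries are assumed:

```python
# Hypothetical global default, standing in for the built-in conversion arrays.
DEFAULT_FROM = ["ä", "ö", "ü", "Ä", "Ö", "Ü"]
DEFAULT_TO = ["ae", "oe", "ue", "AE", "OE", "UE"]

# Hypothetical per-language overrides, standing in for strings set in a language file.
LANGUAGE_OVERRIDES = {
    "da": (["æ", "Æ", "ø", "Ø", "å", "Å"], ["ae", "Ae", "oe", "Oe", "a", "A"]),
}


def conversion_arrays(lang):
    """Return the language's own (from, to) arrays if defined, else the global default."""
    return LANGUAGE_OVERRIDES.get(lang, (DEFAULT_FROM, DEFAULT_TO))


def make_filename(name, lang):
    src_list, dst_list = conversion_arrays(lang)
    for src, dst in zip(src_list, dst_list):
        name = name.replace(src, dst)
    return name


print(make_filename("Ærø.jpg", "da"))    # Aeroe.jpg
print(make_filename("Ärger.jpg", "de"))  # AErger.jpg
```

With a full override, a Danish file that also wants ä/ö/ü converted would have to repeat those pairs itself; in return, every other language keeps the shorter default array on each call.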

Timbalu

Re: add some more chars to $from $to char conversion

Post by Timbalu » Thu Oct 08, 2015 2:44 pm

My experience with dropping media items into the MediaLibrary is that you almost never want fully uppercased names there.
If you want to change such a badly converted name later on, you cannot just rename Testimage_AErgerlich.jpg to Testimage_Aergerlich.jpg, since that isn't a real change for the system and it will tell you that this name already exists. You have to move it twice, e.g. to XTestimage_Aergerlich.jpg and then back to Testimage_Aergerlich.jpg. ;-)
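A minimal Python sketch of why the direct rename fails and how the two-step move gets around it, assuming the existence check compares names case-insensitively (for example through a case-insensitive database collation). MediaLibrary and rename_case_only() are hypothetical in-memory stand-ins, not Serendipity's media code:

```python
class MediaLibrary:
    """Hypothetical in-memory media store whose name check ignores case (assumption)."""

    def __init__(self, names):
        self.names = set(names)

    def exists(self, name):
        # Case-insensitive comparison: "AErgerlich" and "Aergerlich" count as the same name.
        return any(n.lower() == name.lower() for n in self.names)

    def rename(self, old, new):
        if self.exists(new):
            raise FileExistsError(f"{new} already exists")
        self.names.remove(old)
        self.names.add(new)


def rename_case_only(lib, old, new, prefix="X"):
    """Move to a temporary name first, then to the target, so neither step collides."""
    tmp = prefix + new
    lib.rename(old, tmp)
    lib.rename(tmp, new)


lib = MediaLibrary(["Testimage_AErgerlich.jpg"])
rename_case_only(lib, "Testimage_AErgerlich.jpg", "Testimage_Aergerlich.jpg")
print(lib.names)  # {'Testimage_Aergerlich.jpg'}
```

Calling lib.rename() directly with the target name raises FileExistsError, which matches the "name already exists" message. A fuller version would also pick another temporary name if the first one were already taken.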

About the main Scandinavian umlauts, I still think adding these 5 more would not matter in terms of cost, since we already support umlauts for much smaller countries.
Regards,
Ian


onli
Regular
Posts: 2254
Joined: Tue Sep 09, 2008 10:04 pm

Re: add some more chars to $from $to char conversion

Post by onli » Thu Oct 08, 2015 4:28 pm

Do we really still need to do this at all? Which filesystems still in use are not UTF-8 ready?
