
add some more chars to $from $to char conversion

Posted: Wed Oct 07, 2015 10:11 am
by Timbalu
serendipity_makeFilename() uses str_replace() with the $from and $to arrays to convert non-ASCII characters for filenames.
I noticed that we do not support the main Scandinavian (Danish/Norwegian) characters yet, like
æ, Æ, å, Å, ø, Ø (we do have the lowercase å, though).
Shouldn't we add them and convert them to ae, Ae, a, A, oe, Oe? (Would those be the right replacements?)
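To make the proposal concrete, here is a minimal Python sketch of the $from/$to approach. The real serendipity_makeFilename() is PHP and its arrays hold more entries than this abbreviated, hypothetical table. The helper mimics PHP's str_replace() with array arguments, which applies each pair in order over the whole string; the last five pairs are the proposed Danish/Norwegian additions.

```python
# Abbreviated, hypothetical $from/$to style table; the real PHP arrays
# in serendipity_makeFilename() contain more entries.
FROM = ["Ä", "ä", "Ö", "ö", "Ü", "ü", "ß", "å",
        # proposed Danish/Norwegian additions
        "æ", "Æ", "Å", "ø", "Ø"]
TO = ["AE", "ae", "OE", "oe", "UE", "ue", "ss", "a",
      "ae", "Ae", "A", "oe", "Oe"]


def str_replace(search, replace, subject):
    """Mimic PHP str_replace() with array arguments: each pair is
    applied in order over the whole string."""
    for src, dst in zip(search, replace):
        subject = subject.replace(src, dst)
    return subject


def make_filename(title):
    # Hypothetical stand-in for the character-conversion step only.
    return str_replace(FROM, TO, title)


print(make_filename("Blåbærsyltetøy på Ærø"))
```

With this table "Blåbærsyltetøy på Ærø" becomes "Blabaersyltetoey pa Aeroe", while the existing Ä entry still turns "Ärgerlich" into "AErgerlich".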

Edit:
And Ä, Ö, Ü are converted to AE, OE, UE. That's not what we want, is it?! :)

Re: add some more chars to $from $to char conversion

Posted: Wed Oct 07, 2015 3:39 pm
by garvinhicking
Hi!

I'd say they are probably uncommon for a German or English blog. Adding many characters always means a bit more work on every permalink URL generation call. I'd say it's better to add those characters to the language they are specific to?
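Garvin's cost argument can be illustrated with a Python sketch (table and function names are hypothetical). A str_replace()-style loop rescans the title once per table entry, so every added character adds another pass; a single-pass lookup such as Python's str.translate() walks the title once and does one table lookup per character. In PHP, strtr() with an array argument is the comparable single-pass option; whether switching would pay off for Serendipity is an assumption, not something measured here.

```python
# Hypothetical mapping: one source character -> its replacement string.
MAPPING = {
    "Ä": "AE", "ä": "ae", "Ö": "OE", "ö": "oe", "Ü": "UE", "ü": "ue",
    "ß": "ss", "å": "a",
    # proposed Danish/Norwegian additions
    "æ": "ae", "Æ": "Ae", "Å": "A", "ø": "oe", "Ø": "Oe",
}


def make_filename_multipass(title):
    # One full scan of the title per entry, like str_replace() with arrays.
    for src, dst in MAPPING.items():
        title = title.replace(src, dst)
    return title


# Built once; translate() then makes a single pass over the title,
# doing one dictionary lookup per character.
TABLE = str.maketrans(MAPPING)


def make_filename_singlepass(title):
    return title.translate(TABLE)
```

Both functions give the same result here (for example "Smørrebrød og Ærø" becomes "Smoerrebroed og Aeroe") because no replacement string contains a character that is itself in the table.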
Timbalu wrote: And Ä, Ö, Ü are converted to AE, OE, UE. That's not what we want, is it?! :)
"It depends". Something like "ÄRGERLICH!" would then become "AeRGERLICH!", which looks ugly. Deciding whether to use AE or Ae would mean inspecting the following character, and that's costly to look up.
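For comparison, a Python sketch of the follow-up-character check described here (the function name is hypothetical): an uppercase umlaut becomes AE/OE/UE when the next character is an uppercase letter, and Ae/Oe/Ue otherwise. It needs a per-character loop with lookahead instead of a plain replacement table, which is the extra cost being discussed.

```python
# Uppercase umlauts mapped to their base letter; "E" or "e" is appended.
UPPER_UMLAUTS = {"Ä": "A", "Ö": "O", "Ü": "U"}


def transliterate_upper_umlauts(text):
    """Hypothetical context-aware conversion of Ä/Ö/Ü only."""
    out = []
    for i, ch in enumerate(text):
        base = UPPER_UMLAUTS.get(ch)
        if base is None:
            out.append(ch)
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # All-caps context (next char is an uppercase letter) -> "AE";
        # otherwise (capitalized word, end of string, punctuation) -> "Ae".
        out.append(base + ("E" if nxt.isupper() else "e"))
    return "".join(out)
```

With this check "ÄRGERLICH!" becomes "AERGERLICH!" and "Ärgerlich" becomes "Aergerlich", at the price of looking at every character and its successor.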

Regards,
Garvin

Re: add some more chars to $from $to char conversion

Posted: Wed Oct 07, 2015 5:09 pm
by Timbalu
That is Ärgerlich indeed! ;-)
But since this is serendipity_makeFilename(), I assume a normally capitalized Ärgerlich will be more common than ÄRGERLICH (and the same goes for other all-uppercase words starting with an umlaut).
garvinhicking wrote: I'd say it's better to add those characters to the language they are specific to?
If you use the Norwegian or Danish language, both use these serendipity_makeFilename() conversion arrays too. There is no extra conversion array in the language set. Or did I misunderstand you? (And I only mentioned the most common ones, not ð or Þ.)

Re: add some more chars to $from $to char conversion

Posted: Thu Oct 08, 2015 1:14 pm
by garvinhicking
Hi!

If we changed that, everyone's URLs would suddenly change; since upper and lower case in URLs can be treated as different, that might lead to duplicate content being indexed by bots. I don't really think we should change the AE/Ae handling... but if the majority thinks otherwise, I'm open to it.
Timbalu wrote: If you use the Norwegian or Danish language, both use these serendipity_makeFilename() conversion arrays too. There is no extra conversion array in the language set. Or did I misunderstand you? (And I only mentioned the most common ones, not ð or Þ.)
You can set the i18n_blabla strings inside a language file; they override the global default. See the Russian language files, some of them do this...
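A rough Python sketch of the override described above. The table names and language codes are hypothetical stand-ins for the i18n strings a language file defines. When the active language supplies its own from/to table, that table is used instead of the global default. The sketch assumes the override replaces the default entirely rather than merging with it, so a Danish table would also have to list Ä, Ö, Ü if those should still be converted.

```python
# Abbreviated, hypothetical global default table (from, to).
DEFAULT_TABLE = (["Ä", "ä", "Ö", "ö", "Ü", "ü", "ß"],
                 ["AE", "ae", "OE", "oe", "UE", "ue", "ss"])

# Hypothetical Danish/Norwegian table, standing in for the i18n strings
# a language file would define.
NORDIC_TABLE = (["æ", "Æ", "å", "Å", "ø", "Ø"],
                ["ae", "Ae", "a", "A", "oe", "Oe"])

# Languages whose files define their own table; all others fall back.
LANG_TABLES = {"da": NORDIC_TABLE, "no": NORDIC_TABLE}


def make_filename(title, lang):
    # Use the language's own table if it has one, else the global default.
    search, replace = LANG_TABLES.get(lang, DEFAULT_TABLE)
    for src, dst in zip(search, replace):
        title = title.replace(src, dst)
    return title
```

With this design only Danish and Norwegian blogs pay for the extra entries, and the default table used by everyone else stays the same size.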

Re: add some more chars to $from $to char conversion

Posted: Thu Oct 08, 2015 2:44 pm
by Timbalu
My experience with dropping media items into the MediaLibrary is that you almost never want fully upper-cased names there.
If you later want to fix such a badly converted name, you cannot simply rename Testimage_AErgerlich.jpg to Testimage_Aergerlich.jpg, since the system doesn't count that as a real change and tells you that the name already exists. You have to rename it twice, e.g. to XTestimage_Aergerlich.jpg and then back to Testimage_Aergerlich.jpg. ;-)
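The double rename can be modelled with a small Python sketch. It assumes, without confirmation from the thread, that the media library's duplicate check compares names case-insensitively, so a rename that changes only the case looks like a clash. The MediaLibrary class and the rename_case_only() helper are hypothetical.

```python
class MediaLibrary:
    """Toy stand-in for a media store whose name check ignores case
    (an assumption about why the rename is rejected)."""

    def __init__(self, names):
        self.names = set(names)

    def exists(self, name):
        return any(n.lower() == name.lower() for n in self.names)

    def rename(self, old, new):
        # A case-only rename collides with the file's own current name.
        if self.exists(new):
            raise FileExistsError(new)
        self.names.remove(old)
        self.names.add(new)


def rename_case_only(lib, old, new, prefix="X"):
    """Rename old -> new via a temporary name, as described in the post."""
    tmp = prefix + new
    lib.rename(old, tmp)
    lib.rename(tmp, new)
```

A direct rename of Testimage_AErgerlich.jpg to Testimage_Aergerlich.jpg raises FileExistsError in this model, while rename_case_only() succeeds; a real helper would also need to make sure the temporary name is free.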

As for the main Scandinavian special characters, I still think adding these 5 more would not matter in terms of cost, since we already support umlauts for much smaller countries.

Re: add some more chars to $from $to char conversion

Posted: Thu Oct 08, 2015 4:28 pm
by onli
Do we really still need to do this at all? Which filesystems still in use are not UTF-8 ready?
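As a sketch of what this alternative could look like (the function name is hypothetical), a Python version that keeps the UTF-8 filename unchanged and only percent-encodes it when building a URL. Normalizing to NFC first is an added assumption: it makes a precomposed Å and an A followed by a combining ring, which some systems produce, map to the same URL.

```python
import unicodedata
from urllib.parse import quote


def url_path_for(filename):
    # Normalize to NFC so "Å" is one code point rather than "A" plus a
    # combining ring, then percent-encode its UTF-8 bytes for the URL.
    return quote(unicodedata.normalize("NFC", filename))
```

For example, "Ærgerlig.jpg" becomes "%C3%86rgerlig.jpg". Whether every server and filesystem Serendipity runs on handles such names reliably is exactly the open question here.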