Correct alphabet sorting. A piece of spaghetti code

LazyBadger
Regular
Posts: 176
Joined: Mon Aug 25, 2008 12:25 pm
Location: Russia
Contact:

Post by LazyBadger »

Good night, dear comrades! I'm asking for a bit of help from anyone interested in this idea.

Mission of code
Provide a way to sort strings in natural alphabetical order for any non-English alphabet. I was crying bitterly over the results of the standard PHP sort: they are illogical (from the user's POV), unnatural and uncomfortable. For example, this order of tags just drives me nuts:

Code: Select all

    [50] => АИ
    [49] => Анонс
    [92] => История
    [47] => Крестный Батька
    [48] => ПА
    [51] => Россия
    [53] => СССР
    [52] => Сэй Алек
    [54] => Тарковский
    [55] => Яндекс
    [57] => агрегатор
    [58] => алармизм
    [56] => аудио
    [61] => байки
    [62] => биология
    [63] => блоги
    [64] => боевик
    [60] => бред
    [59] => бусидо
    [65] => важно
    [66] => гарнитуры
    [67] => генетика
    [68] => детектив
    [69] => журналамеры
    [70] => злое
    [72] => идиотизмы
    [71] => история России
    [74] => котовое
    [73] => кулинарное
    [77] => масскульт
    [81] => меломанское
    [79] => миниатюра
    [78] => мифы
    [80] => мониторы
    [75] => музыка
    [76] => мысли
    [82] => на злобу дня
    [83] => наблюдения
    [84] => новость
    [89] => парадоксы
    [90] => погодное
    [91] => попаданцы
    [88] => программизмы
    [87] => профессиональное
    [85] => псевдолитература
    [86] => публицистика
    [35] => рейтинги
    [36] => религия
    [37] => реплики
    [34] => русопятство
    [33] => русский язык
    [20] => сериалы
    [16] => стандарты
    [17] => стиль разработки
    [18] => столик
    [19] => столик-трансформер
    [27] => телефоны
    [26] => типографика
    [25] => триллер
    [29] => фантастика
    [30] => филология
    [31] => фото
    [28] => фразы
    [32] => цитаты
    [24] => шутки
    [22] => экономика
    [21] => эссе
    [23] => я.ру
but this is the ideal order:

Code: Select all

    [50] => АИ
    [49] => Анонс
    [57] => агрегатор
    [58] => алармизм
    [56] => аудио
    [61] => байки
    [62] => биология
    [63] => блоги
    [64] => боевик
    [60] => бред
    [59] => бусидо
    [65] => важно
    [66] => гарнитуры
    [67] => генетика
    [68] => детектив
    [69] => журналамеры
    [70] => злое
    [92] => История
    [72] => идиотизмы
    [71] => история России
    [47] => Крестный Батька
    [74] => котовое
    [73] => кулинарное
    [77] => масскульт
    [81] => меломанское
    [79] => миниатюра
    [78] => мифы
    [80] => мониторы
    [75] => музыка
    [76] => мысли
    [83] => наблюдения
    [82] => на злобу дня
    [84] => новость
    [48] => ПА
    [89] => парадоксы
    [90] => погодное
    [91] => попаданцы
    [88] => программизмы
    [87] => профессиональное
    [85] => псевдолитература
    [86] => публицистика
    [51] => Россия
    [35] => рейтинги
    [36] => религия
    [37] => реплики
    [34] => русопятство
    [33] => русский язык
    [53] => СССР
    [52] => Сэй Алек
    [20] => сериалы
    [16] => стандарты
    [17] => стиль разработки
    [18] => столик
    [19] => столик-трансформер
    [54] => Тарковский
    [27] => телефоны
    [26] => типографика
    [25] => триллер
    [29] => фантастика
    [30] => филология
    [31] => фото
    [28] => фразы
    [32] => цитаты
    [24] => шутки
    [22] => экономика
    [21] => эссе
    [55] => Яндекс
    [23] => я.ру
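The first listing is what plain byte-wise comparison produces. In UTF-8, uppercase Cyrillic А-Я (U+0410 to U+042F) encodes to smaller byte sequences than lowercase а-я (U+0430 to U+044F), so every capitalized tag lands before every lowercase one. A minimal sketch with three tags from the lists above, assuming the strings are stored as UTF-8:

```php
<?php
// Three tags taken from the listings above; the file is assumed to be saved as UTF-8.
$tags = ['агрегатор', 'Яндекс', 'АИ'];

// sort() compares non-numeric strings byte by byte and knows nothing about alphabets.
sort($tags);

// Uppercase Cyrillic comes out first because its UTF-8 bytes are smaller.
print_r($tags);

// The leading bytes explain the order: А = d090, Я = d0af, а = d0b0.
foreach (['А', 'Я', 'а'] as $letter) {
    echo $letter, ' = ', bin2hex($letter), "\n";
}
```

The sorted result is АИ, Яндекс, агрегатор, which is the same pattern as the first listing.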
Fields of application
Anywhere that sorting in mother-tongue order is preferred:
tag lists, the category list in a block sorted by "Category", etc.

Implementation
A user-defined comparison function (UDF) for u(a|k)sort()

Implementation details
lang/UTF-8/serendipity_lang_*.inc.php (ru in example)

Code: Select all

...
/* full alphabet - digits, english, local in both cases */
$alphabet = '0123456789AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZzАаБб...';
...
include/alphasort.inc.php (or an add-on to lang.inc.php?), the significant portion:

Code: Select all

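class utf_8_alphabet
{
   // Fallback alphabet: digits and English letters in both cases.
   const FALLBACK = '0123456789AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz';

   // A class body cannot contain an if statement, so the alphabet is picked at call time.
   static function order() {
       return isset($GLOBALS['alphabet']) ? $GLOBALS['alphabet'] : self::FALLBACK;
   }

   // everything else is sorted at the end
   static function cmp($a, $b) {
       ...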
After "include alphasort.inc.php", any array can be sorted alphabetically with a one-line fix: add uasort() or uksort() (sort by value or by key) before the array is output.
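Since the body of cmp() is not shown in the post, here is a minimal sketch of how such a comparison function could work. It is not the code from the attachment: the name make_alphabet_cmp() is hypothetical, the full Russian letter list is an assumption (the post elides it), and PHP 7 or later with valid UTF-8 input is assumed. Characters are ranked by their position in the alphabet string, and anything not listed is sorted at the end.

```php
<?php
// Hypothetical helper (not the attached code): builds a comparison callback
// for usort()/uasort()/uksort() from an alphabet string.
function make_alphabet_cmp(string $alphabet): callable
{
    // Split into UTF-8 characters without mbstring, then map char => rank.
    $rank = array_flip(preg_split('//u', $alphabet, -1, PREG_SPLIT_NO_EMPTY));
    $unknown = count($rank); // shared rank for characters outside the alphabet

    return function ($a, $b) use ($rank, $unknown): int {
        $charsA = preg_split('//u', (string) $a, -1, PREG_SPLIT_NO_EMPTY);
        $charsB = preg_split('//u', (string) $b, -1, PREG_SPLIT_NO_EMPTY);
        $n = min(count($charsA), count($charsB));
        for ($i = 0; $i < $n; $i++) {
            $ra = $rank[$charsA[$i]] ?? $unknown;
            $rb = $rank[$charsB[$i]] ?? $unknown;
            if ($ra !== $rb) {
                return $ra <=> $rb; // position in the alphabet decides
            }
            if ($charsA[$i] !== $charsB[$i]) {
                // Same rank but different characters: both are unlisted, use byte order.
                return strcmp($charsA[$i], $charsB[$i]);
            }
        }
        return count($charsA) <=> count($charsB); // a prefix sorts first
    };
}

// Assumed alphabet: digits, English, then Russian, each letter in both cases.
$alphabet = '0123456789AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
          . 'АаБбВвГгДдЕеЁёЖжЗзИиЙйКкЛлМмНнОоПпРрСсТтУуФфХхЦцЧчШшЩщЪъЫыЬьЭэЮюЯя';

$tags = ['Яндекс', 'на злобу дня', 'идиотизмы', 'агрегатор', 'История', 'наблюдения', 'АИ'];
usort($tags, make_alphabet_cmp($alphabet));
print_r($tags);
```

With these seven tags the result is АИ, агрегатор, История, идиотизмы, наблюдения, на злобу дня, Яндекс, which matches the ideal listing: the space is not in the alphabet, so "наблюдения" comes before "на злобу дня". For an array keyed by tag name, uksort($taglist, make_alphabet_cmp($alphabet)) would be the equivalent call.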
Last edited by LazyBadger on Thu May 05, 2011 2:48 am, edited 1 time in total.
Quis custodiet ipsos custodes?

Post by LazyBadger »

And here my troubles began.
In the attachment I added two files:
- a standalone test of utf_8_alphabet::cmp, where the initial array is filled with data from the test blog
- a dirty hack of serendipity_event_freetag.php from the test blog, where I added the class and debug prints to the code

In both cases the source array is (sorry for the Russian keys)

Code: Select all

array( 
    'Assembla' => 1,
    'сервисы' => 1,
    'русский язык' => 3,
    's9y' => 6,
    'Serendipity' => 3,
    'Twitter' => 2,
    'анонсы' => 1,
    'блог' => 5,
    'обновления' => 5,
    'перевод' => 2 
 );
(dumped from blog data)

For alphasort.php, after uksort() + print_r(), I got

Code: Select all

Array
(
    [Assembla] => 1
    [Serendipity] => 3
    [s9y] => 6
    [Twitter] => 2
    [анонсы] => 1
    [блог] => 5
    [обновления] => 5
    [перевод] => 2
    [русский язык] => 3
    [сервисы] => 1
)
(good)
while for serendipity_event_freetag.php with

Code: Select all

@@ -934,6 +983,9 @@
                         <div id="backend_freetag_list" style="margin: 5px; border: 1px dotted #000; padding: 5px; font-size: 9px;">
 <?php
                             $lastletter = '';
+                            preprint($taglist);
+                            uksort($taglist, 'utf_8_alphabet::cmp');
+                            preprint($taglist);
                             foreach ($taglist as $tag => $count) {
                                 if (function_exists('mb_strtoupper')) {
                                     $upc = mb_strtoupper(mb_substr($tag, 0, 1, LANG_CHARSET), LANG_CHARSET); 
I got on the second preprint()

Code: Select all

Array
(
    [Assembla] => 1
    [Serendipity] => 3
    [s9y] => 6
    [Twitter] => 2
    [сервисы] => 1
    [обновления] => 5
    [русский язык] => 3
    [блог] => 5
    [анонсы] => 1
    [перевод] => 2
)
(wrong and bad sort)

I'm totally lost
Attachments
serendipity_event_freetag.zip
Hacked event_freetag with sorting and debug print
(17.24 KiB) Downloaded 273 times
alphasort.zip
Class checker
(952 Bytes) Downloaded 258 times
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany
Contact:

Post by garvinhicking »

Hi!

Doesn't natsort() help?

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/

Post by LazyBadger »

natsort(), even in the form of natcasesort(), JUST SUXX. It knows nothing about alphabet order beyond digits and US-ASCII.
BTW, I fixed my code (one line moved one line lower); we can integrate it now, if you are willing and able to review it
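A quick way to check this claim, as a sketch with five tags from the first post (UTF-8 input assumed): natcasesort() folds case for ASCII letters only, so for Cyrillic it should fall back to byte order, with all uppercase tags ahead of all lowercase ones.

```php
<?php
// Tags from the first post; the file is assumed to be saved as UTF-8.
$tags = ['агрегатор', 'Яндекс', 'идиотизмы', 'АИ', 'История'];

// natcasesort() keeps the original keys, so reindex for display.
natcasesort($tags);
$tags = array_values($tags);

print_r($tags);
```

On a typical setup this should give АИ, История, Яндекс, агрегатор, идиотизмы, so "История" is still separated from "идиотизмы" by "Яндекс".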

Post by garvinhicking »

Hi!

Sure, is the attachment in your posting above the fixed version already?

Regards,
Garvin

Post by LazyBadger »

The full file from above: no. The patch attached here: yes, it has the fix.
I also haven't removed the preprint() calls around uksort() (which has to be done for a release version), and $order is initialized inside the class from a hardcoded Russian alphabet, without GLOBALS
Attachments
event_freetag-diff.zip
Diff with all addons and debug
(1.28 KiB) Downloaded 268 times

Post by garvinhicking »

Hi!

Great, thanks for your effort! I believe this is a great starting step.

Hm, I think before committing this it would be a good idea to create a separate s9y plugin for it, so that we can use that sorting algorithm in other places as well.

A new event hook "sort" could be created, so that in the freetag plugin this is used:

Code: Select all

serendipity_plugin_api::hook_event('sort', $taglist);
and then a serendipity_event_sorter plugin like yours could handle that sort.

This has not only the advantage of a reusable plugin in more instances, but also it will not affect current users who do not need a different sorting algorithm (that takes more processing power).
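The decoupling idea can be sketched without Serendipity at all. The registry and the fire_event() function below are hypothetical stand-ins, not Serendipity's plugin API; they only illustrate that the caller fires a "sort" event and the list changes only if some sorter has registered for it.

```php
<?php
// Stand-in for a plugin registry; Serendipity's real plugin API is NOT used here.
$listeners = [];

// Hypothetical dispatcher: hands $data by reference to every listener of $event.
function fire_event(array $listeners, string $event, array &$data): void
{
    foreach ($listeners[$event] ?? [] as $listener) {
        $listener($data);
    }
}

$taglist = ['b' => 1, 'a' => 2];

// No sorter installed: the list is left untouched and no sorting work is done.
fire_event($listeners, 'sort', $taglist);
$before = array_keys($taglist);

// A sorter "plugin" registers itself; ksort() stands in for the alphabet sort.
$listeners['sort'][] = function (array &$data): void {
    ksort($data);
};
fire_event($listeners, 'sort', $taglist);
$after = array_keys($taglist);

print_r([$before, $after]);
```

Before the sorter registers, the keys stay in their original order (b, a); afterwards they come back sorted (a, b).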

And the new plugin could be extended to sort strings in other languages as well, like Japanese?

Also, the utf_8_alphabet class would need to be adapted to use the mb_str* functions only if they exist, because there are PHP installations without mbstring, and this would break the whole freetag plugin for those people...
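One possible shape for such a guard, as a sketch: the helper name utf8_first_char() is hypothetical, and the fallback assumes that the PCRE build supports the /u (UTF-8) modifier.

```php
<?php
// Hypothetical helper: first character of a UTF-8 string, with or without mbstring.
function utf8_first_char(string $s): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($s, 0, 1, 'UTF-8');
    }
    // Fallback: with the /u modifier "." matches one whole UTF-8 character;
    // /s lets it match a newline as well.
    return preg_match('/^./us', $s, $m) ? $m[0] : '';
}

echo utf8_first_char('блог'), "\n"; // first letter of the tag
```

Both branches return the single character "б" for the tag "блог" and an empty string for empty input, so the calling code does not need to know which one ran.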

Best regards,
Garvin

Post by LazyBadger »

garvinhicking wrote: Hm, I think before committing this it would be an idea to create its own s9y plugin for that, so that we can sort using that algorithm on other instances as well.

A new event hook "sort" could be created
That's far outside my area of competence (and even incompetence); in this situation I completely trust your judgment of right and wrong solutions
garvinhicking wrote:This has not only the advantage of a reusable plugin in more instances, but also it will not affect current users who do not need a different sorting algorithm (that takes more processing power).
Agreed, that would be "The Right Way" (tm)
garvinhicking wrote:And the new plugin could be maintained to sort by other strings as well, like Japanese?
AFAICS, any alphabet that can be represented as a UTF-8 string can use this "natural sorter"
garvinhicking wrote:Also, the utf_8_alphabet class would need to be adapted to only use mb_str if those functions exist, because there are PHP installations without mbstring, and this would break the whole freetag plugin for those people...
For PHP without mbstring we can:
- do nothing (best way, I think)
- use replacements from Andreas Gohr UTF8 helper functions (attached) (LGPL 2.1, usable in Serendipity?)
Attachments
utf8.zip
UTF8 helper functions
(27.18 KiB) Downloaded 254 times

Post by garvinhicking »

Hi!

I've just committed the new "sort" event hook to the freetag plugin, and committed serendipity_event_sort, which contains your code. I wasn't able to test this, so please go ahead and see if the plugin does the trick for you.

I cannot really read your special characters, so I don't know if the $order array now contains the proper characters?!

Regards,
Garvin

Post by LazyBadger »

garvinhicking wrote: I've just committed the new "sort" event hook to the freetag plugin, and committed serendipity_event_sort which contains your code.
I still can't see these commits in SPARTACUS, so I tried reading event_sort online. This snippet confuses me somewhat.

Code: Select all

   function utf8cmp($a, $b) { 
	        static $order = null;
	        static $char2order = null;
	        
	        if ($order === null) {
	            // Kyrillic. More languages to come?
	            $order =
Because the if will always be true, and everybody will get only one alphabet...
Or have I misunderstood something?
I planned to keep the alphabet in the language file as an additional variable and, if this variable is missing, to use a fallback alphabet in pure US-ASCII (letters not listed will be sorted after the English ones). Here is the draft again:

Code: Select all

   static $order = null;
   if ($order === null) {
       // Alphabet from the language file, or the US-ASCII fallback if it is missing.
       $order = isset($GLOBALS['alphabet'])
           ? $GLOBALS['alphabet']
           : '0123456789AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz';
   }
This way, extending to new languages becomes an easy task: just add an alphabet to the already existing translations

Post by LazyBadger »

I updated freetag and installed event_sort by hand. Sorting is all correct in the backend
* new entry - tags div
* manage tags - all tags
and in the frontend tag cloud
* sort order - tag name

The new sorting is not used in the "related tags" output (do these tags have to be sorted?)

I'd be happy to see the category list with sort order "Category" sorted by the new algorithm as well

Post by garvinhicking »

Hi!

Thanks for the heads-up. The SPARTACUS sync wasn't working properly due to a PHP parse error.

About the $order === null -- currently only one alphabet is used, but I suppose in the future when more alphabets get put there, we need to decide which one to put into $order.
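One way that decision could look, as a sketch only: the function alphabet_for(), the language-code table and the Russian letter list are all assumptions for illustration and are not taken from the committed plugin. An unknown language simply gets the US-ASCII fallback from the draft above.

```php
<?php
// Hypothetical lookup: pick the alphabet for a language code, or fall back to
// digits plus English letters when no alphabet is registered for it.
function alphabet_for(string $lang, array $alphabets): string
{
    $fallback = '0123456789AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz';
    return isset($alphabets[$lang]) ? $fallback . $alphabets[$lang] : $fallback;
}

// Assumed table; only Russian is filled in here.
$alphabets = [
    'ru' => 'АаБбВвГгДдЕеЁёЖжЗзИиЙйКкЛлМмНнОоПпРрСсТтУуФфХхЦцЧчШшЩщЪъЫыЬьЭэЮюЯя',
];

echo alphabet_for('ru', $alphabets), "\n"; // fallback followed by the Russian letters
echo alphabet_for('de', $alphabets), "\n"; // fallback only
```

Adding a language then means adding one entry to the table, and $order would be filled from the returned string once per request.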

Regards,
Garvin