REGEX pattern in serendipity_getUriArguments()

gregman
Regular
Posts: 91
Joined: Wed Aug 15, 2007 9:32 pm

Post by gregman »

Hi there,

not sure if it's a bug or a feature...

serendipity_getUriArguments() relies on "a-z" in its search pattern. This causes problems with search terms containing (German) umlauts when mod_rewrite is used on search pages. Changing "a-z" to "\w" and adding the PCRE modifier "u" fixed this for me. Maybe the fix could find its way into the next release?

EDIT: I just discovered that \w matches umlauts only in PHP 5.3.4 and later (http://stackoverflow.com/questions/8915 ... 8-modifier). To get umlauts matched in PHP versions prior to 5.3.4, the pattern should use "\p{L}" with modifier "u" (http://stackoverflow.com/questions/2687 ... characters).
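
A minimal sketch of the difference, assuming the search term reaches PHP as decoded UTF-8 and the file itself is saved as UTF-8 (the variable names are made up for illustration):

```php
<?php
// Hypothetical search term as it might arrive after the server decoded the URL.
$term = 'würde';

// ASCII-only class with the "i" modifier: matching stops at the first umlaut.
preg_match('/^[a-z]+/i', $term, $ascii);

// Unicode letter class with the "u" modifier: the whole term matches.
preg_match('/^\p{L}+/u', $term, $unicode);

echo $ascii[0], "\n";   // w
echo $unicode[0], "\n"; // würde
```

Since \p{L} covers upper- and lowercase letters alike, the "i" modifier is not needed for the letter part of the class.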

Regards
Greg

Besides: Could it be that functions_permalinks.inc.php is saved with ANSI encoding? That is what my editor tells me.
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Post by garvinhicking »

Hi!

serendipity_archiveURL replaces all umlauts with "ue" etc., so no umlauts should occur in the URL at all. Where do you get them from? Or did you perhaps patch your serendipity_archiveURL function?

Umlauts in URLs might work, but they are not officially supported or encouraged in our default codebase.
gregman wrote: Besides: Could it be that functions_permalinks.inc.php is saved with ANSI encoding? That is what my editor tells me.
Could well be; there should only be ISO characters in that file...

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
gregman

Post by gregman »

Hi Garvin,

I'm using mod_rewrite on my search pages, so the URL structure looks like "baseURL/searchpath/searchterm/" plus "Pxx.html" when browsing through the search result pages (xx is the page number). For some reason mod_rewrite removes the URL encoding of the search term, so I get umlauts in the prev|next URLs.

Besides, since these are human-readable URLs, someone could "guess" a search URL by typing "baseURL/searchpath/mysearchtermwithumlaut", and I would like those URLs to work as well.

Regards
Greg
garvinhicking

Post by garvinhicking »

Hi!

Phew, okay. I don't have much insight into the code in that specific area, so I can't really offer help at this point, I'm sorry. :-/

If your patch works properly, I'd happily include it in our master branch; can you provide the exact patch?

Regards,
Garvin
gregman

Post by gregman »

Sure,

it's line 788:

Code:

preg_match('/^'. preg_quote($serendipity['serendipityHTTPPath'], '/') . '(' . preg_quote($serendipity['indexFile'], '/') . '\?\/)?(' . ($wildcard ? '.+' : '[!;,_a-z0-9\-*\/%\+]+') . ')/i', $uri, $_res);
changed to

Code:

preg_match('/^'. preg_quote($serendipity['serendipityHTTPPath'], '/') . '(' . preg_quote($serendipity['indexFile'], '/') . '\?\/)?(' . ($wildcard ? '.+' : '[!;,_\p{L}0-9\-*\/%\+]+') . ')/u', $uri, $_res);
Regards
Greg

EDIT: I'm not aware of the system requirements of s9y. Relying on the PCRE modifier "u" requires PHP 4.2.3 or later. http://www.php.net/manual/en/reference. ... ifiers.php
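
To see the effect of the change in isolation, here is a rough sketch that rebuilds both patterns outside of s9y. The function matchUriArguments() and the values for the HTTP path and index file are hypothetical stand-ins for the $serendipity settings, and the URI is assumed to arrive as decoded UTF-8:

```php
<?php
// Hypothetical stand-ins for $serendipity['serendipityHTTPPath'] and $serendipity['indexFile'].
$httpPath  = '/serendipity/';
$indexFile = 'index.php';

// Builds the same kind of pattern as line 788 and returns the captured argument string.
function matchUriArguments(string $uri, string $httpPath, string $indexFile, string $charClass, string $modifiers): ?string
{
    $pattern = '/^' . preg_quote($httpPath, '/')
             . '(' . preg_quote($indexFile, '/') . '\?\/)?'
             . '(' . $charClass . '+)/' . $modifiers;

    return preg_match($pattern, $uri, $res) === 1 ? $res[2] : null;
}

// Example URI with a raw (already decoded) umlaut in the search term.
$uri = '/serendipity/search/würde/P2.html';

$old = matchUriArguments($uri, $httpPath, $indexFile, '[!;,_a-z0-9\-*\/%\+]', 'i');
$new = matchUriArguments($uri, $httpPath, $indexFile, '[!;,_\p{L}0-9\-*\/%\+]', 'u');

echo $old, "\n"; // search/w
echo $new, "\n"; // search/würde/P2
```

Both variants stop before ".html" because the dot is not part of the character class; only the original one also stops at the umlaut.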
Timbalu
Regular
Posts: 4598
Joined: Sun May 02, 2004 3:04 pm

Post by Timbalu »

May I ask how you got mod_rewrite to work there?
Even with mod_rewrite set to yes I get
/index.php?serendipity[action]=search&serendipity[searchTerm]=wäre&serendipity[searchButton]=Los!
Regards,
Ian

Serendipity Styx Edition and additional_plugins @ https://ophian.github.io/ @ https://github.com/ophian
gregman

Post by gregman »

I'm not sure I understood your question correctly. If you have mod_rewrite enabled, the prev/next links in the footer of each search result page look like the ones described above.

Regards
Greg
Timbalu

Post by Timbalu »

Oh right... sorry, but as Garvin said, they should get converted correctly by the core.
I do remember a thread we had about this - maybe you just need to modify your template?!
http://board.s9y.org/viewtopic.php?f=10 ... p=10419792
Regards,
Ian

gregman

Post by gregman »

Sure, the URLs are displayed correctly (e.g.: /Suche/K%C3%BCndigung/P2.html), but my Apache strips the URL encoding. I could work around this by urlencoding all URIs with umlauts twice (http://blog.perplexedlabs.com/2008/03/2 ... haracters/), but I think it's better to fix serendipity_getUriArguments(). Not least because someone could type a search term containing umlauts directly into the browser's address bar, and it would be easy to make this work.
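
Roughly what happens with single versus double encoding, as a sketch. It assumes the server decodes the path exactly once before the script sees it, which is my guess at what happens here and not verified; the shortened pattern and all variable names are made up for illustration:

```php
<?php
// Hypothetical search term; this file is assumed to be saved as UTF-8.
$term = 'Kündigung';

$once  = urlencode($term); // what the template links contain
$twice = urlencode($once); // the double-encoding workaround

// Assumption: the path is decoded exactly once before PHP sees it.
$seenAfterOnce  = urldecode($once);
$seenAfterTwice = urldecode($twice);

// Shortened ASCII-only pattern in the spirit of the original "a-z" class.
preg_match('/^[a-z0-9%]+/i', $seenAfterOnce, $m1);
preg_match('/^[a-z0-9%]+/i', $seenAfterTwice, $m2);

echo $once, "\n";  // K%C3%BCndigung
echo $twice, "\n"; // K%25C3%25BCndigung
echo $m1[0], "\n"; // K
echo $m2[0], "\n"; // K%C3%BCndigung
```

Double encoding only helps for links the blog generates itself; a term typed directly into the address bar would still break at the umlaut.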

Greg
Timbalu

Post by Timbalu »

So this is specific to your Apache setup, isn't it?
If I place a manual GET like
/index.php?serendipity[action]=search&serendipity[fullentry]=1&serendipity[searchTerm]=für*&serendipity[searchButton]=>
I get http://www.example.com/serendipity/sear ... r*/P6.html, which is encoded correctly (f%C3%BCr*) in all follow-up pages.
Regards,
Ian

gregman

Post by gregman »

I haven't figured it out in detail. It may have something to do with "allow encoded slashes". As I didn't change my hoster's standard config, it surely is something that could "happen" to other users as well.

Greg

EDIT: In your case you did a boolean search, which behaves quite differently. Try it with "würde"; in that case, it doesn't work!

NEXT EDIT: Also it has to be a rewritten uri. So what do you get if you try http://www.example.com/serendipity/search/würde ?
Timbalu

Post by Timbalu »

/serendipity/search/über
gives
http://www.example.com/serendipity/sear ... er/P2.html (%C3%BCber)
Everything ok.

Did you read http://board.s9y.org/viewtopic.php?f=10 ... #p10419979
about changing the pagination to something like $string|replace:'%s':$index.page.number ?
Regards,
Ian

gregman

Post by gregman »

In my template the URLs are untouched. I just add a "Pxx.html" to the current URL.

This is really weird. Could you try one more thing on your blog for me, please? To have a pretty-looking URL displayed when a user performs a search, I intercept the request and issue a 301 redirect to the nice URL, like this, in the template's config.inc.php

Code:

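// Redirect a submitted search form to the "pretty" search URL ('Suche' is this blog's search permalink).
if (!empty($serendipity['GET']['searchTerm']) && !empty($serendipity['GET']['searchButton'])) {
    // urlencode() percent-encodes umlauts and other special characters in the search term.
    header("Location: " . $serendipity['baseURL'] . "Suche/" . urlencode($serendipity['GET']['searchTerm']) . "/", true, 301);
    exit;
}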
You can see that I explicitly urlencode the part of the URL that could contain an umlaut. Even so, the searchTerm breaks at an umlaut as long as serendipity_getUriArguments() relies only on "a-z".

Regards
Greg
Timbalu

Post by Timbalu »

Add a <b>view=</b>{$view} to your index.tpl.
Does this show 'search' or 'Suche' or '404' when performing a search?
Did you change the permalink 'search' in the backend configuration?
What does your .htaccess RewriteRule look like?

For me your snippet does not work for searching, as my permalink is 'search', not 'Suche'.
Regards,
Ian

gregman

Post by gregman »

Timbalu wrote: For me your snip does not work for searching as my Permalink is 'search', not Suche.
Of course! I thought it would be self-evident to adapt the permalink structure to your own config. I don't have a "clean" blog running right now, so I can't test the default behavior of s9y on my server configuration myself.