REGEX pattern in serendipity_getUriArguments()
Posted: Mon Apr 16, 2012 4:16 pm
by gregman
Hi there,
not sure if it's a bug or a feature...
serendipity_getUriArguments() relies on "a-z" in its search pattern. This causes problems with search terms containing (German) umlauts when using mod_rewrite on search pages. Changing "a-z" to "\w" and adding the PCRE modifier "u" helped me get rid of this. Maybe the fix should find its way into the next release?
EDIT: Just discovered that \w matches umlauts only in PHP 5.3.4 and later (http://stackoverflow.com/questions/8915 ... 8-modifier). To get the umlauts matched in PHP versions prior to 5.3.4, it should be "\p{L}" with modifier "u" (http://stackoverflow.com/questions/2687 ... characters).
Regards
Greg
Besides: Could it be, that the functions_permalinks.inc.php has an ANSI coding? This is what my editor told me.
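To illustrate the character classes discussed above, here is a minimal sketch, not taken from the Serendipity source. The helper name leading_match() is made up for this example, and the sample term is written as UTF-8 byte escapes so the result does not depend on the encoding of the file itself. It compares how far "a-z" and "\p{L}" with the "u" modifier get on a word containing an umlaut.

```php
<?php
// "würde" as UTF-8 bytes; escapes keep the example independent of file encoding.
$term = "w\xC3\xBCrde";

// Hypothetical helper: returns the leading run of $subject matched by $class.
function leading_match($class, $modifiers, $subject) {
    if (preg_match('/^' . $class . '+/' . $modifiers, $subject, $m)) {
        return $m[0];
    }
    return '';
}

// ASCII-only class, case-insensitive: matching stops at the first umlaut byte.
$ascii = leading_match('[a-z]', 'i', $term);

// Unicode letter property with the "u" modifier: the whole word matches.
$letters = leading_match('[\p{L}]', 'u', $term);

echo $ascii, "\n";
echo $letters, "\n";
```

The "\w" variant is left out on purpose, because its behavior with "u" depends on the PHP version, as noted in the EDIT above.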
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Mon Apr 16, 2012 6:17 pm
by garvinhicking
Hi!
serendipity_archiveURL replaces all umlauts with "ue" etc., so no umlauts should occur in the URL at all. Where do you get those? Or did you maybe patch your serendipity_archiveURL function?
Umlauts in URLs might work, but they are not officially supported or encouraged in our default codebase.
gregman wrote:Besides: Could it be, that the functions_permalinks.inc.php has an ANSI coding? This is what my editor told me.
Could well be, there should only be ISO characters in that file...
Regards,
Garvin
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Mon Apr 16, 2012 8:40 pm
by gregman
Hi Garvin,
I'm using mod_rewrite on my search pages, so the URL structure looks like "baseURL/searchpath/searchterm/" plus "Pxx.html" when browsing through the search result pages (xx means the page number). For some reason mod_rewrite removes the URL encoding of the search term, so I get umlauts in the prev|next URLs.
Besides this, since these are human-readable URLs, someone could "guess" a search URL by typing "baseURL/searchpath/mysearchtermwithumlaut", and I would like these URLs to work as well.
Regards
Greg
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Tue Apr 17, 2012 8:05 am
by garvinhicking
Hi!
Phew, okay. I don't have much insight into the code of that specific area, so I can't really offer help at this point, I'm sorry. :-/
If your patch works properly, I'd happily include it in our master branch; can you provide the exact patch?
Regards,
Garvin
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Tue Apr 17, 2012 2:54 pm
by gregman
Sure,
it's line 788:
Code:
preg_match('/^'. preg_quote($serendipity['serendipityHTTPPath'], '/') . '(' . preg_quote($serendipity['indexFile'], '/') . '\?\/)?(' . ($wildcard ? '.+' : '[!;,_a-z0-9\-*\/%\+]+') . ')/i', $uri, $_res);
changed to
Code:
preg_match('/^'. preg_quote($serendipity['serendipityHTTPPath'], '/') . '(' . preg_quote($serendipity['indexFile'], '/') . '\?\/)?(' . ($wildcard ? '.+' : '[!;,_\p{L}0-9\-*\/%\+]+') . ')/u', $uri, $_res);
Regards
Greg
EDIT: I'm not aware of the system requirements of s9y. Relying on the PCRE modifier "u" requires PHP 4.2.3 or later.
http://www.php.net/manual/en/reference. ... ifiers.php
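As a rough way to compare the two versions of line 788 outside of Serendipity, the sketch below rebuilds both patterns and runs them against a rewritten search URI with a raw umlaut. The $serendipity values, the helper build_uri_pattern() and the sample URI are assumptions for illustration only, and the $wildcard branch is left out.

```php
<?php
// Hypothetical stand-ins for the two settings used at line 788.
$serendipity = array(
    'serendipityHTTPPath' => '/',
    'indexFile'           => 'index.php',
);

// Hypothetical helper: same pattern layout as in the post; only the
// character class and the modifier differ between the two calls.
function build_uri_pattern($serendipity, $class, $modifier) {
    return '/^' . preg_quote($serendipity['serendipityHTTPPath'], '/')
         . '(' . preg_quote($serendipity['indexFile'], '/') . '\?\/)?('
         . $class . ')/' . $modifier;
}

$original = build_uri_pattern($serendipity, '[!;,_a-z0-9\-*\/%\+]+', 'i');
$patched  = build_uri_pattern($serendipity, '[!;,_\p{L}0-9\-*\/%\+]+', 'u');

// "/Suche/Kündigung/P2.html" with the umlaut as raw UTF-8 bytes,
// i.e. what arrives when the percent-encoding has been removed.
$uri = "/Suche/K\xC3\xBCndigung/P2.html";

preg_match($original, $uri, $before);
preg_match($patched, $uri, $after);

echo $before[2], "\n"; // captured arguments with the "a-z" class
echo $after[2], "\n";  // captured arguments with "\p{L}" and "u"
```

With the original class the captured part ends right before the umlaut, which matches the "breaks at an umlaut" behavior described in this thread; with the patched class the capture runs on to the "." of "P2.html", since a dot is in neither class.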
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Tue Apr 17, 2012 3:39 pm
by Timbalu
May I ask how you got mod_rewrite to work there?
Even with mod_rewrite set to yes I have
/index.php?serendipity[action]=search&serendipity[searchTerm]=wäre&serendipity[searchButton]=Los!
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Tue Apr 17, 2012 4:27 pm
by gregman
I'm not sure if I got your question right. If you have mod_rewrite enabled, the prev/next links in the footer of each search result page look as described above.
Regards
Greg
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Tue Apr 17, 2012 4:46 pm
by Timbalu
oh right... sorry, but as Garvin said, they should get converted correctly by the core.
I do remember a thread we had concerning this - maybe you just need to modify your template?!
http://board.s9y.org/viewtopic.php?f=10 ... p=10419792
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Tue Apr 17, 2012 5:11 pm
by gregman
Sure, the URLs are displayed correctly (e.g.: /Suche/K%C3%BCndigung/P2.html), but my Apache drops the URL encoding. I could get around this by urlencoding all URIs with umlauts twice (http://blog.perplexedlabs.com/2008/03/2 ... haracters/), but I think it's better to fix serendipity_getUriArguments(). Not least because someone could type a search term containing umlauts directly into the browser's address bar, and it would be easy to get this to work.
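For reference, the double-encoding workaround mentioned above can be sketched as follows. It assumes that exactly one level of decoding happens before the URI reaches PHP, which this thread does not confirm; the variable names are only illustrative.

```php
<?php
// "Kündigung" as UTF-8 bytes, independent of the file's own encoding.
$term = "K\xC3\xBCndigung";

$once  = urlencode($term);  // the form shown in the example URL above
$twice = urlencode($once);  // each "%" becomes "%25"

// Assumption: the server decodes once. The double-encoded value then
// still arrives percent-encoded instead of as raw umlaut bytes.
$afterOneDecode = rawurldecode($twice);

echo $once, "\n";
echo $twice, "\n";
echo $afterOneDecode, "\n";
```

The drawback is that every generated link has to be encoded twice, and a URL typed by hand into the address bar would still contain a raw umlaut, so this does not replace the pattern fix.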
Greg
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Tue Apr 17, 2012 5:35 pm
by Timbalu
So this is something to do with your Apache, isn't it?
If I place a manual GET like
/index.php?serendipity[action]=search&serendipity[fullentry]=1&serendipity[searchTerm]=für*&serendipity[searchButton]=>
I get
http://www.example.com/serendipity/sear ... r*/P6.html which is encoded right (f%C3%BCr*) in all follow-up pages.
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Tue Apr 17, 2012 6:03 pm
by gregman
I haven't figured it out in detail. It may have something to do with "allow encoded slashes". As I didn't change the standard config of my hoster, it surely is something that could "happen" to other users as well.
Greg
EDIT: In your case you did a boolean search, which behaves quite differently. Try it with "würde"; in this case, it's not working!
NEXT EDIT: Also it has to be a rewritten uri. So what do you get if you try
http://www.example.com/serendipity/search/würde ?
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Tue Apr 17, 2012 6:38 pm
by Timbalu
/serendipity/search/über
gives
http://www.example.com/serendipity/sear ... er/P2.html (%C3%BCber)
Everything ok.
Did you read the
http://board.s9y.org/viewtopic.php?f=10 ... #p10419979
to change pagination to something like $string|replace:'%s':$index.page.number ?
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Thu Apr 19, 2012 8:30 pm
by gregman
In my template the URLs are untouched. I just add a "Pxx.html" to the current URL.
This is really weird. Could you try one more thing on your blog for me, please? In order to have a pretty-looking URL displayed when a user performs a search, I catch the call and issue a 301 redirect to the nice URL, like this, in the template's config.inc.php:
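Code:
// Redirect a submitted search form to the rewritten ("pretty") search URL.
if (!empty($serendipity['GET']['searchTerm']) && !empty($serendipity['GET']['searchButton'])) {
    // urlencode() the term so umlauts are percent-encoded in the Location header.
    header("Location: " . $serendipity['baseURL'] . "Suche/" . urlencode($serendipity['GET']['searchTerm']) . "/", true, 301);
    exit;
}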
You can see that I explicitly urlencode the part of the URL that could contain an umlaut. Even so, the searchTerm breaks at an umlaut when serendipity_getUriArguments() relies only on "a-z".
Regards
Greg
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Fri Apr 20, 2012 9:29 am
by Timbalu
Add a <b>view=</b>{$view} to your index.tpl.
Does this show 'search' or 'Suche' or '404' when performing a search?
Did you change the permalink 'search' in the backend configuration?
What does your .htaccess RewriteRule look like?
For me your snip does not work for searching as my Permalink is 'search', not Suche.
Re: REGEX pattern in serendipity_getUriArguments()
Posted: Fri Apr 20, 2012 5:00 pm
by gregman
Timbalu wrote:For me your snip does not work for searching as my Permalink is 'search', not Suche.
Of course! I thought it would be self-evident to adapt the permalink structure to your own config. I don't have a "clean" blog running right now, so I can't test the default behavior of s9y on my server configuration on my own.