REGEX pattern in serendipity_getUriArguments()

Timbalu
Regular
Posts: 4598
Joined: Sun May 02, 2004 3:04 pm

Re: REGEX pattern in serendipity_getUriArguments()

Post by Timbalu »

Well, sure, it works fine here, the same as without it (and better without the trailing slash).
But I still don't really get what your actual problem is. Could you please describe in detail what goes wrong with the pagination without that snippet? And are we talking about standard or bulletproof pagination? What about the .htaccess?
Regards,
Ian

Serendipity Styx Edition and additional_plugins @ https://ophian.github.io/ @ https://github.com/ophian
gregman
Regular
Posts: 91
Joined: Wed Aug 15, 2007 9:32 pm

Re: REGEX pattern in serendipity_getUriArguments()

Post by gregman »

Well, it's quite simple: on my server I discovered a problem with Serendipity handling search terms with umlauts. In detail: search terms were cut off just before an umlaut. When I went to examine what was going on, I found that

1) Apache does not keep the urlencoding when using mod_rewrite. The affected rewrite rule is

Code:

RewriteRule ^{PAT_SEARCH} {indexFile}?url=/{PATH_SEARCH}/$1 [L,QSA]
As mentioned in one of my earlier posts, this is a known Apache issue. One way to fix it is to urlencode the affected URL TWICE! The disadvantage of that solution is that it mangles URLs where there is no need to.

2) As I combed through the code I found that serendipity_getUriArguments() relies on a REGEX pattern which causes the cut in URLs containing umlauts. So in my opinion this is the right place to fix this behavior. With that change everything works fine. So my suggestion was to get this fix into the core, as some other users out there may have the same problem with Apache mod_rewrite and urlencoded URLs.
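To make the two points above concrete, here is a minimal sketch in Python (not s9y's PHP). The pattern `ARG_PATTERN` and the helper `extract_term` are hypothetical stand-ins, not the actual expression used in serendipity_getUriArguments(); the sketch only assumes that the real pattern accepts ASCII characters and percent signs but not raw umlauts. It shows why an already-decoded URL gets cut before the umlaut, and why encoding twice survives one unwanted decoding pass.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical argument pattern: ASCII letters, digits, '%' and a few
# URL-safe characters. This is NOT the real s9y pattern, only an illustration.
ARG_PATTERN = re.compile(r"/search/([A-Za-z0-9%_.\-]+)")


def extract_term(uri: str) -> str:
    """Return the search term captured by the illustrative pattern."""
    match = ARG_PATTERN.match(uri)
    return match.group(1) if match else ""


term = "Müller"
encoded_once = quote(term)            # percent-encoded UTF-8 bytes
encoded_twice = quote(encoded_once)   # every '%' becomes '%25'

# URL still encoded when it reaches the application: the term survives.
intact = extract_term("/search/" + encoded_once)

# URL already decoded by the rewrite step: the match stops before the umlaut.
truncated = extract_term("/search/" + unquote(encoded_once))

# Double encoding: a single decoding pass leaves a singly encoded URL.
after_one_decode = unquote(encoded_twice)

print(intact, truncated, after_one_decode)
```

The double-encoding route only helps on servers that actually decode once; where the encoding is preserved, the application would receive the doubly encoded term, which is the drawback described above.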

Greg
Timbalu
Regular
Posts: 4598
Joined: Sun May 02, 2004 3:04 pm

Re: REGEX pattern in serendipity_getUriArguments()

Post by Timbalu »

This is much clearer. Sorry for all these detours.
Is that a commonly distributed Apache version?
I still can't imagine why Apache should lose urlencoding on its way... on some systems....

After some searching... I still don't really know which of these applies to your problem...

Did you try to add a [NE] (NoEscape) flag?

Code:

RewriteRule ^search/(.*) index.php?url=/search/$1 [L,NE,QSA] 
Is that the problem?

Or maybe you need to put a

Code:

RewriteMap esc int:escape
into the main httpd.conf and/or use

Code:

/${esc:$1}
in the htaccess (see http://rdfabout.com/demo/census/htaccess.txt).
Regards,
Ian

gregman
Regular
Posts: 91
Joined: Wed Aug 15, 2007 9:32 pm

Re: REGEX pattern in serendipity_getUriArguments()

Post by gregman »

Timbalu wrote: I still can't imagine why Apache should lose urlencoding on its way... on some systems....
Me neither, until I experienced it on my own system after a server upgrade. I did try the NE flag, with no change. Because of the upgrade there are many other construction sites I have to deal with, so I haven't had the time to figure it out in detail. A PHP solution is mentioned as a workaround at https://issues.apache.org/bugzilla/show ... i?id=34602. Anyway, according to the linked thread the bug still seems to be present.

Greg
Timbalu
Regular
Posts: 4598
Joined: Sun May 02, 2004 3:04 pm

Re: REGEX pattern in serendipity_getUriArguments()

Post by Timbalu »

Well yes, I have read that, and according to post 16 the 'bug' is solved by the second solution from my last post. If you follow the link, it describes nicely what happens on the systems showing this 'bug'.
Your patch seems to do well, but in the end it is an Apache issue, isn't it?!
That is why I asked whether that is a commonly distributed Apache version, which would give us a hint as to whether we should expect this more often and be prepared for it via your patch.
Regards,
Ian

gregman
Regular
Posts: 91
Joined: Wed Aug 15, 2007 9:32 pm

Re: REGEX pattern in serendipity_getUriArguments()

Post by gregman »

OK, I see. My Apache was preinstalled by my hosting provider, but I have no doubt that it's the version commonly shipped on Ubuntu 10.04 systems.

Greg
Timbalu
Regular
Posts: 4598
Joined: Sun May 02, 2004 3:04 pm

Re: REGEX pattern in serendipity_getUriArguments()

Post by Timbalu »

I wish we had better information on why this happens on some (rare) Apache installations.
@Garvin, what shall we do with it now?
Regards,
Ian

garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany
Contact:

Re: REGEX pattern in serendipity_getUriArguments()

Post by garvinhicking »

Timbalu wrote: I wish we had better information on why this happens on some (rare) Apache installations.
@Garvin, what shall we do with it now?
If it fixes things, we can implement gregman's patch proposed on the first page of this thread; I don't see much harm in it, and it's easier to fix there than to do some regexp .htaccess mumbo jumbo that is harder to maintain than our PHP code....?

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
gregman
Regular
Posts: 91
Joined: Wed Aug 15, 2007 9:32 pm

Re: REGEX pattern in serendipity_getUriArguments()

Post by gregman »

Hi,

after some time I am coming back to this issue... mainly because the provided patch does not work for phrases and some other special characters, which are wisely encoded by s9y but which my Apache apparently decodes when applying a rewrite rule. Here you can see that this bug/behavior is resolved/changed in Apache 2.5, but I'm sure there are others like me who are stuck on an earlier version of Apache.
Timbalu wrote: Your patch seems to do well, but in the end it is an Apache issue, isn't it?!
Therefore I decided to examine the workarounds given in the above link and came to a solution which may be better than the former regex patch.

Apparently %{THE_REQUEST} is not affected by the bug/behavior, so it's possible to put it in a rewrite condition before the rewrite rule of the search pattern and reference it inside the rewrite rule with %1, like this:

Code:

RewriteCond %{THE_REQUEST} ^GET\ {PREFIX}{PAT_SEARCH}\ HTTP/\d\.\d
RewriteRule ^{PAT_SEARCH} {indexFile}?url=/{PATH_SEARCH}/%1 [L,QSA]
Maybe you can check the fix?
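For anyone checking the idea, here is a rough Python sketch of what the rewrite condition captures. It assumes that {PREFIX} expands to "/" and {PAT_SEARCH} to "search/(.*)", as in the example rule earlier in this thread; the real expansions depend on the s9y configuration. The names `REQUEST_LINE` and `raw_search_term` are made up for this illustration.

```python
import re
from typing import Optional

# Assumed expansion of the condition pattern:
#   {PREFIX} -> "/"   and   {PAT_SEARCH} -> "search/(.*)"
REQUEST_LINE = re.compile(r"^GET /search/(.*) HTTP/\d\.\d")


def raw_search_term(request_line: str) -> Optional[str]:
    """Return the still-encoded search term from a raw request line, if any."""
    match = REQUEST_LINE.match(request_line)
    return match.group(1) if match else None


# The request line is matched as the client sent it, so the
# percent-encoding of umlauts and spaces is still in place.
umlaut = raw_search_term("GET /search/M%C3%BCller HTTP/1.1")
phrase = raw_search_term("GET /search/foo%20bar HTTP/1.1")

# The pattern is anchored on "GET", so other methods do not match.
other_method = raw_search_term("POST /search/M%C3%BCller HTTP/1.1")

print(umlaut, phrase, other_method)
```

Because the pattern starts with `^GET`, requests using another method would not satisfy the condition and would fall through to whatever rules follow; whether that matters for the search URL is something to verify on the affected server.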

Regards
Greg