Preventing 404's from being generated?

Michael Harrison
Regular
Posts: 51
Joined: Sat Jan 28, 2006 12:50 pm

Post by Michael Harrison »

Does anyone know of a way to munge .htaccess so that rewriting happens before a 404 is generated?

It seems that a 404 is generated whenever someone asks for anything that ends up being "rewritten" such as "blog/categories/2-Tutorials" and I'd like the 404 only to be generated if s9y can't find a specified page.

The reason I'd like this is that my host has a nice feature where I can see just the last 100 errors that have been generated. Unfortunately a fair number of them are "invalid" because they end up as rewrites into the blog.
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL

Post by judebert »

If your host supports mod_rewrite, just enable it in your Serendipity "Appearance and Options" configuration. It sounds like you're using Apache errorhandling, and that will result in 404s. (In fact, that's kinda the definition.)

With errorhandling, there's no way (I know of) to make your .htaccess redirect nicely.
Judebert
---
Website | Wishlist | PayPal
Michael Harrison

Post by Michael Harrison »

Thanks Jude.

mod_rewrite is enabled, but that $%# Googlebot still somehow causes 404s when spidering perfectly valid paths into the blog.
judebert

Post by judebert »

So mod_rewrite is enabled in Serendipity, but GoogleBot generates 404s in your access log?

Hm. Is GoogleBot using the same domain your blog is set to? It sounds trivial, but under many hosts www.mydomain.com and mydomain.com aren't the same, causing one to generate 404s.

Have you tried copy-and-pasting the 404 URL from your log into your browser? What happens?
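
If the domain mismatch does turn out to be the cause, one common approach is to send every request to a single hostname before the blog's own rules run. A minimal sketch, assuming the blog lives under /blog/, that www.mydomain.com is the name the blog is configured for, and that these lines go at the top of the blog's own .htaccess (rules in a parent directory's .htaccess are not inherited by default once the subdirectory has its own rewrite rules):

```apache
RewriteEngine On
# Assumption: www.mydomain.com is the canonical host; swap the two names
# if the blog is configured for the bare domain instead.
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
# In /blog/.htaccess the pattern sees the path relative to /blog/,
# so the prefix is added back in the redirect target.
RewriteRule ^(.*)$ http://www.mydomain.com/blog/$1 [R=301,L]
```

The 301 tells crawlers to use the canonical name from then on, so only one hostname should keep showing up.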
Michael Harrison

Post by Michael Harrison »

I'm not sure which domain they're using, as that isn't contained in my logs, but if I plug
/blog/categories/10-Lab-Notes
after both http://www.dragonseye.com and http://dragonseye.com it works fine.

After testing what I just wrote, I looked at my logs and found the following:

[Fri Oct 6 03:56:57 2006] [error] [client xx.xx.xx.xx] File does not exist: /home/dragon5/public_html/blog/categories/10-Lab-Notes
[Fri Oct 6 03:55:51 2006] [error] [client xx.xx.xx.xx] File does not exist: /home/dragon5/public_html/blog/categories/10-Lab-Notes

so it looks like the blog is generating 404s and then taking me to the category.
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Post by garvinhicking »

Hi!

If you get those entries and still see the valid pages, then mod_rewrite is not properly installed on your webserver. Please check whether your .htaccess file contains "RewriteRule" lines; if it does, please contact your web provider and ask them whether mod_rewrite is properly installed.

HTH,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
Michael Harrison

Post by Michael Harrison »

Thanks for your reply, Garvin, but it doesn't make sense to me.

If mod_rewrite isn't installed (properly or otherwise) how would the rewrite rules ever work?

In addition to the ones used by the blog I've got a number of them in my root folder that all work and don't generate 404s.

Assuming it's some small configuration problem on their end, what sorts of questions can I ask them other than "is mod_rewrite properly installed?" which I expect will just get me a "yes" answer.
garvinhicking

Post by garvinhicking »

Hi!

Serendipity uses mod_rewrite and enables the Apache errorhandling as a fallback. Thus, pretty URLs will still work through the Apache error handler, but each such request also leaves the logfile entry that you are seeing.

So if you configured mod_rewrite, but get the logfile entries, it simply means that the safety fallback is being used.
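
To make the fallback concrete, here is a minimal sketch of how the two mechanisms can sit side by side in one .htaccess. It assumes the blog lives under /blog/ and that index.php is its entry point; the exact lines Serendipity writes may differ.

```apache
# Fallback: any request Apache cannot map to an existing file is handed
# to index.php. Apache logs "File does not exist" for such requests.
ErrorDocument 404 /blog/index.php

RewriteEngine On
RewriteBase /blog/
# Preferred path: when a rule matches, the request is rewritten internally
# to index.php, so the missing-file error is never raised.
RewriteRule ^(categories/([0-9;]+)-[0-9a-z\.\_!;,\+\-]+) index.php?/$1 [NC,L,QSA]
```

A request that no RewriteRule matches still ends up at index.php through the ErrorDocument line, which is why the page displays while the error log fills up.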

BTW, Google still sees proper HTTP 200 pages; it does not get a 404 header. Sadly, though, the entry in the logfile cannot be prevented.

Michael Harrison wrote:
Assuming it's some small configuration problem on their end, what sorts of questions can I ask them other than "is mod_rewrite properly installed?" which I expect will just get me a "yes" answer.

You could tell them that mod_rewrite is not working with the .htaccess file in your blog directory, and ask whether they can enable the RewriteLog / RewriteLogLevel directives to check why your rules are not being executed.
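
For reference, a sketch of what the host would need to add. RewriteLog and RewriteLogLevel exist in Apache 2.2 and earlier and are only allowed in the server or virtual-host configuration, not in .htaccess, which is why the provider has to do it. The log path below is a placeholder, not a real location on your server.

```apache
# Apache 2.2 and earlier, in httpd.conf or inside the <VirtualHost> block.
# Placeholder path: the provider chooses where the log actually goes.
RewriteLog "/var/log/apache2/rewrite.log"
# 0 disables rewrite logging; higher levels are more verbose and slow the
# server down, so this should be switched off again after debugging.
RewriteLogLevel 3
```

On Apache 2.4 these two directives were replaced by per-module log levels, for example `LogLevel alert rewrite:trace3`.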

Best regards,
Garvin
Michael Harrison

Post by Michael Harrison »

It looks like this might have been fixed by switching s9y from mod_rewrite to Apache error handling and back to mod_rewrite.

Comparing the rewrite block in my old .htaccess with the one in the new file, it changed from

RewriteEngine On
RewriteBase /blog/
RewriteRule ^(archives/([0-9]+)-[0-9a-z\.\_!;,\+\-]+\.html) index.php?/$1 [L,QSA]
RewriteRule ^(authors/([0-9]+)-[0-9a-z\.\_!;,\+\-]+) index.php?/$1 [L,QSA]
RewriteRule ^(feeds/categories/([0-9;]+)-[0-9a-z\.\_!;,\+\-]+\.rss) index.php?/$1 [L,QSA]
RewriteRule ^({PAT_PERMALINK_FEEDAUTHORS}) index.php?/$1 [L,QSA]
RewriteRule ^(categories/([0-9;]+)-[0-9a-z\.\_!;,\+\-]+) index.php?/$1 [L,QSA]
RewriteRule ^archives([/A-Za-z0-9]+)\.html index.php?url=/archives/$1.html [L,QSA]
RewriteRule ^([0-9]+)[_\-][0-9a-z_\-]*\.html index.php?url=$1-article.html [L,NC,QSA]
RewriteRule ^feeds/(.*) index.php?url=/feeds/$1 [L,QSA]
RewriteRule ^unsubscribe/(.*)/([0-9]+) index.php?url=/unsubscribe/$1/$2 [L,QSA]
RewriteRule ^approve/(.*)/(.*)/([0-9]+) index.php?url=approve/$1/$2/$3 [L,QSA]
RewriteRule ^delete/(.*)/(.*)/([0-9]+) index.php?url=delete/$1/$2/$3 [L,QSA]
RewriteRule ^(admin|entries)(/.+)? index.php?url=admin/ [L,QSA]
RewriteRule ^archive/? index.php?url=/archive [L,QSA]
RewriteRule ^(index|atom[0-9]*|rss|b2rss|b2rdf).(rss|rdf|rss2|xml) rss.php?file=$1&ext=$2
RewriteRule ^(plugin|plugin)/(.*) index.php?url=$1/$2 [L,QSA]
RewriteRule ^search/(.*) index.php?url=/search/$1 [L,QSA]
RewriteRule ^(serendipity\.css|serendipity_admin\.css) index.php?url=/$1 [L,QSA]
RewriteRule ^index\.(html?|php.+) index.php?url=index.html [L,QSA]
RewriteRule ^htmlarea/(.*) htmlarea/$1 [L,QSA]
RewriteRule (.*\.html?) index.php?url=/$1 [L,QSA]

to

RewriteEngine On
RewriteBase /blog/
RewriteRule ^(archives/([0-9]+)-[0-9a-z\.\_!;,\+\-]+\.html) index.php?/$1 [NC,L,QSA]
RewriteRule ^(authors/([0-9]+)-[0-9a-z\.\_!;,\+\-]+) index.php?/$1 [NC,L,QSA]
RewriteRule ^(feeds/categories/([0-9;]+)-[0-9a-z\.\_!;,\+\-]+\.rss) index.php?/$1 [NC,L,QSA]
RewriteRule ^(feeds/authors/([0-9]+)-[0-9a-z\.\_!;,\+\-]+\.rss) index.php?/$1 [NC,L,QSA]
RewriteRule ^(categories/([0-9;]+)-[0-9a-z\.\_!;,\+\-]+) index.php?/$1 [NC,L,QSA]
RewriteRule ^archives([/A-Za-z0-9]+)\.html index.php?url=/archives/$1.html [NC,L,QSA]
RewriteRule ^([0-9]+)[_\-][0-9a-z_\-]*\.html index.php?url=$1-article.html [L,NC,QSA]
RewriteRule ^feeds/(.*) index.php?url=/feeds/$1 [L,QSA]
RewriteRule ^unsubscribe/(.*)/([0-9]+) index.php?url=/unsubscribe/$1/$2 [L,QSA]
RewriteRule ^approve/(.*)/(.*)/([0-9]+) index.php?url=approve/$1/$2/$3 [L,QSA]
RewriteRule ^delete/(.*)/(.*)/([0-9]+) index.php?url=delete/$1/$2/$3 [L,QSA]
RewriteRule ^(admin|entries)(/.+)? index.php?url=admin/ [L,QSA]
RewriteRule ^archive/? index.php?url=/archive [L,QSA]
RewriteRule ^(index|atom[0-9]*|rss|b2rss|b2rdf).(rss|rdf|rss2|xml) rss.php?file=$1&ext=$2
RewriteRule ^(plugin|plugin)/(.*) index.php?url=$1/$2 [L,QSA]
RewriteRule ^search/(.*) index.php?url=/search/$1 [L,QSA]
RewriteRule ^(serendipity\.css|serendipity_admin\.css) index.php?url=/$1 [L,QSA]
RewriteRule ^index\.(html?|php.+) index.php?url=index.html [L,QSA]
RewriteRule ^htmlarea/(.*) htmlarea/$1 [L,QSA]
RewriteRule (.*\.html?) index.php?url=/$1 [L,QSA]

So far (fingers crossed) I'm not getting 404s for valid links any longer.
judebert

Post by judebert »

For the less-than-eagle-eyed among us, the difference is case insensitivity (NC) on the options for some rules, and the substitution of a real pattern for PAT_PERMALINK_FEEDAUTHORS.

That substitution makes me think Serendipity wasn't working properly when it wrote your initial .htaccess.
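
The NC flag would also explain the original log entries, assuming the failing requests looked like the one quoted earlier. In /blog/categories/10-Lab-Notes the name part contains uppercase letters, but the character class in the rule lists only 0-9, a-z and some punctuation, so without NC the rule cannot match and the request falls through to the errorhandling fallback. Side by side:

```apache
# Old rule: the "L" in "10-Lab-Notes" is not in [0-9a-z...], so no match.
RewriteRule ^(categories/([0-9;]+)-[0-9a-z\.\_!;,\+\-]+) index.php?/$1 [L,QSA]

# New rule: NC makes the pattern case-insensitive, so "10-Lab-Notes" matches.
RewriteRule ^(categories/([0-9;]+)-[0-9a-z\.\_!;,\+\-]+) index.php?/$1 [NC,L,QSA]
```

That fits the symptom reported above: the page still displayed, but only after Apache had logged "File does not exist".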