Karma causing 404's with search engines?

Found a bug? Tell us!!
Michael Harrison
Regular
Posts: 51
Joined: Sat Jan 28, 2006 12:50 pm

Post by Michael Harrison »

I apologize if this is a dupe, but I couldn't find an instance of this problem in the forum, although I can't quite believe I'm the only one who has seen it.

Google and Yahoo have reported a number of odd 404 errors in connection with my s9y 1.0 install. As near as I can tell, they are caused by the Karma voting links. I don't have concrete proof of this, but I'm wondering whether anyone else has seen those engines report that a link returns a 404 when you can in fact browse to the link in question.

How do I go about debugging these problems? The error log never has enough information for me to tell where the errors are really occurring.
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: Karma causing 404's with search engines?

Post by garvinhicking »

Hi!

Usually 404 errors can be found in the ACCESS log of your webserver, not in the error log.

There you should be able to see which pages the bots do not find.
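
If you can get hold of the access log, pulling out the 404s that bots ran into can be sketched roughly like this. It assumes the log is in Apache's combined log format (as in most default setups) and that the crawlers identify themselves with "Googlebot" or "Slurp" in the user agent; the function name and the marker list are made up for this illustration.

```python
import re
from collections import Counter

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def find_bot_404s(lines, bot_markers=("Googlebot", "Slurp")):
    """Count 404 responses per requested path for known crawler user agents."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a combined-format line, skip it
        if match["status"] != "404":
            continue  # only interested in "not found" responses
        if any(marker in match["agent"] for marker in bot_markers):
            counts[match["path"]] += 1
    return counts
```

You would feed it the lines of the log file, for example `find_bot_404s(open("access.log"))`, where `access.log` stands for wherever your host keeps the file, and then look at `most_common()` on the result.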

Also, what's your URL? Then I could check whether your install is in order.

HTH,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
Michael Harrison
Regular
Posts: 51
Joined: Sat Jan 28, 2006 12:50 pm

Post by Michael Harrison »

My blog is at http://www.dragonseye.com/blog

Unfortunately, I don't have direct access to my error logs, so I'll have to see if I can get them from my host.
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL

Post by judebert »

Looks up and functional. Your error logs won't have the 404s, only the access log. Webstats, AWStats, and other tracking programs use the access log to show you who visited the site.

What kind of redirection are you using in the Serendipity config? Is it the mod_rewrite or the mod_error? The mod_error redirection checks for file existence first, so it might signal 404 errors.
Judebert
Michael Harrison
Regular
Posts: 51
Joined: Sat Jan 28, 2006 12:50 pm

Post by Michael Harrison »

judebert wrote: Looks up and functional. Your error logs won't have the 404s, only the access log. Webstats, AWStats, and other tracking programs use the access log to show you who visited the site.
Sorry, mis-read your post above (and I was sleepy) :-)

I've found one example of an odd error, perhaps you can advise...

66.249.65.134 - - [31/Aug/2006:07:50:35 -0700] "GET /forum/viewtopic.php?t=4069&view=next&sid=c7a9d6760d2ef4cd32133ffaed007eef HTTP/1.1" 200 8164 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.134 - - [31/Aug/2006:07:51:18 -0700] "GET /pcg/PCGGI/index.html HTTP/1.1" 302 238 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.134 - - [31/Aug/2006:07:52:47 -0700] "GET /blog/categories/8-Holograms/pages/gallery/v/family/MichaelH/Holograms/pages/contactform/gallery/v/family/MichaelH/Holograms/pages/gallery/v/family/MichaelH/Holograms/SandDollarLeft_amp_Right/P1.html HTTP/1.1" 200 12450 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.134 - - [31/Aug/2006:07:52:48 -0700] "GET /blog/categories/8-Holograms/pages/gallery/v/family/MichaelH/Holograms/SandDollarLeft_amp_Right/pages/contactform/gallery/v/family/MichaelH/Holograms/SandDollarLeft_amp_Right/pages/gallery/v/family/MichaelH/Holograms/SandDollarLeft_amp_Right.jpg.html HTTP/1.1" 200 12465 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The bot goes from a mundane index page to a URL that doesn't exist anywhere on my site. I'm not sure how that concatenation of s9y and Gallery URLs came about.
judebert wrote: What kind of redirection are you using in the Serendipity config? Is it the mod_rewrite or the mod_error? The mod_error redirection checks for file existence first, so it might signal 404 errors.
I'm using mod_rewrite.
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Post by garvinhicking »

Hi!

This concatenation can happen if you use relative URLs somewhere to link to your gallery/static pages.

For example, if a link is

pages/gallery/v/family/MichaelH/Holograms.html

instead of

/pages/gallery/v/family/MichaelH/Holograms.html


You might end up with unlimited recursion: the relative paths keep adding up, because s9y matches a page just by its ID and not by the full URL string, so each of these ever-longer URLs still returns a page. :)
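
To see how the paths pile up, here is a small sketch using Python's standard `urljoin`, which resolves links the way a browser or crawler does. The host `example.com`, the shortened paths, and the helper name are only placeholders; the sketch assumes every resulting URL serves a page that contains the same link again.

```python
from urllib.parse import urljoin

def follow_repeatedly(start_url, href, hops):
    """Resolve the same href against each resulting page URL, as a crawler would."""
    urls = []
    current = start_url
    for _ in range(hops):
        current = urljoin(current, href)  # standard relative-reference resolution
        urls.append(current)
    return urls

# Hypothetical starting page; only the path layout mirrors the log above.
start = "http://example.com/blog/categories/8-Holograms/"

# Relative link: resolved against the directory of the current page each time.
relative = follow_repeatedly(start, "pages/gallery/Holograms.html", 3)

# Root-relative link: always resolves to the same URL.
absolute = follow_repeatedly(start, "/pages/gallery/Holograms.html", 3)

for url in relative:
    print(url)
```

With the relative link, every hop inserts another `pages/gallery/` into the path, while the link with the leading slash stays at `http://example.com/pages/gallery/Holograms.html` no matter how often it is followed.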

Thus you'll need to check your blog thoroughly for any relative links that might cause this.
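
One way to do that check, as a rough sketch: save the HTML of a blog page and list every `href`/`src` that is neither a full URL nor starts with a slash. This uses only Python's standard library; the class and function names are invented for this example, and it will also flag relative links that are harmless, so treat the output as a list of candidates to look at.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class RelativeLinkFinder(HTMLParser):
    """Collect href/src values that are relative to the current page's path."""

    def __init__(self):
        super().__init__()
        self.relative_links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            parts = urlsplit(value)
            # Skip full URLs (http://..., mailto:...), protocol-relative
            # links (//host/...), root-relative paths (/pages/...), and
            # pure fragments or query strings (#top, ?page=2).
            if parts.scheme or parts.netloc or value.startswith("/") or not parts.path:
                continue
            self.relative_links.append((tag, value))

def find_relative_links(html):
    """Return (tag, link) pairs for every path-relative link in the HTML text."""
    finder = RelativeLinkFinder()
    finder.feed(html)
    return finder.relative_links
```

Running `find_relative_links` over the saved source of the front page, a category page, and a static page should be enough to spot the sidebar or template entries that need a leading slash or a full `http://` URL.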

HTH,
Garvin
Michael Harrison
Regular
Posts: 51
Joined: Sat Jan 28, 2006 12:50 pm

Post by Michael Harrison »

Thanks. I've fixed a number of references that didn't start with http:// and should have. I'll see how things go from here.

My thanks to both of you for your help.