Excluding bots from statistics?

ghoti
Regular
Posts: 19
Joined: Tue Dec 06, 2005 2:34 am
Location: Charlotte, NC

Post by ghoti »

Hi,

now that my statistics plugin works, I'd like to get rid of all the bots. I put a few of the bots' "hostnames" in the exclusion list, but that doesn't seem to have any effect. I then cut the list back so that it contains only "msnbot.msn.com" (which was the default after installing the plugin), and since then it doesn't exclude even that one.

What am I doing wrong? Is that a regular expression? If so, should I escape the period? This is a bit hard to test since I don't know when the next bot will come along ...

Thanks,

Robert
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: Excluding bots from statistics?

Post by garvinhicking »

Actually, the "hostname" of the bot is not the hostname, but the string of HTTP_USER_AGENT. It is applied as a full string match, not a regular expression.

See line 103 of the serendipity_event_statistics.php plugin, if you care.

Remember that setting this option only affects the tracking of new bots. Bots that have already been tracked will not be removed from the tracking list.
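
To illustrate the difference between the two matching modes, here is a minimal sketch in Python. It is not the plugin's actual PHP code; the helper names and the sample user agent string are made up for the example. Only the idea of comparing the complete HTTP_USER_AGENT string is taken from the explanation above.

```python
import re


def is_excluded_exact(user_agent, exclusions):
    """Exact, full-string comparison: no character has a special meaning."""
    return user_agent in exclusions


def is_excluded_regex(user_agent, pattern):
    """Regular-expression search, shown only for contrast."""
    return re.search(pattern, user_agent) is not None


# Hypothetical user agent string, for illustration only.
agent = "ExampleBot/1.0 (+http://bot.example.com/info.html)"

# The complete string matches under exact comparison.
exact_hit = is_excluded_exact(agent, [agent])

# A hostname-like fragment does NOT match under exact comparison.
partial_hit = is_excluded_exact(agent, ["bot.example.com"])

# The same fragment would match if the entry were treated as a regex.
regex_hit = is_excluded_regex(agent, r"bot\.example\.com")
```

With a full string match, an entry such as a bare hostname never matches a longer user agent, and nothing in the entry needs to be escaped.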

HTH,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
ghoti
Regular
Posts: 19
Joined: Tue Dec 06, 2005 2:34 am
Location: Charlotte, NC

Post by ghoti »

Ah, thanks! I guess the default value threw me off. Looks good so far ... ;)
Josh
Regular
Posts: 110
Joined: Mon Jul 18, 2005 3:02 pm
Location: Berlin

Post by Josh »

Has anyone made a list of the main bots that I could insert there?
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL

Post by judebert »

Michael Harrison
Regular
Posts: 51
Joined: Sat Jan 28, 2006 12:50 pm

Post by Michael Harrison »

Has anyone been able to get this feature to work? No matter what string I use, the bots are still being recorded and displayed in the stats page.
I'm using s9y 0.91 with stats plugin 1.23.
I'm not php-savvy so haven't accomplished anything by trolling the code.

I've tried the wikipedia strings:

Baiduspider ( http://www.baidu.com/search/spider.htm)|Googlebot/2.1 (+http://www.google.com/bot.html)|Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)|Googlebot-Image/1.0| msnbot/1.0 (+http://search.msn.com/msnbot.htm)|Mozilla/5.0 (compatible; Yahoo! Slurp;http://help.yahoo.com/help/us/ysearch/slurp)|Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)

as well as the strings the stats plugin spits out:

msnbot/1.0 (+http://search.msn.com/msnbot.htm)|Baiduspider+(+http://www.baidu.com/search/spider.htm)|Mozilla/5.0 (compatible;Googlebot/2.1;+http://www.google.com/bot.html)|Technoratibot/0.7| Googlebot/2.1 (+http://www.google.com/bot.html)|Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)|Mozilla/2.0 (compatible; Ask Jeeves/Teoma)

No good either way.
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Post by garvinhicking »

The strings you put in the referrer blocks are interpreted as regular expressions. You need to escape every character in your input string that has a special meaning in regular expressions. These include, but are not limited to:

( ) . +

So your string should look something like:

Code: Select all

Baiduspider \(http://www.baidu.com/search/spider.htm\)|Googlebot/2.1 \(http://www.google.com/bot.html\)|Mozilla/5.0 \(compatible; googlebot/2.1;|Googlebot-Image/1.0|msnbot/1.0 \(http://search.msn.com/msnbot.htm\)
Read more about regular expressions by searching Wikipedia, php.net, or Google :)

Regards,
Garvin
Michael Harrison
Regular
Posts: 51
Joined: Sat Jan 28, 2006 12:50 pm

Post by Michael Harrison »

Wow, that was fast.

Thanks for the info. That certainly isn't clear through the admin interface.

btw, regex I get (I am a long-time programmer), I just don't do PHP although I'm thinking more and more about changing that.
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Post by garvinhicking »

Uh, hang on. I totally confused this with something different. What I mentioned earlier on is what really applies:
Actually, the "hostname" of the bot is not the hostname, but the string of HTTP_USER_AGENT. It is applied as a full string match, not a regular expression.

See line 103 of the serendipity_event_statistics.php plugin, if you care.
That means you must specify exactly the string that the bot submits as its HTTP user agent. You might want to grep your access log files to check whether the bots really submit their user agents exactly as you entered them.
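
If grep is not at hand, the same check can be sketched in a few lines of Python. This assumes the web server writes the common "combined" log format, where the user agent is the last double-quoted field on each line. The function name and the sample log lines are invented for the example.

```python
import re
from collections import Counter

# Assumption: combined log format, user agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count how often each user agent string appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up log lines, for illustration only.
sample_log = [
    '192.0.2.1 - - [10/Feb/2006:13:55:36 +0100] "GET /index.php HTTP/1.1" '
    '200 2326 "-" "ExampleBot/1.0 (+http://bot.example.com/info.html)"',
    '192.0.2.7 - - [10/Feb/2006:13:56:02 +0100] "GET /archives/1.html HTTP/1.1" '
    '200 5120 "http://www.example.org/" "Mozilla/5.0 (X11; U; Linux i686)"',
    '192.0.2.1 - - [10/Feb/2006:13:57:11 +0100] "GET /feeds/index.rss2 HTTP/1.1" '
    '200 9000 "-" "ExampleBot/1.0 (+http://bot.example.com/info.html)"',
]

agent_counts = count_user_agents(sample_log)
```

For a real log, pass an open file instead of the sample list. The strings this prints are the ones to copy, character for character, into the exclusion setting.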

I'm really sorry for mixing that up. :(

Regards,
Garvin
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL

Post by judebert »

If you use Firefox, you can test this by getting the UserAgent extension and setting your user agent to one of the bot agents. Then you can visit your site and see if it's excluding you from the statistics.
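
The same test can be done without a browser. Here is a rough sketch using Python's standard urllib.request module. The blog URL and the user agent string are placeholders, and the function name is made up; substitute your own site and the exact string you entered in the plugin.

```python
import urllib.request


def build_bot_request(url, user_agent):
    """Prepare (but do not send) a GET request that carries the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder values, for illustration only.
request = build_bot_request(
    "http://blog.example.com/",
    "ExampleBot/1.0 (+http://bot.example.com/info.html)",
)

# Uncomment to actually send the request to your own site:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

After sending the request, check the statistics page to see whether the visit was counted.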
Judebert
---
Website | Wishlist | PayPal
SHRIKEE
Regular
Posts: 128
Joined: Tue Feb 21, 2006 2:49 am
Location: Netherlands

Post by SHRIKEE »

Maybe a silly idea, but would it be possible to use preg_match() to filter out bots? Bots seem to change hostnames every week nowadays, so adding each new one is an endless task. Why not filter on search.google.com and not care about the rest? Possibly some normal users wouldn't get through because they have "search" or "bot" in their referrer; bad luck for them...

Is that an option? I'm working on the statistics plugin anyway. If possible, I could include it right away.
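
Roughly what I have in mind, sketched in Python rather than PHP (in the plugin this would be a preg_match() call). The keyword list is only a guess and is not taken from the plugin; the function name is made up too.

```python
import re

# Hypothetical keyword list; tune it against your own logs.
BOT_PATTERN = re.compile(r"bot|spider|slurp|crawl", re.IGNORECASE)


def looks_like_bot(user_agent):
    """Return True if the user agent contains any of the bot keywords."""
    return BOT_PATTERN.search(user_agent) is not None


# User agent strings quoted earlier in this thread, plus one ordinary browser.
results = {
    agent: looks_like_bot(agent)
    for agent in [
        "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
        "Baiduspider+(+http://www.baidu.com/search/spider.htm)",
        "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1",
    ]
}
```

The trade-off is visible in the results: msnbot and Baiduspider are caught without listing their full strings, but Ask Jeeves/Teoma slips through because it contains none of the keywords.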
My kingdom For i am king of my heap of trash

Developing code on:
Workstation: Windows 2000 sp4, TSW webcoder 2005
Server: fedora core 4 amd64, apache 2.0.54, php 5.0.4, mysql 4.1.11.
yati
Posts: 1
Joined: Sat Nov 18, 2006 4:52 pm

Post by yati »

Hi,

OK... I don't know much about coding or PHP. I was just wondering if anyone could help me with this issue. I've tried putting the bot's address (as it appears on the stats page) into the setting in the statistics plugin, but this is not working.

Could someone please explain to me in plain English, without too much technical stuff, how I could stop the bots from being counted in my stats?

Thanks

Should have been born a blonde,
yati


my blog
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL

Post by judebert »

Go to the plugin configuration; there you'll find an entry area for referrer blocking.

In that area, enter the exact user agent string of the bot you want to exclude. (Bot user agent strings can be found on Wikipedia.)

Hit save, and you're done.