Excluding bots from statistics?
Hi,
now that my statistics plugin works, I'd like to get rid of all the bots. I put a few of the bots' "hostnames" in the exclusion list, but that doesn't seem to have any effect. I then cut the list back so that it only contains "msnbot.msn.com" (which was the default after installing the plugin), and since then it doesn't even exclude that one.
What am I doing wrong? Is that a regular expression? If so, should I escape the period? This is a bit hard to test since I don't know when the next bot will come along ...
Thanks,
Robert
-
- Core Developer
- Posts: 30022
- Joined: Tue Sep 16, 2003 9:45 pm
- Location: Cologne, Germany
- Contact:
Re: Excluding bots from statistics?
Actually, the "hostname" of the bot is not the hostname, but the string of HTTP_USER_AGENT. It is applied as a full string match, not a regular expression.
See line 103 of the serendipity_event_statistics.php plugin, if you care.
Remember that setting this option only affects the tracking of new bots. Old bots that already have been tracked will not be removed from the tracking list.
HTH,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
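To illustrate what a full string match means in practice, here is a minimal sketch in Python (not the plugin's actual PHP code). The function name `is_excluded` and the list contents are made up for the example. The point is only that an entry has to equal the complete User-Agent string, so a hostname-like entry such as "msnbot.msn.com" does not match the agent string that msnbot sends.

```python
def is_excluded(user_agent: str, excluded_agents: set) -> bool:
    """Return True only if the whole User-Agent string is in the list."""
    return user_agent in excluded_agents


# Agent string as reported later in this thread by the stats plugin.
MSNBOT_AGENT = "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

# Hypothetical exclusion lists, for illustration only.
hostname_style = {"msnbot.msn.com"}
agent_style = {MSNBOT_AGENT}

print(is_excluded(MSNBOT_AGENT, hostname_style))  # False: hostname is not the agent
print(is_excluded(MSNBOT_AGENT, agent_style))     # True: exact match
print(is_excluded("msnbot/1.0", agent_style))     # False: partial strings do not match
```

With an exact comparison like this, even a missing space or a changed version number in the agent string is enough to make the entry miss.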
Has anyone made a list of the main bots that I could insert there?
My blog: http://AtlanticReview.org
Wikipedia replies: http://en.wikipedia.org/wiki/User_agent#Bots
-
- Regular
- Posts: 51
- Joined: Sat Jan 28, 2006 12:50 pm
Has anyone been able to get this feature to work? No matter what string I use, the bots are still being recorded and displayed in the stats page.
I'm using s9y 0.91 with stats plugin 1.23
I'm not PHP-savvy, so I haven't accomplished anything by trawling through the code.
I've tried the wikipedia strings:
Baiduspider ( http://www.baidu.com/search/spider.htm)|Googlebot/2.1 (+http://www.google.com/bot.html)|Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)|Googlebot-Image/1.0| msnbot/1.0 (+http://search.msn.com/msnbot.htm)|Mozilla/5.0 (compatible; Yahoo! Slurp;http://help.yahoo.com/help/us/ysearch/slurp)|Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)
as well as the strings the stats plugin spits out:
msnbot/1.0 (+http://search.msn.com/msnbot.htm)|Baiduspider+(+http://www.baidu.com/search/spider.htm)|Mozilla/5.0 (compatible;Googlebot/2.1;+http://www.google.com/bot.html)|Technoratibot/0.7| Googlebot/2.1 (+http://www.google.com/bot.html)|Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)|Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
No good either way.
-
- Core Developer
- Posts: 30022
- Joined: Tue Sep 16, 2003 9:45 pm
- Location: Cologne, Germany
- Contact:
The strings you put in the referrer blocks are interpreted as regular expressions. You need to escape every character in your input string that has a special meaning in regular expressions. Those include, but are not limited to:
( ) . +
So your string should look something like:
Baiduspider \(http://www.baidu.com/search/spider.htm\)|Googlebot/2.1 \(http://www.google.com/bot.html\)|Mozilla/5.0 \(compatible; googlebot/2.1;|Googlebot-Image/1.0|msnbot/1.0 \(http://search.msn.com/msnbot.htm\)
Read more about regular expressions by searching Wikipedia, php.net or Google.
Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
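As a side note on escaping in general, here is a small Python sketch of what goes wrong when an agent string is used as a regular expression without escaping. It uses Python's `re` module purely for illustration; PHP offers `preg_quote()` for the same job. The helper name `is_valid_regex` is made up for the example.

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """Return True if the string compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


agent = "Googlebot/2.1 (+http://www.google.com/bot.html)"

# Unescaped, "(" opens a group and the "+" right after it has nothing
# to repeat, so the raw agent string is not even a valid pattern.
print(is_valid_regex(agent))

# re.escape() backslash-escapes the special characters, so the pattern
# matches the literal agent string again.
print(re.fullmatch(re.escape(agent), agent) is not None)

# An unescaped "." matches any character, which makes patterns too loose.
print(re.fullmatch("msnbot.msn.com", "msnbotXmsn.com") is not None)
print(re.fullmatch(re.escape("msnbot.msn.com"), "msnbotXmsn.com") is not None)
```

Escaping programmatically avoids having to remember every special character by hand.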
-
- Regular
- Posts: 51
- Joined: Sat Jan 28, 2006 12:50 pm
-
- Core Developer
- Posts: 30022
- Joined: Tue Sep 16, 2003 9:45 pm
- Location: Cologne, Germany
- Contact:
Uh, hang on. I totally confused this with something different. What I mentioned earlier on is what really applies:
"Actually, the "hostname" of the bot is not the hostname, but the string of HTTP_USER_AGENT. It is applied as a full string match, not a regular expression.
See line 103 of the serendipity_event_statistics.php plugin, if you care."
That means you must specify exactly the string that the bot submits as its HTTP user agent. You might want to grep your access log files to check whether the bots really submit their user agents exactly as you entered them.
I'm really sorry for mixing that up.
Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
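For anyone who would rather not grep by hand, here is a rough Python sketch that lists the exact User-Agent strings found in an access log. It assumes the common "combined" log format, where the User-Agent is the last quoted field on each line; the sample lines and the function name `count_user_agents` are invented for the example, and your server's log format may differ.

```python
import re
from collections import Counter

# Assumption: combined log format, User-Agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count how often each exact User-Agent string appears."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines; in practice, iterate over open("access.log").
SAMPLE_LINES = [
    '192.0.2.1 - - [28/Jan/2006:12:50:00 +0100] "GET /index.php HTTP/1.1" '
    '200 5120 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.1 - - [28/Jan/2006:12:51:10 +0100] "GET /feeds/ HTTP/1.1" '
    '200 812 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.7 - - [28/Jan/2006:12:52:30 +0100] "GET / HTTP/1.1" '
    '200 5120 "-" "Technoratibot/0.7"',
]

for agent, hits in count_user_agents(SAMPLE_LINES).most_common():
    print(hits, agent)
```

The strings printed this way can be copied character for character into the exclusion list, which is what a full string match requires.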
Maybe a silly idea, but would it be possible to use preg_match() to filter out bots? As I see it, bots change hostnames every week nowadays, so adding each new one is an endless chore. Why not just filter on search.google.com? Who cares about the rest. Possibly 'some' normal users wouldn't get through because they have "search" or "bot" in their referrer. Bad luck for them...
Would that be an option? I'm working on the statistics plugin anyway. If possible, I could include it right away.
My kingdom For i am king of my heap of trash
Developing code on:
Workstation: Windows 2000 sp4, TSW webcoder 2005
Server: fedora core 4 amd64, apache 2.0.54, php 5.0.4, mysql 4.1.11.
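A pattern-based filter along these lines could look roughly like the following Python sketch (in PHP, `preg_match()` would play the role of `re.search`). The keyword list and the function name `looks_like_bot` are assumptions for illustration, and the check is applied to the User-Agent string rather than the referrer; this is not what the statistics plugin does today.

```python
import re

# Assumed keyword list; extend or trim it to taste.
BOT_PATTERN = re.compile(r"bot|spider|crawl|slurp|teoma", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the User-Agent contains a typical crawler keyword."""
    return BOT_PATTERN.search(user_agent) is not None


agents = [
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp;http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8) Gecko/20051111 Firefox/1.5",
]

for agent in agents:
    print(looks_like_bot(agent), agent)
```

The trade-off is the one already mentioned: a substring match survives version bumps in the agent string, but any real visitor whose agent happens to contain one of the keywords is dropped from the statistics too.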
Hi,
OK... I don't know much about coding or PHP or anything. I was just wondering if anyone could help me with this issue. I've tried putting the bot's address (which appears on the stats page) into the exclusion setting of the stats plugin, but this is not working.
Could someone please explain to me in plain English, without too much technical stuff, how I could stop the bots from being counted in my stats?
Thanks
Should have been born a blonde,
yati
my blog