Contact form + Spamblock bayes

onli
Regular
Posts: 2260
Joined: Tue Sep 09, 2008 10:04 pm

Re: Contact form + Spamblock bayes

Post by onli » Thu Feb 11, 2016 6:38 pm

It's a bit more complicated.

Bayes is not setting the wrong message as far as its internal logic goes. It blocks the comment, and normally (in every other s9y spamblock plugin) that would be the end of it: the comment would be gone and could maybe be restored from the log.

But when Bayes blocks a comment whose spam rating is below a configurable threshold, it copies the comment into the recycler to make restoring it easier.

I don't think setting it to moderated is correct. It would not solve the problem, as comments would be forgotten in the recycler.

What could be done is to change the message. It actually uses a custom language constant; I just checked. It could say something like: "The Bayes Spamblock-Plugin thinks this is spam, but is not 100% sure. If you think this is a valid comment, please contact the blog author." That message would be shown whenever a comment is blocked but saved into the recycler.

yellowled
Regular
Posts: 7083
Joined: Fri Jan 13, 2006 12:46 pm
Location: Eutin, Germany

Re: Contact form + Spamblock bayes

Post by yellowled » Thu Feb 11, 2016 7:25 pm

onli wrote:The comment would be gone and could maybe restored from the log.
I would not consider that “restoring”, but that is kind of splitting hairs. :)
onli wrote:I don't think that setting it to moderated is correct, it would not solve the problem, as comments would be forgotten in the recycler.
I don't think I said anything about setting it to moderated (internally or whatever). As far as I am concerned, the process of managing these comments internally, in the backend, is just fine. But I think the message is too concise.
onli wrote:"The Bayes Spamblock-Plugin thinks this is spam, but is not 100% sure. If you think this is a valid comment, please contact the blog author.
I feel like contacting the author is something a lot of people probably would not do, because it requires extra action on their part, and most people don't value blog comments highly enough to do that. So I'm not sure that part of the message actually makes sense. I think the biggest problem is that the old message is too blunt. There's a difference between “rejected as spam” and “thinks this may be spam”.

YL
amazon Wishlist - Serendipity-Podcast (German only, sorry)


Re: Contact form + Spamblock bayes

Post by onli » Thu Feb 11, 2016 7:37 pm

I'm open to suggestions :) Actually, if you have an idea, please change it directly in the code. It is this constant in the language file (its German text translates to "Rejected as spam."):
@define('PLUGIN_EVENT_SPAMBLOCK_BAYES_ERROR', 'Abgewiesen als Spam.');


Re: Contact form + Spamblock bayes

Post by yellowled » Thu Feb 11, 2016 9:07 pm

onli wrote:Actually, if you have an idea please change it directly in the code, it is this variable in the language file
I did, but I did not increment the version number to push the update. I'd like to leave that up to you because, frankly, I'm not sure my suggestion is really better, at least not in every case.

I'd suggest revising the plugin to emit different error messages depending on whether people actually use the Bayes trashcan or not.
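A minimal Python sketch of that suggestion, assuming the plugin knows (a) whether the Bayes recycler is in use and (b) the rating threshold below which a blocked comment is copied there, as onli described above. The names `BayesSettings`, `recycler_enabled`, `recycler_threshold` and `rejection_message`, the 0 to 1 rating scale and the softer wording are all hypothetical; the real plugin is PHP and takes its text from `PLUGIN_EVENT_SPAMBLOCK_BAYES_ERROR`.

```python
from dataclasses import dataclass

# Hypothetical message texts (the real plugin uses language constants).
HARD_REJECT = "Rejected as spam."
SOFT_REJECT = ("The Bayes spamblock plugin thinks this may be spam, "
               "but is not sure. The comment has been kept for review.")

@dataclass
class BayesSettings:
    recycler_enabled: bool      # hypothetical: is the Bayes trashcan in use?
    recycler_threshold: float   # hypothetical: blocked comments rated below this are kept

def rejection_message(rating: float, settings: BayesSettings) -> str:
    """Pick the error message for a comment that Bayes has already decided to block."""
    if settings.recycler_enabled and rating < settings.recycler_threshold:
        # Blocked, but a copy lands in the recycler: use the softer wording.
        return SOFT_REJECT
    # Blocked and discarded, or no recycler in use: keep the blunt message.
    return HARD_REJECT
```

Keeping the decision in one small function means the blocking logic itself stays untouched; only the text shown to the commenter changes.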

YL

Re: Contact form + Spamblock bayes

Post by yellowled » Sat Feb 13, 2016 4:35 pm

yellowled wrote:I will report back probably after about a week next, just wanted to give you a first impression. Again, don't overrate this, it's only been a day.
It's been about a week, so it's time to check those numbers again. Assuming that one line in a logfile corresponds to one spam comment, I got 367 spam comments in that week altogether. That's 52 per day, or about 2 per hour.
  • spam-bee.log: 183K file size, 272 lines (74% overall); 206 Honeypot (56% overall), 66 Hidden Captcha (18% overall)
  • spam-std.log: 23K file size, 93 lines (25% overall); 46 IP Validation, 46 Blog URL not found (12.5% overall each), 1 Moderation after X days (that was a trackback from another article in my blog)
  • spam-bay.log: 1.6K file size, 2 lines (0.54% overall, actually); these are the comment I already mentioned earlier and its second attempt
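The percentages and rates above can be recomputed from the line counts with a few lines of Python, under the same assumption that one log line equals one spam comment, plus the assumption that the period was exactly seven days:

```python
# Lines per logfile for the week, as listed above.
counts = {
    "spam-bee.log": 206 + 66,      # Honeypot + Hidden Captcha = 272
    "spam-std.log": 46 + 46 + 1,   # IP Validation + Blog URL not found + moderation = 93
    "spam-bay.log": 2,
}
total = sum(counts.values())  # 367 spam comments

def share(lines: int) -> float:
    """Share of all logged spam in percent, rounded to two decimals."""
    return round(100 * lines / total, 2)

per_day = total / 7        # assumes the period was exactly 7 days
per_hour = per_day / 24

for name, lines in counts.items():
    print(f"{name}: {lines} lines, {share(lines)}%")
print(f"{per_day:.1f} per day, {per_hour:.1f} per hour")
```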
So, conclusions after about a week: the Spam Bee plugin actually does still work in my blog, and it's pretty effective, too; also, no false positives here. The default spamblock plugin is still pretty effective as well; I would never drop it anyway, since the required fields alone are worth using it for.

I was even thinking about dropping Bayes altogether because I can see how it could be more of a hazard given my particular mix of topics and comments. After all, it produced the only two false positives. So I guess I'll keep Bayes, but delete the existing database and retrain it? If the numbers hold up, there shouldn't be much training left to do, and the current database is probably filled with tokens that lead it to classify things as spam that really aren't. What do you guys think?

Also, do all three spamblock plugins have the ability to split their logfiles monthly? That's what I would like to do, and I seem to recall reading somewhere that the logging ability was “inherited” from the default spamblock plugin or something?

YL

Don Chambers
Regular
Posts: 3638
Joined: Mon Feb 13, 2006 3:40 am
Location: Chicago, IL, USA

Re: Contact form + Spamblock bayes

Post by Don Chambers » Sat Feb 13, 2016 5:51 pm

I tried submitting your contact form just a moment ago. I supplied my name, email, and a message. I received the following messages:
Name, E-Mail und ihre Nachricht dürfen nicht leer gelassen werden. (Name, email, and your message must not be left empty.)

Antispam Maßnahme: Falsches Captcha. (Anti-spam measure: wrong captcha.)

Re: Contact form + Spamblock bayes

Post by yellowled » Sat Feb 13, 2016 7:54 pm

Don Chambers wrote:I tried submitting your contact form just a moment ago. I supplied my name, email, and a message. I received the following messages
Okay, so for whatever reason the form still thinks, or at least reports, that one of those fields was empty (I assume it is just emitting a bad error message). Also, you apparently did not manage to solve the hidden captcha, which is irritating because you're not supposed to solve it yourself; you probably don't even see it.

The error message in the log for this is


2016-02-13 16:47:21] - [REJECTED: BEE HiddenCaptcha [ 2 != � ]] - [#0, Name "don chambers", E-Mail "don.chambers@optional-necessity.com", URL "", User-Agent "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0", IP 50.129.218.59] - [Test message for your spam investigation]
So the plugin that catches you is the Spam Bee, and the issue is with the hidden captcha. To me, „�“ smells like an encoding issue, and I'm pretty sure I have seen this before, but I don't know how to fix it. I also don't know how to reproduce it: I can send the contact form in Chrome and Firefox just fine. This must be related to your browser and/or OS, maybe? I have no idea.
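For reference, "�" is U+FFFD, the replacement character a decoder substitutes for bytes that are not valid in the expected encoding. A small Python illustration (the helper `decode_as_utf8` exists only for this example): a Latin-1 encoded umlaut decoded as UTF-8 becomes U+FFFD, while a plain digit such as a captcha answer is the same byte in ASCII, Latin-1 and UTF-8 and comes through unchanged.

```python
def decode_as_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")

# "ä" in Latin-1 is the single byte 0xE4, an incomplete sequence in UTF-8.
mangled = decode_as_utf8("ä".encode("latin-1"))
assert mangled == "\ufffd"

# An ASCII digit is encoded identically in ASCII, Latin-1 and UTF-8.
assert "2".encode("ascii") == "2".encode("latin-1") == "2".encode("utf-8")
assert decode_as_utf8(b"2") == "2"
```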

What I can (re)produce with the contact form is this, though:


2016-02-13 18:45:31] - [moderate: Moderation nach X Tagen] - [#0, Name "Test user", E-Mail "mail@example.org", URL "", User-Agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:44.0) Gecko/20100101 Firefox/44.0", IP 91.8.201.237] - [Just testing]
I find that particularly interesting because it is triggered both for mails that actually reach me and for mails that don't. Neither of them appears as a moderated comment in the backend, though.

YL

Re: Contact form + Spamblock bayes

Post by Don Chambers » Sat Feb 13, 2016 8:49 pm

yellowled wrote:This must be related to your browser and/or OS maybe? I have no idea.
Just tried it with Opera and Chrome - same result. My OS is Windows 7.


Re: Contact form + Spamblock bayes

Post by yellowled » Sun Feb 14, 2016 6:57 pm

Don Chambers wrote:Just tried it with Opera and Chrome - same result. My OS is Windows 7.
But those did not fail because of the Spam Bee; they were classified as spam by Bayes. As I already said, with my current spam database English messages will most likely always be classified as spam, simply because most of the spam is in English and I don't get any valid comments in English. Interestingly, the Opera mail was rejected outright as spam, while the Chrome mail was not and made it into the trashcan.
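Why a filter trained on English spam and German legitimate comments ends up rating any English text as spam can be shown with a textbook multinomial naive Bayes classifier with Laplace smoothing. This is a generic Python sketch on a made-up toy corpus, not the scoring code of the s9y Bayes plugin, whose formula may differ:

```python
import math
from collections import Counter

def train(docs):
    """Count how often each token occurs across a list of tokenized documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts

def spam_probability(message, spam, ham):
    """P(spam | message) under naive Bayes with Laplace smoothing and equal priors."""
    vocab_size = len(set(spam) | set(ham))
    spam_total, ham_total = sum(spam.values()), sum(ham.values())
    log_spam = log_ham = 0.0
    for token in message:
        log_spam += math.log((spam[token] + 1) / (spam_total + vocab_size))
        log_ham += math.log((ham[token] + 1) / (ham_total + vocab_size))
    # Turn the log-likelihood difference back into a probability.
    return 1 / (1 + math.exp(log_ham - log_spam))

# Made-up toy corpus: all spam is English, all legitimate comments are German.
spam = train([["buy", "cheap", "pills", "the", "best", "offer", "for", "you"],
              ["the", "best", "casino", "for", "you"]])
ham = train([["danke", "für", "den", "artikel"],
             ["ich", "finde", "den", "beitrag", "gut"]])

english = ["thanks", "for", "the", "article"]
german = ["danke", "für", "den", "artikel"]
print(round(spam_probability(english, spam, ham), 2))  # 0.84
print(round(spam_probability(german, spam, ham), 2))   # 0.02
```

The only English words the filter knows ("for", "the") were seen exclusively in spam, so an innocent English message is pushed towards spam; retraining with some legitimate English comments would counter that.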

However, the only contact form submission that failed because of the weird UTF-8 question mark was sent from Firefox, and even that I cannot reproduce in my Firefox on OS X.

I'll see if I can test this on Windows myself tomorrow, but I'm already on Windows 10.

YL

Re: Contact form + Spamblock bayes

Post by yellowled » Mon Feb 15, 2016 3:27 pm

For the record:
yellowled wrote:So I guess I'll keep bayes, but delete the existing database and retrain it?
Already did that; it seemed like the sensible thing to do.
yellowled wrote:Also, do all three spamblock plugin have the ability to split their logfiles monthly? That's what I would like to do, and I seem to recall reading somewhere that the logging ability was “inherited” from the default spamblock plugin or something?
Still would like to know about that.

YL

Re: Contact form + Spamblock bayes

Post by yellowled » Mon Feb 15, 2016 6:14 pm

yellowled wrote:However, the only contact form that failed because of the weird utf-8 question mark was sent in Firefox, but even that I can not reproduce in my Firefox on OS X.

I'll see if I can test this on Windows myself tomorrow, but I'm already on Windows 10.
I can actually (kind of) reproduce it on Windows 10, even in Chrome. But it happens completely at random, so I cannot reproduce it reliably. At times I thought it might depend on whether a URL was provided, or on whether a URL was provided and one of the checkboxes was checked, but I don't think so. Frankly, I have no idea what this could be. I'm not even sure whether it is a bug in the plugin or in the browsers, but I suspect the plugin, especially since it's pretty much unmaintained.

I assume this is what onli was referring to, and I will deactivate the Hidden Captcha.

YL

Re: Contact form + Spamblock bayes

Post by yellowled » Mon Feb 15, 2016 6:17 pm

Interesting side aspect: because I tried this multiple times, I now have quite a few log entries for the Bee captcha. It is not always „[<NUMBER> != � ]“; I also have occurrences of „[ 1 != 2 ]“. My initial idea that this might be related to encoding is probably wrong.

Maybe the plugin's JS can't calculate? :shock:
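To see how often each pattern occurs, the Bee captcha rejections in a file log could be tallied with a short Python script. The regular expression assumes the entry format quoted earlier in this thread (`[REJECTED: BEE HiddenCaptcha [ 2 != � ]]`). Which side of `!=` is the expected answer is not stated, so the sketch only classifies the right-hand value; the function names, categories and the logfile path are hypothetical.

```python
import re
from collections import Counter

# Matches "[REJECTED: BEE HiddenCaptcha [ left != right ]]" inside a log line.
CAPTCHA = re.compile(r"\[REJECTED: BEE HiddenCaptcha \[\s*(.*?)\s*!=\s*(.*?)\s*\]\]")

def classify(line):
    """Return a category for a Hidden Captcha rejection, or None for other lines."""
    match = CAPTCHA.search(line)
    if match is None:
        return None
    right = match.group(2)
    if right == "\ufffd":
        return "replacement character"
    if right.isdigit():
        return "wrong number"
    return "other"

def tally(lines):
    """Count Hidden Captcha rejections per category."""
    return Counter(c for c in map(classify, lines) if c is not None)

# Usage, with a hypothetical path to the Spam Bee logfile:
# with open("spam-bee.log", encoding="utf-8", errors="replace") as log:
#     print(tally(log))
```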

YL

Re: Contact form + Spamblock bayes

Post by yellowled » Wed Feb 17, 2016 12:56 pm

yellowled wrote:Also, do all three spamblock plugin have the ability to split their logfiles monthly? That's what I would like to do, and I seem to recall reading somewhere that the logging ability was “inherited” from the default spamblock plugin or something?
For the record, I have tried a few things, and it seems that all three spamblock plugins are indeed able to split their logfiles daily, monthly or yearly (I haven't tried the latter myself because it seems pointless). However, the variables %Y, %m and %d seem to work only in the logfile's name, not in the path (so e.g. directories per year containing monthly logfiles are not possible).
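The observed behaviour (date placeholders expanded in the file name but not in the directory part) can be modelled in a few lines of Python. This assumes the placeholders behave like `strftime` codes; the helper `expand_logfile` and the example paths are hypothetical, not the plugins' code:

```python
import posixpath
from datetime import date

def expand_logfile(pattern: str, when: date) -> str:
    """Expand %Y, %m and %d in the file name only; the directory is used verbatim."""
    directory, filename = posixpath.split(pattern)
    return posixpath.join(directory, when.strftime(filename))

day = date(2016, 2, 17)
print(expand_logfile("logs/spam-bee-%Y-%m.log", day))   # logs/spam-bee-2016-02.log
print(expand_logfile("logs/%Y/spam-bee-%m.log", day))   # logs/%Y/spam-bee-02.log
```

Per-year directories would require expanding the directory part as well (and creating the directory), which the plugins apparently do not do.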

YL

Timbalu
Regular
Posts: 4598
Joined: Sun May 02, 2004 3:04 pm

Re: Contact form + Spamblock bayes

Post by Timbalu » Thu May 05, 2016 12:42 pm

yellowled wrote:Interesting side aspect: because I tried this multiple times, I have quite some log entries for the bee captcha now. It is not always „[<NUMBER> != � ]“, I also have occurrences of „[ 1 != 2 ]“. My initial idea that this might be related to encoding is probably false.

Maybe the plugin's JS can't calculate? :shock:
@YL, before I forget about it: could you replace the additional_plugins/serendipity_event_spamblock_bee/json/JSON.php file with this file and check this again, please?
This might help with the strange en-/decoding issue you mentioned (personally I don't think so, since it is only a fallback file).

[The extension php has been deactivated and can no longer be displayed.]

PS: ...and you did have Method set to JSON and Regular Expressions set to NO, didn't you?

And by the way, I checked all my logs and never found an entry rejected with �. Maybe this happened while you were logging to a file?
I cannot reproduce this behaviour at all... perhaps a Windows 10 "beta"-only issue?
Regards,
Ian

Serendipity Styx Edition and additional_plugins @ https://ophian.github.io/ @ https://github.com/ophian


Re: Contact form + Spamblock bayes

Post by yellowled » Fri May 06, 2016 12:48 pm

Timbalu wrote:Before I forget about it... Could you replace the additional_plugins/serendipity_event_spamblock_bee/json/JSON.php file with this file and check this again, please?
I do not have the time (or the nerve) at the moment to experiment with a production system that currently works fine for me. Sorry.
Timbalu wrote:PS. ...and you did have it set to JSON as Method and Regular Expressions to NO, did you?
“Standard” as method, regexp to “No”.
Timbalu wrote:And btw I checked all my logs and never found an entry rejected with �. Maybe this was while you logged to a file?
Of course, I never log to the database because it is way easier for me to delete a few log files.
Timbalu wrote:I cannot reproduce this behaviour at all ... perhaps a win10 "beta" only issue?
I have never used a Windows 10 beta, and I never do any blog-related stuff on a Windows machine. All my blogs, even the dev blogs, are on Uberspaces that run on CentOS 5.

YL