spamassassin-users mailing list archives

From Amir Caspi <ceph...@3phase.com>
Subject Re: Bayes underperforming, HTML entities?
Date Thu, 08 Nov 2018 20:14:13 GMT
On Nov 8, 2018, at 12:20 PM, RW <rwmaillists@googlemail.com> wrote:
> 
> these emails don't contain a valid HTML mime section. They contain a bogus html section that doesn't
> start with the separator defined in the top-level Content-Type header.

Sorry, that is totally my fault.  In the spample, I was trying to sanitize any possible identifying
information and I ended up over-sanitizing.  I sanitized the separator string for the text/plain part
and the closing one at the end, but I missed the one for the text/html part.

So, bottom line -- the HTML mime section is actually valid in the original email.  The spample
is invalid because of my overzealousness/paranoia/idiocy.
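For anyone else sanitizing samples before posting: a quick way to confirm that editing the boundaries has not broken the MIME structure is to re-parse the sanitized copy and check that the HTML part is still found. The sketch below uses Python's standard `email` package rather than SpamAssassin's own parser, so edge-case behavior may differ; the sample message and the `html_part_found` helper are hypothetical.

```python
from email import message_from_string, policy

def html_part_found(raw: str) -> bool:
    """Return True if the parser finds a text/html part in the raw message."""
    msg = message_from_string(raw, policy=policy.default)
    return any(part.get_content_type() == "text/html" for part in msg.walk())

# Hypothetical sample: the header declares boundary "XYZ", and the delimiter
# before the HTML part is filled in separately so it can be "over-sanitized".
TEMPLATE = """\
From: sender@example.com
To: rcpt@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=us-ascii

plain body
--{html_boundary}
Content-Type: text/html; charset=us-ascii

<p>html body</p>
--XYZ--
"""

good = TEMPLATE.format(html_boundary="XYZ")       # delimiter matches the header
bad = TEMPLATE.format(html_boundary="REDACTED")   # delimiter no longer matches

print(html_part_found(good))  # True
print(html_part_found(bad))   # False: the HTML lines become part of the text/plain body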

If the HTML section is valid, as it appears to be, then SpamAssassin should be decoding the HTML.  And
yet these emails are hitting BAYES_00 or BAYES_05 despite the spammy HTML text.  So does
this mean my Bayes DB is borked?  Or does it mean something else?

Looking through my recent spams, almost all of them are hitting BAYES_50 or lower, and
almost none are hitting BAYES_99 (this includes the ones identified as spam for other scoring
reasons).  This is despite the training.  So I'm thinking maybe my Bayes DB is not working
properly... unless the Bayes poison is somehow actually working, though I doubt the latter
since discussions on here have asserted many times that "poison" doesn't work.  But I don't
know why the DB would suddenly stop scoring properly after working fine for years...
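To put numbers on the "BAYES_50 or lower" observation instead of eyeballing it, a rough sketch like the one below could tally which BAYES_* rules fired across a folder of spam. It assumes SpamAssassin's default X-Spam-Status header is present on each message (with the `tests=... autolearn=... version=...` layout, possibly folded over several lines) and that messages are available as raw strings; `bayes_hits`, `tally_bayes`, and the sample headers are hypothetical.

```python
import re
from collections import Counter
from email import message_from_string

# Assumption: the default X-Spam-Status header lists rule hits as
# "tests=RULE1,RULE2,... autolearn=... version=...", possibly folded.
TESTS_RE = re.compile(r"tests=(.*?)(?:\s+autolearn=|\s+version=|$)", re.S)

def bayes_hits(raw):
    """Return the BAYES_* rule names listed in a message's X-Spam-Status."""
    status = message_from_string(raw).get("X-Spam-Status", "")
    match = TESTS_RE.search(str(status))
    if not match:
        return []
    # Drop folding whitespace, then split the comma-separated rule list.
    tests = re.sub(r"\s+", "", match.group(1)).split(",")
    return [t for t in tests if t.startswith("BAYES_")]

def tally_bayes(raw_messages):
    """Count BAYES_* hits across messages; messages with none count as 'none'."""
    counts = Counter()
    for raw in raw_messages:
        counts.update(bayes_hits(raw) or ["none"])
    return counts

# Hypothetical sample headers; real input would come from a spam folder.
spam_folder = [
    "X-Spam-Status: Yes, score=6.1 required=5.0 tests=BAYES_00,HTML_MESSAGE,\n"
    "\tURIBL_BLACK autolearn=no version=3.4.2\n\nbody\n",
    "X-Spam-Status: Yes, score=9.0 required=5.0 tests=BAYES_99,BAYES_999\n"
    "\tautolearn=no version=3.4.2\n\nbody\n",
]
print(tally_bayes(spam_folder))
```

Every BAYES_* name is counted rather than just the first, because BAYES_999 normally fires alongside BAYES_99 and both are worth seeing in the tally.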

Thanks.

--- Amir

