spamassassin-users mailing list archives

From "Kevin A. McGrail" <kmcgr...@apache.org>
Subject Re: Custom rule aware of occurrences
Date Mon, 16 Sep 2019 14:01:31 GMT

On 9/15/2019 10:53 PM, Bert Van de Poel wrote:
> Dear fellow Spamassassin users,
>
> I'm contacting you as a member of ULYSSIS. ULYSSIS is a student
> non-profit organisation at the University of Leuven trying to make
> computers and technology more approachable and available to students.
> As part of this objective, we run a hosting service within our
> university's network for student organisations, student unions and
> individuals at our university.
>
> We've battled with spam from time to time, since we seem to attract a
> lot of spam in exotic languages that is rather good at circumventing
> commonly used methods. This has led us to resort to some custom rulesets
> to fight mostly targeted French and SEO spam, often coming
> from very respectable servers and very normal addresses.
>
> Now because SEO spam specifically has been adapting quite well to any
> rule we think of (finding alternative ways of saying the same thing
> time and time again), I was hoping to write a rule that basically
> boiled down to "give some spam score to emails that contain the word
> SEO 3 or more times" to push those already being detected by other
> rules over the edge. To be clear, this will be a low-scoring rule; I'm
> aware that ham can perfectly well contain that word 3 times, just like
> this email, for example. While investigating, I started wondering
> how to handle the fact that some spam has only a plain-text body, while
> other spam also includes an HTML part, which means the count may suddenly
> double or halve. Beyond that, it seems quite hacky to use a regex that
> boils down to something like /\bSEO\b.*\bSEO\b.*\bSEO\b/i instead of
> something that is properly aware of the count of certain words.
>
> Since I sort of expected SpamAssassin to have a solution for both the
> plain-text vs. HTML and the counting problems, I asked around on IRC but
> was pointed here. So uhm, any suggestions or pointers are more than
> welcome. I'm not sure whether more information is required, but feel
> free to ask questions or correct my presumptions if necessary.
>
Bert, off the cuff, SA handles things like this pretty readily.  What we
normally ask for is a sample email, with all headers, that shows the
problem.  Put it up on pastebin.com, since it's likely to be blocked if
you email it.

You likely want a rule that looks for SEO, using the multiple and maxhits tflags.
You can look at http://www.mcgrail.com/downloads/KAM.cf for examples.
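A minimal sketch of what such a rule pair might look like, assuming a local .cf file and a SpamAssassin version that supports the maxhits tflag. The rule names (__LOCAL_SEO_WORD, LOCAL_SEO_3X) and the 0.5 score are hypothetical placeholders, not taken from KAM.cf. The sub-rule counts whole-word matches of "SEO" in the body, and the meta rule fires once that count reaches 3:

```
# Hypothetical sub-rule: match the whole word "SEO", case-insensitively.
# "multiple" makes SA count every match instead of stopping at the first;
# "maxhits=3" stops counting once 3 hits are found, since the meta rule
# below does not need more than that.
body     __LOCAL_SEO_WORD   /\bSEO\b/i
tflags   __LOCAL_SEO_WORD   multiple maxhits=3

# Hypothetical scored rule: fires when the sub-rule matched 3 or more times.
meta     LOCAL_SEO_3X       __LOCAL_SEO_WORD >= 3
describe LOCAL_SEO_3X       Body mentions SEO three or more times
score    LOCAL_SEO_3X       0.5
```

Because the sub-rule name starts with a double underscore, it is not scored on its own and only feeds the meta rule. Whether a message with both plain-text and HTML parts ends up with a doubled count is not settled by this sketch, so it is worth checking the hit counts against real samples, and running "spamassassin --lint" after adding the rules.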

Regards,

KAM

-- 
Kevin A. McGrail
KMcGrail@Apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

