Mailing-List: contact users-help@spamassassin.apache.org; run by ezmlm
Precedence: bulk
Received-SPF: pass (nike.apache.org: domain of martin@gregorie.org designates
 77.75.108.125 as permitted sender)
Message-ID: <1360171565.30913.30.camel@zappa.gregorie.org>
Subject: Re: IS there a simple way to add a rule of a body mail test? I have
 a pattern..
From: Martin Gregorie <martin@gregorie.org>
Reply-To: martin@gregorie.org
To: users@spamassassin.apache.org
Date: Wed, 06 Feb 2013 17:26:05 +0000
In-Reply-To: <51127A93.8080608@ngtech.co.il>
References: <510D4C55.20207@ngtech.co.il>
	 <1359826772.18727.4.camel@zappa.gregorie.org>
	 <510D5998.4000705@ngtech.co.il>
	 <alpine.LNX.2.00.1302021045570.22540@athena.impsec.org>
	 <510D6F1F.7060606@ngtech.co.il>
	 <alpine.LNX.2.00.1302021256450.26227@athena.impsec.org>
	 <510D86DB.4070106@ngtech.co.il>
	 <alpine.LNX.2.00.1302022109200.4706@athena.impsec.org>
	 <510E2334.4060908@ngtech.co.il>
	 <alpine.LNX.2.00.1302030947100.19902@athena.impsec.org>
	 <51121A08.2050002@ngtech.co.il>
	 <alpine.LNX.2.00.1302060732340.14252@athena.impsec.org>
	 <51127A93.8080608@ngtech.co.il>
Organization: Martin Gregorie
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit

On Wed, 2013-02-06 at 17:45 +0200, Eliezer Croitoru wrote:

> Sorry but I didn't had much time to understand all of the rules syntax.
> 
When developing a meta rule that combines subrules there's little
point in writing descriptions for the subrules. In addition, I find it
helpful to do the initial development without the leading underscores
because that way you can see these rules firing. Once the combination
is working as I want it to, I put the underscores in. So, I'd start your
main rule like this:

describe   HBRW_SPAM Trap spam that's < 50% Hebrew from a specific sender
header     HSFROM    From =~ /spamadmin\@ngtech\.co\.il/i
mimeheader HSENC     Content-type =~ /charset=.{0,3}windows-1251/i
body       HSHCH     /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]/
tflags     HSHCH     multiple
body       HSTCH     /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]/
tflags     HSTCH     multiple
meta       HSPCT     ( (HSHCH * 100) / (HSTCH + 1) )
meta       HBRW_SPAM (HSPCT < 1) && HSFROM && HSENC
score      HBRW_SPAM 10.3
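
To make the HSPCT arithmetic concrete, here is a rough Python sketch
(not SpamAssassin code). It assumes each 'multiple' subrule contributes
one hit per matching character to the meta expression. The function
names hspct() and would_fire() are made up for illustration, and the
two boolean arguments stand in for the HSFROM and HSENC subrules.

```python
import re

# Character classes copied from the HSHCH and HSTCH subrules,
# applied to the raw bytes of a message body.
HSHCH = re.compile(rb"[\xC0-\xCB\xCD-\xDB\xDF-\xFB]")
HSTCH = re.compile(rb"[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]")


def hspct(body: bytes) -> float:
    """Mirror the HSPCT meta: (HSHCH * 100) / (HSTCH + 1)."""
    hshch_hits = len(HSHCH.findall(body))  # one hit per matching byte
    hstch_hits = len(HSTCH.findall(body))
    # The +1 avoids dividing by zero when the body has no counted characters.
    return (hshch_hits * 100) / (hstch_hits + 1)


def would_fire(body: bytes, from_hit: bool, enc_hit: bool) -> bool:
    """Mirror the HBRW_SPAM meta: (HSPCT < 1) && HSFROM && HSENC."""
    return hspct(body) < 1 and from_hit and enc_hit
```

A body of plain ASCII letters gives an HSPCT of 0, while a body made up
entirely of bytes from the HSHCH class gives a value approaching 100.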

I then test this on a set of messages that exercise every subrule and
check that the metas work correctly. In this case I'd manually create
simple message bodies that exercise every test case (I think you'd need
at least 10 test messages to fully test HBRW_SPAM and all its subrules).
With this technique you do need to use the lint check but don't need
debugging, because the list of rules that fire tells you whether a rule
fired or didn't *and* shows the number of times a 'multiple' rule fired.
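
As a starting point for such a test set, the following Python sketch
builds one minimal message per combination of sender, charset and body
type. Everything in it (the helper names, the example.com addresses, the
file names, the sample bodies) is invented for illustration, and the
eight combinations it produces would still need topping up with edge
cases such as an empty body.

```python
from itertools import product

# Hypothetical inputs: one value that should hit and one that should miss.
SENDERS = {"from_hit": "spamadmin@ngtech.co.il", "from_miss": "someone@example.com"}
CHARSETS = {"enc_hit": "windows-1251", "enc_miss": "utf-8"}
BODIES = {
    "hshch_heavy": bytes([0xE0]) * 40,       # only bytes from the HSHCH class
    "hshch_light": b"plain ascii text " * 4,  # no bytes from the HSHCH class
}


def make_message(sender: str, charset: str, body: bytes) -> bytes:
    """Assemble a bare-bones single-part message as raw bytes."""
    headers = [
        f"From: {sender}",
        "To: test@example.com",
        "Subject: HBRW_SPAM test case",
        "MIME-Version: 1.0",
        f'Content-Type: text/plain; charset="{charset}"',
        "Content-Transfer-Encoding: 8bit",
    ]
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + body + b"\r\n"


def all_cases() -> dict:
    """Return {file name: raw message} for every combination of inputs."""
    cases = {}
    for (sk, s), (ck, c), (bk, b) in product(
        SENDERS.items(), CHARSETS.items(), BODIES.items()
    ):
        cases[f"{sk}-{ck}-{bk}.eml"] = make_message(s, c, b)
    return cases
```

Each message can be written to a file and fed to "spamassassin -t" to
see which rules fire for it.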

After all is working correctly I put the underscores back:

#
# HBRW_SPAM detects messages from spamadmin@ngtech.co.il with a message body or
# part using the Windows 1251 (Hebrew) charset and that contains mostly
# non-Hebrew text.
# 
describe   HBRW_SPAM   Trap spam that's < 50% Hebrew from a specific sender
header     __HSFROM    From =~ /spamadmin\@ngtech\.co\.il/i
mimeheader __HSENC     Content-type =~ /charset=.{0,3}windows-1251/i
body       __HSHCH     /[\xC0-\xCB\xCD-\xDB\xDF-\xFB]/
tflags     __HSHCH     multiple
body       __HSTCH     /[\x30-\x39\x41-\x5A\x61-\x7A\x80-\xFF]/
tflags     __HSTCH     multiple
meta       __HSPCT     ( (__HSHCH * 100) / (__HSTCH + 1) )
meta       HBRW_SPAM   (__HSPCT < 1) && __HSFROM && __HSENC
score      HBRW_SPAM   10.3

After that I re-lint and try all the test cases again. In this case I'd
add the underscores in two stages: first to HSHCH and HSTCH, so I can
see that HSPCT still works and, if so, add them to the rest and
re-test.
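
If you'd rather not make those renames by hand, a small Python helper
along these lines could do it one stage at a time. This is only a
sketch: add_underscores() is a made-up name, and it blindly renames
every whole-word occurrence, so it would also touch a subrule name that
happens to appear in a comment or description.

```python
import re


def add_underscores(config_text: str, names: list) -> str:
    """Prefix each whole-word occurrence of the given rule names with '__'."""
    if not names:
        return config_text
    # Longest names first so a short name never shadows a longer one.
    longest_first = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in longest_first)
    # The lookarounds stop matches inside longer identifiers, and also
    # keep an already-prefixed name from being prefixed a second time.
    pattern = re.compile(r"(?<!\w)(" + alternation + r")(?!\w)")
    return pattern.sub(r"__\1", config_text)


# Stage 1: only the two character-counting subrules.
stage1 = add_underscores(
    "body HSHCH /x/\nmeta HSPCT ((HSHCH * 100) / (HSTCH + 1))\n",
    ["HSHCH", "HSTCH"],
)
```

Running the helper a second time with the remaining subrule names
(HSFROM, HSENC and HSPCT in this case) completes the job.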

In a complex rule like this it's well worth preceding it with a set of
comment lines to describe it (as above). I like to use short names for
subrules (so a subrule name is no longer than the meta rule name once
the underscores have been put in) and to choose names that emphasize
that they are part of the meta rule.

If you find out later that you want to use a subrule in more than one
meta rule, it's easy enough to pull it out as a free-standing rule and
give it a description, a meaningful name and a score of 0.01, e.g.

describe   HEBREW_CHARSET MIME part or message body uses charset windows-1251
mimeheader HEBREW_CHARSET Content-type =~ /charset=.{0,3}windows-1251/i
score      HEBREW_CHARSET 0.01

and, of course, change the name of the subrule in the original meta rule.
Forgetting this last step won't be picked up by a lint check: the meta
rule(s) that use the old name will merely think the subrule didn't fire.
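
A stale subrule name can be caught with a small script of your own. The
Python sketch below is one way to do it, on the assumption that all the
rules involved live in a single block of text. undefined_meta_deps() is
a made-up name and the list of rule types it recognises is deliberately
partial.

```python
import re

# Rule-defining keywords whose second field is a rule name (partial list).
RULE_LINE = re.compile(
    r"^\s*(header|body|rawbody|full|uri|mimeheader|meta)\s+(\w+)\s+(.*)$"
)
# Anything that looks like a rule name inside a meta expression.
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def undefined_meta_deps(config_text: str) -> dict:
    """Return {meta rule: [names it uses that are defined nowhere]}."""
    defined = set()
    metas = {}
    for line in config_text.splitlines():
        m = RULE_LINE.match(line)
        if not m:
            continue  # comments, describe, score, tflags, blank lines
        kind, name, rest = m.groups()
        defined.add(name)
        if kind == "meta":
            metas[name] = rest
    missing = {}
    for name, expr in metas.items():
        unknown = sorted(set(IDENT.findall(expr)) - defined)
        if unknown:
            missing[name] = unknown
    return missing
```

Run over a rule file where __HSENC has been pulled out and renamed but
HBRW_SPAM still refers to it, this reports __HSENC against HBRW_SPAM.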
 
HTH


Martin