spamassassin-users mailing list archives

From: Ryan Thompson <r...@sasknow.com>
Subject: Re: shifting the midpoint between the average spam and average ham scores back to 5.0
Date: Fri, 03 Sep 2004 16:38:50 GMT

Joe Flowers wrote to users@spamassassin.apache.org:

> Help please!
>
> If the average spam score of all of my ham messages is 1.0 and the average 
> spam score of all of my spam messages is 3.0, then what is the best way to 
> move the average_of_these_two_averages (2.0) back up to 5.0?
>
> The result being that I need my current average score for ham messages to be 
> "4" and my current average score for spam messages to be "6". And, I need to 
> do this without screwing up the relative statistics of spamassassin.

Hmm... After reading this thread, I think you *do* have a good question
here, and you have already had some good answers, but I'd like to add a
bit.

You make a valid point: graphed separately, ham and spam scores should
show up as two distinct curves. However, the curves *do* overlap, and
neither the ham scores nor the spam scores (nor the two combined) are
normally distributed. They don't have to be for you to calculate the
mean of the means, but if you use that midpoint as your cutoff, you're
going to get a great many false positives.

What you really should do is decide how many false positives you (and
your users) can live with. For us, it's 1/2000 (0.05%, one twentieth of
a percent). For this, you don't even need a spam corpus. Just collect a
good ham corpus (to measure a rate of 0.05%, you need at least 2000 ham)
and look at the SA scores. Choose your threshold (or your constant
modifier) so that fewer than 1 in 2000 ham messages hit it, and
re-check regularly.
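
A minimal sketch of that procedure in Python, assuming you have already
extracted the SA score of every message in your ham corpus into a list
of floats. The names fp_rate and choose_threshold, and the candidate
grid of 0.0 to 20.0 in steps of 0.1, are made up for illustration; they
are not part of SpamAssassin. A message counts as a hit when its score
is at or above the threshold.

```python
def fp_rate(ham_scores, threshold):
    """Fraction of ham messages that score at or above the threshold."""
    if not ham_scores:
        raise ValueError("need at least one ham score")
    hits = sum(1 for s in ham_scores if s >= threshold)
    return hits / len(ham_scores)


def choose_threshold(ham_scores, max_fp_rate=1 / 2000, candidates=None):
    """Return the lowest candidate threshold whose FP rate on the ham
    corpus is strictly below max_fp_rate, or None if none qualifies."""
    if candidates is None:
        # Hypothetical grid: 0.0, 0.1, ..., 20.0
        candidates = [t / 10 for t in range(0, 201)]
    for t in sorted(candidates):
        if fp_rate(ham_scores, t) < max_fp_rate:
            return t
    return None


# Toy corpus: 1999 ham at 1.0 and one outlier at 6.2.
# With exactly 2000 ham, "fewer than 1 in 2000" means zero hits,
# so the threshold has to clear the outlier.
toy_ham = [1.0] * 1999 + [6.2]
print(choose_threshold(toy_ham))  # 6.3
```

If you would rather leave the threshold at 5.0 and shift the scores
instead, the equivalent constant modifier is 5.0 minus the threshold
this returns (negative when the threshold needs to go up).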

You can cross-check this with a spam corpus, if you want to balance FPs
against FNs (if you're well below your maximum FP ratio, you have some
room to play).
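
The cross-check can be sketched the same way, assuming two lists of SA
scores, one per corpus. The function names error_rates and tabulate are
hypothetical. Tabulating both rates over a few candidate thresholds
shows how much FN improvement you would buy for each step down, and
whether that step keeps you under your FP ceiling.

```python
def error_rates(ham_scores, spam_scores, threshold):
    """Return (fp_rate, fn_rate) for one threshold.

    FP: ham scoring at or above the threshold.
    FN: spam scoring below the threshold.
    """
    if not ham_scores or not spam_scores:
        raise ValueError("both corpora must be non-empty")
    fp = sum(1 for s in ham_scores if s >= threshold)
    fn = sum(1 for s in spam_scores if s < threshold)
    return fp / len(ham_scores), fn / len(spam_scores)


def tabulate(ham_scores, spam_scores, thresholds):
    """Return a list of (threshold, fp_rate, fn_rate) rows."""
    return [(t,) + error_rates(ham_scores, spam_scores, t)
            for t in thresholds]


# Tiny made-up corpora, for illustration only.
toy_ham = [1.0, 2.0, 6.0, 0.5]
toy_spam = [3.0, 7.0, 9.0, 4.5]
for t, fp, fn in tabulate(toy_ham, toy_spam, [4.0, 5.0, 6.5]):
    print(f"threshold {t:4.1f}  FP {fp:.2%}  FN {fn:.2%}")
```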

We get far fewer than 1/2000 FPs (usually 0), but 1/2000 is the maximum
ratio we'd allow before increasing the threshold.

- Ryan

-- 
   Ryan Thompson <ryan@sasknow.com>

   SaskNow Technologies - http://www.sasknow.com
   901-1st Avenue North - Saskatoon, SK - S7K 1Y4

         Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
   Toll-Free: 877-727-5669     (877-SASKNOW)     North America
