spamassassin-users mailing list archives

From Theo Van Dinter <felic...@apache.org>
Subject Re: breaking out: thinking abt the 'sa-update *VS* rdj' thread .. .
Date Fri, 01 Sep 2006 22:10:33 GMT
Wow...  This mail has been sitting in my draft folder for a while, so I
figured I ought to get it out.

On Wed, Aug 16, 2006 at 12:24:04PM -0400, Chris Santerre wrote:
> I got nothing but love for you, so here goes ;) ......

:)

> > Chris!  I'm surprised to hear you spreading this misinformation.
> > I don't really see how the project's rule development is a clusterfsck.
> > People commit rules for testing, they get tested, if they're good
> > they're put in an update.  What's the problem?
> 
> 1) Manpower. You just don't have enough people devoted to rules. Not your
> fault. And solving this would not help. Because of #2...
> 
> 2) Open community. By nature the SA project has to be open. That means
> public corpus, public discussion lists, and public test results. SARE would
> not be as good if we had spammers watching our every move. MAJOR things we
> do MUST remain private. Our good results, the rules, are made public. And we
> offer them to anyone. 

Well, I don't think that's really true at all.  A lot of things are
public, some things aren't.  For instance, we *don't* have a public
corpus.  Each person's corpus is private, and they just send in the
mass-check results, which are public, but there's not a lot of information
one can get out of that IMO.

Test rules are public, which may or may not be problematic -- but since
the goal is to have the rule made public in the end anyway, I'm not sure
there's too much of an issue here.  Generally speaking if test rules are
good they should be published pretty quickly, so new rules still have
an impact, even if spammers actively pay attention to development and
adjust their mails accordingly.  Based on current results, that doesn't
seem to happen a lot.  (currently, people tend to come up with test rules
based on their own private tests on their own corpus -- when something looks
good, it gets committed for wider testing, so rule development is still
semi-private since the method for what rules to write is personal.)


> But since SARE's inception, you can't honestly tell me that SA has kept up
> with SARE's output. Be it quantity or quality. 

I actually couldn't tell you.  I specifically ignore SARE's non-donated rules,
and I have no insight into the development process used.


> But for what end? SARE gives you our best rules to be added. So what would
> we gain by becoming part of SA. Seems we would lose more having to be more
> open about what we do.

I have thoughts about this at the end, but as far as I can see: the main
project gets stronger, meaning the community is better served, and there's
really no downside.  So why not?


> open corpus vs closed. Live feed testing vs overnight GA runs. No public
> eyes in our discussion lists. Incredibly easy rule testing tools vs GA runs.
> People in different parts of the industry more inclined to help and provide
> info simply because of anonymity. Cross project benefits, again due to
> anonymity.

Live vs. overnight mass-check runs (the GA was the tool used to generate
scores in the 2.x days, since replaced by the perceptron, which we don't run
nightly or weekly, but that's another discussion) is really just a matter of
putting in some effort to be able to do it.  We chose nightly and weekly
because it seemed to be quick enough to test new rules and be able to get them
out, and slow enough that it doesn't necessarily scare people away from
volunteering.

Public discussion lists -- not all of our lists are public, and the others are
generally invite-only, though we don't have a lot of those, and most
conversation happens in personal mails anyway.

"incredibly easy rule testing tools vs GA runs" -- I don't know what
you guys have (is there something easier/less involved than running the
rules over messages and looking at the results?), but if it's better
than what's in the project currently, why not contribute it?

"people in different ... anonymity" -- sure, though that's possible either
way.

I really don't see the issue here.

> The question might be, what exactly does the SA project want of SARE? All we
> have to offer is rules, and we already give those up freely. 

In short, I'd like to see our two groups merge.  There are several issues here:

1) Having multiple organizations providing rules is confusing/annoying
to users, as has been discussed previously on this list.

2) Duplicated effort.  Why have multiple people working on multiple
rules that do the same thing?  That's inefficient in various ways.

3) The SA project can't take the rules from SARE's site; they have to be
contributed.  That doesn't actually happen very often.  Most (all?) of
the SARE people who currently have commit access to the SA project
haven't made commits in a long time, if ever.

4) The SA project, as previously discussed, no longer has the manpower to deal
with both the engine and the rules with the detail and attention that they
deserve.  This is bad.

5) Last, and perhaps most importantly, the SA project is the foundation
of the community around it, unsurprisingly.  If the project doesn't work
well, be it engine or rules, people will give up and go elsewhere.
Then both groups, and all of our combined effort, go to waste.
That's very very bad.


I think we'd all be better served by having a single project w/ lots
of active development, than two semi-related projects which end up
duplicating effort and competing for the same scarce set of resources.

-- 
Randomly Generated Tagline:
"I am returning this otherwise good typing paper to you because someone
 has printed gibberish all over it and put your name at the top."
                - English Prof. at Ohio University
