spamassassin-users mailing list archives

From Karsten Bräckelmann <>
Subject Re: sa-learn problems and comprehension question
Date Wed, 10 Nov 2010 17:04:01 GMT
On Tue, 2010-11-09 at 22:57 -0800, Karl Meyer wrote:
> > > --showdots /var/spool/imap/user/kmeyer/[0-9]*." amavis
> >                                         ^^^^^^^
> > This is dangerous. With lots of mail in the (Maildir?) folder, shell
> > expansion *quickly* will exceed the command line length limit.
> >
> > The trailing dot also looks bad.
> This is a good argument. I'll think about that. The folder is a Cyrus IMAP
> folder. I used the [0-9]*. expression because each Cyrus folder contains
> messages with numbered filenames with a trailing dot, plus four cyrus.* files
> (cache, index, ...).

Not Maildir, and passing the dir itself doesn't seem like an option
either in the Cyrus case. Man pages to the rescue.

The Description section of the sa-learn man page says:

 "Note that csh-style globbing in the mail folder names is supported; in
  other words, listing a folder name as "*" will scan every folder that
  matches.  See "Mail::SpamAssassin::ArchiveIterator" for more details."

So, something similar to the above is possible. However, you will need
to escape or quote the glob, so that the su-spawned shell does not expand
it (as the above does) but passes it to sa-learn unchanged.
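
A minimal sketch of the difference quoting makes, for illustration only.
The count_args function is a hypothetical stand-in for sa-learn that just
reports how many arguments it received, and the temporary directory mimics
a Cyrus folder with a few numbered files:

```shell
# count_args is a hypothetical stand-in for sa-learn: it only reports how
# many arguments it was handed.
count_args() { echo "$#"; }

# Throwaway directory that mimics a Cyrus folder layout.
demo=$(mktemp -d)
touch "$demo/1." "$demo/2." "$demo/3." "$demo/cyrus.index"

# Unquoted: the shell expands the glob, one argument per matching file.
unquoted=$(count_args "$demo"/[0-9]*.)
# Quoted: the pattern reaches the command as a single literal argument.
quoted=$(count_args "$demo/[0-9]*.")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -rf "$demo"

# Applied to the command quoted in this thread (elided parts left as "..."),
# single quotes inside the su -c string should have the same effect:
#   ... --showdots '/var/spool/imap/user/kmeyer/[0-9]*.'" amavis
```

With the pattern passed through literally, the argument list stays at one
entry no matter how many messages the folder holds, and the expansion is
left to sa-learn.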

> > A word of caution. There is no move command with IMAP. Instead, it is
> > copy and delete. Or rather mark-for-deletion, since there is no delete
> > command either. That's expunge.
> That's right. But is this a problem?
> First, I learn ham from the inbox folder, then spam from the junk folder. If
> a mail is moved to the junk folder in the meantime and no expunge was done,
> it is not learned as ham again, and it is newly learned as spam from the
> junk folder.

That is a lot of unnecessary work, constantly re-learning messages.

Also, there's a race condition. Say your user knows that spam will be
learned periodically, and that month's worth of spam backlog in that folder
is ugly. Time to clean it up, and expunge. The Inbox, though, is precious.
Lots of important stuff and cute kitten attachments. Unless the Inbox is
expunged too, "deleted" spam from the Inbox will be re-trained as ham next
time. The copy in the spam folder no longer exists to correct that "false
training by design".

(Caveat: I don't know how Cyrus handles deletion or flagging mail as
such. It is not Maildir.)

> > Using the Inbox rather than a dedicated ham folder therefore is NOT a
> > good idea.
> The problem is that I can't persuade about 120 users to store all their ham
> below a defined folder. They want to sort their mail into several folders
> they created on their own.

You should be fine with some initial training of hand-sorted ham in a
dedicated folder. Then let auto-learn kick in.

Some script magic, using 'find' to collect the last n hours' worth of ham
and spam, might be an option too, combined with the sa-learn -f option.
Once again, please do read the... man page.
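
A rough sketch of that find-based approach, under several assumptions: the
collect_recent helper is hypothetical, file modification time is taken as
a proxy for when a message arrived in the folder (which may not hold for
Cyrus), and -mmin is a find extension available in GNU and BSD find rather
than strict POSIX:

```shell
# collect_recent DIR MINUTES (hypothetical helper): print message files
# under DIR, including subfolders, whose names start with a digit and end
# with a dot and which were modified less than MINUTES minutes ago.
collect_recent() {
    find "$1" -type f -name '[0-9]*.' -mmin "-$2"
}

# Possible usage; the "Junk" folder name and the list file are assumptions:
#   collect_recent /var/spool/imap/user/kmeyer/Junk 360 > /tmp/spam.list
#   sa-learn --spam -f /tmp/spam.list
```

Since the file names are written to a list file and read by sa-learn
itself, nothing is expanded on the command line, which sidesteps the
length limit mentioned earlier in the thread.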

char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0;
