Subject: Re: sa-learn problems and comprehension question
From: Karsten Bräckelmann
To: users@spamassassin.apache.org
Date: Wed, 10 Nov 2010 18:04:01 +0100

On Tue, 2010-11-09 at
22:57 -0800, Karl Meyer wrote:
> > > --showdots /var/spool/imap/user/kmeyer/[0-9]*." amavis
> >                                          ^^^^^^^
> > This is dangerous. With lots of mail in the (Maildir?) folder, shell
> > expansion *quickly* will exceed the command line length limit.
> >
> > The trailing dot also looks bad.
>
> This is a good argument. I'll think about that. The folder is a cyrus Imap
> folder. I used the [0-9]*. expression, because each cyrus folder contains
> messages with numbered filename with a trailing dot and four cyrus.* files
> (cache, index,...).

Not Maildir, and passing the dir itself doesn't seem like an option
either in the Cyrus case.

Man pages to the rescue. The Description section of the sa-learn man
page holds this:

  "Note that csh-style globbing in the mail folder names is supported;
  in other words, listing a folder name as "*" will scan every folder
  that matches. See "Mail::SpamAssassin::ArchiveIterator" for more
  details."

So, something similar to the above is possible. However, you will need
to escape or quote the glob, so that the su-spawned shell does not
expand it (as the above does) but passes it on to sa-learn intact.

> > A word of caution. There is no move command with IMAP. Instead, it is
> > copy and delete. Or rather mark-for-deletion, since there is no delete
> > command either. That's expunge.
>
> That's right. But is this a problem?
> First, I learn ham from the inbox folder, then spam from the junk folder. If
> a mail is moved to the junk folder meanwhile and no expunge was done, then
> it's not relearned as ham again and learned from the junk folder as spam
> newly.

That is a lot of unnecessary work, constantly re-learning messages.

Also, there's a race condition. Say your user knows spam will be learned
periodically. And that month's worth of spam backlog in that folder is
ugly. Time to clean it up, and expunge.

The Inbox though is precious. Lots of important stuff and cute kitten
attachments.
Unless the Inbox is expunged, too, "deleted" spam from the Inbox will be
re-trained as ham next time. By then the copy in the spam folder is gone,
so nothing is left to correct that "false training by design".

(Caveat: I don't know how Cyrus handles deletion or flagging mail as
such. It is not Maildir.)

> > Using the Inbox rather than a dedicated ham folder therefore is NOT a
> > good idea.
>
> The problem is, that I can't persuade about 120 users to store all their ham
> below a defined folder. They want to sort their mails into several folders
> they created by their own.

You should be fine with some initial training of hand-sorted ham in a
dedicated folder. Then let auto-learn kick in.

Some script magic, using 'find' to collect the last n hours' worth of ham
and spam, might be an option, too, used with the sa-learn -f option.

Once again, please do read the... man page.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
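
To illustrate the quoting point made earlier about the su-spawned shell, here is a minimal sketch. It is not the original command: `sh -c` merely stands in for the shell that su spawns, `printf` stands in for sa-learn (it prints one line per argument it receives), and the temporary directory with its `1.`, `2.`, `3.` and `cyrus.index` files is made up for the demonstration.

```shell
#!/bin/sh
# Assumption: a throwaway directory that mimics a Cyrus folder layout.
demo=$(mktemp -d)
touch "$demo/1." "$demo/2." "$demo/3." "$demo/cyrus.index"

# Pattern left unprotected inside the command string: the inner shell
# expands it, so the program gets one argument per matching file.
expanded=$(sh -c "printf '%s\n' $demo/[0-9]*." | wc -l | tr -d ' ')

# Pattern wrapped in single quotes inside the command string: the inner
# shell hands it over as one literal argument, glob characters intact.
literal=$(sh -c "printf '%s\n' '$demo/[0-9]*.'" | wc -l | tr -d ' ')

echo "unprotected: $expanded argument(s); quoted: $literal argument(s)"
rm -rf "$demo"
```

With the pattern quoted inside the command string, the argument count no longer grows with the number of messages, which is what keeps the command line length limit out of the picture.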
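
The 'find' idea mentioned above could look roughly like the following sketch. Everything specific in it is an assumption rather than something from the thread: the helper name `collect_recent`, the list file location, the 24-hour window and the `Junk` subfolder are hypothetical, and `-maxdepth` and `-mmin` are extensions found in GNU and BSD find rather than strict POSIX.

```shell
#!/bin/sh
# collect_recent DIR HOURS LISTFILE  (hypothetical helper)
# Writes the paths of message files named like "123." in DIR that were
# modified within the last HOURS hours to LISTFILE, one path per line.
collect_recent() {
    dir=$1
    hours=$2
    list=$3
    # The pattern is quoted so that find, not the shell, does the matching.
    # "-mmin -N" selects files modified less than N minutes ago.
    find "$dir" -maxdepth 1 -type f -name '[0-9]*.' \
        -mmin "-$((hours * 60))" > "$list"
}

# Hypothetical usage, left commented out so the sketch runs without sa-learn:
#   collect_recent /var/spool/imap/user/kmeyer/Junk 24 /tmp/spam.list
#   sa-learn --spam -f /tmp/spam.list
```

Because the file names travel through a list file instead of the command line, the number of messages does not matter. Whether modification time is a good proxy for "recently filed" on a Cyrus spool is untested here.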