Date: Sat, 20 Apr 2013 11:00:52 -0700 (PDT)
From: John Hardin
To: users@spamassassin.apache.org
Subject: Re: re-learning ? was - bayes - large message

On Sat, 20 Apr 2013, Joe Acquisto-j4 wrote:

> In order to send the samples, the user will forward the messages, as an
> attachment. Each is an individual message to either ham or spam, with
> the (hopefully) correct attachment.

Are you extracting the attachments from those messages to feed to sa-learn? Or are you feeding in the entire forwarded message, including the attachment?

If the latter, you're training on stuff you shouldn't be (the headers of the submission to the training folders), and every user's submission of the same multi-recipient spam will be learned as a separate message.

This is one reason it's better, if possible, to have global training folders that users can just move/copy messages into. If training submissions pass through your mail system again, things get complicated.

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
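
The first option asked about above (pulling the attached originals out of each forwarded submission before training) might be sketched roughly as follows. This is only an illustration using Python's standard `email` package; the function name `extract_forwarded` is hypothetical, and it assumes users forward each sample as a `message/rfc822` attachment, which the thread does not confirm.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def _visit(part: EmailMessage, found: list[bytes]) -> None:
    """Collect attached messages without descending into them."""
    if part.get_content_type() == "message/rfc822":
        # The payload of a message/rfc822 part is a one-item list
        # holding the original (forwarded) message.
        found.append(part.get_payload(0).as_bytes())
        return
    if part.is_multipart():
        for sub in part.iter_parts():
            _visit(sub, found)


def extract_forwarded(raw: bytes) -> list[bytes]:
    """Return each attached original message, minus the forwarding wrapper.

    Hypothetical helper: assumes samples arrive as message/rfc822
    attachments. Inline-forwarded text is not handled.
    """
    wrapper = message_from_bytes(raw, policy=policy.default)
    found: list[bytes] = []
    _visit(wrapper, found)
    return found
```

Each returned message could then be written to its own file and passed to `sa-learn --spam` or `sa-learn --ham`, so that the headers of the user's submission never reach the Bayes database.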