Date: Tue, 5 Jun 2012 12:26:43 -0400
From: Christopher Tiwald
To: "Kevin A. McGrail"
Cc: Brett Schenker, users@spamassassin.apache.org
Subject: Re: CKEditor causing high spam score

On Tue, Jun 05, 2012 at 11:39:29AM -0400, Kevin A. McGrail wrote:
> A) These are just sub rules for use in a meta. As a specialist in
> meta rules, just because you hit a sub rule doesn't matter. What
> matters is if it triggers a scoring rule. Does it?
>
> B) I don't recognize those rules or know where they came from.
> Where did they come from?
>

The scoring rule is 4.0 JM_SOUGHT_3, which is one of the "sought
channel" rules distributed (and regularly updated) by the
sought.rules.yerp.org channel in SpamAssassin [1]. That link is a
little dated, but the channel is not.
It comes stock now with `yum install spamassassin` on RHEL 6, and can
be added to a local installation of SA by following the instructions
in the link above. The specific path for my vanilla install is:

    /var/lib/spamassassin/3.003002/sought_rules_yerp_org/20_sought.cf

As far as I can tell (admittedly, I haven't studied the source), it's
simply doing regex matching on a variety of spammy content. Nothing
terribly sophisticated -- the pattern matching is straight up "does
this exact string exist?" The problem is that it's picked up artifacts
of CKEditor, a common CRM/CMS editor. I was able to demonstrate the
problem using CKEditor's demo page [2], and posted the SO question
Brett cited earlier [3].

One option for us would be to disable the WYSIWYG editor, but I can't
imagine we're the only ones affected. The CKEditor user page lists a
variety of large companies and bulk email providers, including
MailChimp [4].

[1] http://taint.org/2007/08/15/004348a.html
[2] http://ckeditor.com/demo
[3] http://stackoverflow.com/questions/10890407/ckeditors-html-artifacts-trigger-spamassassin-can-you-turn-ckeditors-html-mod
[4] http://ckeditor.com/who-is-using-ckeditor

-- 
Christopher Tiwald
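
To make the "does this exact string exist?" behaviour described above
concrete, here is a rough Python sketch of literal-phrase matching. It
is only an illustration under stated assumptions: the phrase list, the
sample bodies and the function names are all hypothetical, nothing here
is taken from 20_sought.cf, and it is not how SpamAssassin itself is
implemented.

```python
import re

# Hypothetical phrases for illustration only; NOT copied from the real
# sought rules. The first one mimics whitespace-heavy editor markup.
HYPOTHETICAL_PHRASES = [
    '<p>\n\t&nbsp;</p>',
    'click here to claim your prize',
]


def compile_exact(phrases):
    """Compile each phrase into a regex that matches it verbatim."""
    # re.escape neutralises regex metacharacters, so the pattern only
    # answers "does this exact string occur in the body?"
    return [re.compile(re.escape(p)) for p in phrases]


def matching_phrases(body, phrases):
    """Return the phrases that occur verbatim somewhere in body."""
    compiled = compile_exact(phrases)
    return [p for p, rx in zip(phrases, compiled) if rx.search(body)]


# Made-up sample bodies: one with editor-style indentation and an empty
# paragraph, one written by hand with the same visible text.
editor_html = '<p>\n\tHello supporters,</p>\n<p>\n\t&nbsp;</p>\n'
hand_html = '<p>Hello supporters,</p>\n'

print(matching_phrases(editor_html, HYPOTHETICAL_PHRASES))
print(matching_phrases(hand_html, HYPOTHETICAL_PHRASES))
```

The point of the sketch is that a purely literal match cannot tell
harmless markup generated by an editor from the same byte sequence in
spam: if a phrase appears in the list, any message containing it
matches, whatever the rest of the content looks like.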