Message-ID: <52EAA1ED.9040104@PCCC.com>
Date: Thu, 30 Jan 2014 14:03:09 -0500
From: "Kevin A. McGrail"
To: Amir Caspi
CC: Andy Jezierski, "users@spamassassin.apache.org"
Subject: Re: Help with a regex to catch spam with gibberish html tags
References: <52EA8BB0.9090009@PCCC.com>

On 1/30/2014 12:39 PM, Amir Caspi wrote:
> On Jan 30, 2014, at 10:28 AM, Kevin A. McGrail wrote:
>
>> If you want to share the complete rule, I can throw it into my
>> sandbox and see what masscheck thinks as well.
>
> The complete rule would be something like this, assuming Andy
> implemented it as I wrote it:
>
> rawbody   HTML_NONSENSE_TAGS  /(?:<[A-Za-z0-9]{4,}>\s*){10,}/
> describe  HTML_NONSENSE_TAGS  Many consecutive multi-letter HTML tags, likely nonsense/spam
> score     HTML_NONSENSE_TAGS  0.001
>
> Score to be adjusted as needed, of course.
>
> If one wants to be even more explicit, one could require that the tags
> be prefaced with a