Date: Thu, 29 Aug 2019 11:10:11 -0700 (PDT)
From: John Hardin
To: users@spamassassin.apache.org
Subject: Re: Spanish language i.c.w. DRUGS_ERECTILE et al.

On Thu, 29 Aug 2019, Matus UHLAR - fantomas wrote:

>> On Wed, 28 Aug 2019, Samy Ascha wrote:
>>> Today, I encountered, for the first time, an issue with scanning an email
>>> that is composed in Spanish.
>>>
>>> It is hitting a fuzzy match somewhere in the DRUGS_ERECTILE and
>>> DRUGS_ERECTILE_OBFU rules matches.
>>>
>>> I'm generally looking for a way to manipulate these edge cases, where
>>> languages are likely to match rules assuming English for the body text.
>>>
>>> Is there any best-practice for this? I'm sure this happens in others'
>>> networks, but I'm totally unsure on how to best resolve this.
>>>
>>> Anything in the way of configuration to combat this, e.g. by combining
>>> language detection with other tags?
>>>
>>> Or, should I look into writing my own plugin to do something similar?
>
> On 28.08.19 07:48, John Hardin wrote:
>> Generally the approach is to add an exclusion for the specific valid
>> non-English word to the rule itself.
>
> imho the best approach would be excluding hitting exact word for valid
> language, e.g. FUZZY_CREDIT shouldn't hit word "kredit" for languages where
> it's written this way

Exactly.

> but that needs deeper logic...

And a familiarity with potentially many languages...

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
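
The exclusion approach discussed in this thread (a fuzzy rule that skips the exact, valid spelling of a word in another language) can be sketched roughly as follows. This is only an illustration in Python, not SpamAssassin rule syntax: the pattern below is a hypothetical stand-in and is not the actual FUZZY_CREDIT rule, and the helper name `hits` is made up for the example. Real rules are Perl regular expressions in the rule files, but the negative-lookahead idea carries over.

```python
import re

# Hypothetical fuzzy pattern, NOT the real FUZZY_CREDIT rule. It accepts a few
# look-alike substitutions for the letters of "credit".
FUZZY_CREDIT = re.compile(
    r"\b"
    r"(?!kredit\b)"   # exclusion: the exact valid non-English word "kredit"
    r"(?!credit\b)"   # the plain English spelling is not an obfuscation either
    r"[ck]r[e3]d[i1!|][t7]\b",
    re.IGNORECASE,
)


def hits(text: str) -> bool:
    """Return True if the text contains an obfuscated variant of the word."""
    return FUZZY_CREDIT.search(text) is not None


if __name__ == "__main__":
    for sample in ("cheap cr3dit now", "Kredit beantragen", "kr3d1t", "credit card"):
        print(sample, "->", hits(sample))
```

The lookaheads only exempt the exact spellings, so obfuscated variants such as "kr3d1t" still match. As noted above, choosing which words to exempt still requires knowing the languages involved.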