spamassassin-users mailing list archives

From John Hardin
Subject Re: Spanish language i.c.w. DRUGS_ERECTILE et al.
Date Thu, 29 Aug 2019 18:10:11 GMT
On Thu, 29 Aug 2019, Matus UHLAR - fantomas wrote:

>> On Wed, 28 Aug 2019, Samy Ascha wrote:
>>> Today, I encountered, for the first time, an issue with scanning an email 
>>> that is composed in Spanish.
>>> It is hitting a fuzzy match somewhere in the DRUGS_ERECTILE and 
>>> DRUGS_ERECTILE_OBFU rules.
>>> I'm generally looking for a way to manipulate these edge cases, where 
>>> languages are likely to match rules assuming English for the body text.
>>> Is there any best-practice for this? I'm sure this happens in others' 
>>> networks, but I'm totally unsure on how to best resolve this.
>>> Anything in the way of configuration to combat this, e.g. by combining 
>>> language detection with other tags?
>>> Or, should I look into writing my own plugin to do something similar?
> On 28.08.19 07:48, John Hardin wrote:
>> Generally the approach is to add an exclusion for the specific valid 
>> non-English word to the rule itself.
> IMHO the best approach would be to avoid hitting on an exact word that is
> valid in the message's language, e.g. FUZZY_CREDIT shouldn't hit the word
> "kredit" for languages where it's written this way
> but that needs deeper logic...

And a familiarity with potentially many languages...
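
To make the two suggestions above concrete, here is a rough sketch in Python. It is not how SpamAssassin implements its fuzzy rules: the pattern is a simplified stand-in for FUZZY_CREDIT, and the names FUZZY, VALID_WORDS and fuzzy_hit, as well as the "de" entry, are assumptions made up for illustration. It also assumes the language code comes from some language-detection step that is not shown here.

```python
import re

# Hypothetical stand-in for a fuzzy rule; NOT the real SpamAssassin FUZZY_CREDIT.
# Matches "credit"-like tokens with a few simple character substitutions.
FUZZY = re.compile(r"\b[ck]r[e3]d[i1l]t\b", re.IGNORECASE)

# The plain spelling that the rule is a fuzzy version of; it is never
# treated as an obfuscation.
PLAIN = "credit"

# Assumed per-language exclusions: exact words that are valid in that language.
# Maintaining this table is the "familiarity with many languages" problem.
VALID_WORDS = {"de": {"kredit"}}


def fuzzy_hit(text: str, lang: str = "en") -> bool:
    """Return True if text contains a credit-like token that is neither the
    plain spelling nor a valid word in the given language."""
    allowed = {PLAIN} | VALID_WORDS.get(lang, set())
    return any(m.group(0).lower() not in allowed for m in FUZZY.finditer(text))
```

With this sketch, "Ihr Kredit wurde genehmigt" is excused when the language is given as "de" but still hits when it is treated as English, which is the behaviour asked for above. Hard-coding a single exclusion into the pattern itself (the simpler approach) would excuse "kredit" for every language.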

  John Hardin KA7OHZ              FALaholic #11174     pgpk -a
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
   Are you a mildly tech-literate politico horrified by the level of
   ignorance demonstrated by lawmakers gearing up to regulate online
   technology they don't even begin to grasp? Cool. Now you have a
   tiny glimpse into a day in the life of a gun owner.   -- Sean Davis
  882 days since the first commercial re-flight of an orbital booster (SpaceX)
