spamassassin-users mailing list archives

From John Hardin <jhar...@impsec.org>
Subject Re: Spam Pattern
Date Wed, 12 Feb 2014 21:57:31 GMT
On Wed, 12 Feb 2014, Axb wrote:

> On 02/12/2014 10:46 PM, John Hardin wrote:
>>  On Wed, 12 Feb 2014, Axb wrote:
>> 
>> >  On 02/12/2014 10:06 PM, John Hardin wrote:
>> > > 
>> > >   Perhaps something like this:
>> > > 
>> > >   body      __HEXHASHWORD   /\b[0-9a-f]{30,}\s[a-z]{1,10}\b/
>> > >   tflags    __HEXHASHWORD   multiple maxhits=5
>> > >   meta      HEXHASH_WORD    __HEXHASHWORD > 4
>> > >   describe  HEXHASH_WORD    Hexadecimal hash followed by a word
>> > > 
>> > >   Added to my sandbox, just in case.
>> > 
>> >  John,
>> > 
>> >  Isn't {30,} (without a limit) dangerously expensive?
>>
>>  Potentially expensive; the character class and the fact that the
>>  following atom is not in that class limit the risk - backtracking isn't
>>  a possibility. However, point taken - I recommend {30,64} instead.
>
> imo, you don't even need to count that high - I'd stop at sweet 16; anything
> above is pink noise, and I wouldn't waste time chasing spaces & co.

That increases the FP risk, though. Having just hex strings in an email
is not inherently a good spam sign, I would think; hence the desire to
match a long hex string followed by a word, with no intervening punctuation.
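
For illustration only, here is a minimal Python sketch of how the rule
above might behave with the suggested {30,64} bound. It assumes Python's
re module is a close enough stand-in for SpamAssassin's Perl regex engine
for this particular pattern, and it ignores how SpamAssassin renders and
splits the message body before matching. The names HEXHASHWORD,
count_hits and hexhash_word are hypothetical. It counts matches up to
maxhits=5 and treats the meta rule as firing only when the count exceeds 4.

```python
import re

# Approximation of the __HEXHASHWORD body pattern, using the bounded
# {30,64} quantifier suggested in the thread. Python's re module stands in
# for SpamAssassin's Perl regex engine here (an assumption, not SA itself).
HEXHASHWORD = re.compile(r"\b[0-9a-f]{30,64}\s[a-z]{1,10}\b")

MAXHITS = 5    # mirrors "tflags __HEXHASHWORD multiple maxhits=5"
THRESHOLD = 4  # mirrors "meta HEXHASH_WORD __HEXHASHWORD > 4"


def count_hits(body: str) -> int:
    """Count non-overlapping matches, stopping once MAXHITS is reached."""
    hits = 0
    for _ in HEXHASHWORD.finditer(body):
        hits += 1
        if hits >= MAXHITS:
            break
    return hits


def hexhash_word(body: str) -> bool:
    """True when the meta rule would fire (more than THRESHOLD hits)."""
    return count_hits(body) > THRESHOLD


if __name__ == "__main__":
    # Six 40-character hex strings, each followed by a short word.
    spammy = " ".join(f"{i:040x} offer" for i in range(6))
    # The same hex strings on their own lines, with no following word.
    bare = "\n".join(f"{i:040x}" for i in range(6))
    print(count_hits(spammy), hexhash_word(spammy))
    print(count_hits(bare), hexhash_word(bare))
```

Run as a script, the spammy sample is capped at 5 hits and fires the
meta rule, while the bare hashes score zero, which is the FP-avoidance
point made above. One side effect of the 64-character cap also shows up
under these assumptions: a single hex token longer than 64 characters
no longer matches at all, because the leading \b keeps the match from
starting partway through the token.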

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   WSJ on the Financial Stimulus package: "...today there are 700,000
   fewer jobs than [the administration] predicted we would have if we
   had done nothing at all."
-----------------------------------------------------------------------
  Today: Abraham Lincoln's and Charles Darwin's 205th Birthdays
