spamassassin-users mailing list archives

From Alex <mysqlstud...@gmail.com>
Subject Re: .info TLD gives 2.1?
Date Mon, 21 Nov 2016 19:00:59 GMT
Hi,


On Mon, Nov 21, 2016 at 1:07 PM, Bill Cole
<sausers-20150205@billmail.scconsult.com> wrote:
> On 21 Nov 2016, at 3:18, Matus UHLAR - fantomas wrote:
>
>> On 20.11.16 19:46, Alex wrote:
>>>
>>> Am I reading this rule wrong, or is the presence of a .info domain
>>> enough to warrant a 2.1 score?
>>>
>>> *  2.1 URI_NO_WWW_INFO_CGI URI: CGI in .info TLD other than third-level
>>> "www"
>>>
>>>
>>> <https://clientservices.ogletreedeakins.info/rs/vm.ashx?ct=24F76A1AD5E20AEDC1D180ACD125901ADFBE7BB3D38714D4CF371647BF8D90DDD78032>*
>>>
>>> uri URI_NO_WWW_INFO_CGI
>>> /^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?/i
>>>
>>> This particular email was scored at 5.30, and wouldn't have hit if it
>>> didn't also hit SORBS, but such a score seemed quite high for just the
>>> presence of a type of TLD.
>>
>>
>> it's not based only on .info tld:
>>
>> 1. TLD .info
>> 2. no 'www'
>> 3. third level domain
>> 4. at least 6 characters 2nd-level domain
>
>
> That's a 7 not a 6 :)
>
> The RE says a bit more, and is maybe clearer using words:
>
> http[s]://<hostname: not 'www'>.<domainname: 7 or more non-dots>.info/<15 or
> more non-whitespace characters including a literal ?>
>
> Note that the trailing '\?' in the RE means a literal '?' indicating that
> the URI has a CGI-style query string. That makes this a very specific URI
> pattern. There's nothing "wrong" with such a URI except for the fact that
> objectively the frequency of that uncommon pattern is much higher in spam
> than non-spam.
>
> I *suspect* that the pattern could be tightened a bit to reduce false
> positives without missing the spam that hits this rule, but I don't have any
> data to support that.

Thank you all for your explanations. I understood that it also
involved a CGI-style query string, but just didn't mention it.

I have a handful of other non-spam URIs that hit this rule, if they
would help tighten it up a bit.
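
For anyone who wants to check their own URIs against the pattern, here
is a minimal Python sketch. It assumes Python's re module behaves like
Perl's regex engine for this expression and these inputs, which I have
not verified beyond the samples shown. The hits_rule helper and every
sample URI except the one from my original message are made up for
illustration, and this is not how SpamAssassin itself evaluates uri rules.

```python
import re

# Pattern copied from the rule quoted above; re.IGNORECASE stands in
# for the trailing /i modifier of the Perl-style expression.
URI_NO_WWW_INFO_CGI = re.compile(
    r"^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?",
    re.IGNORECASE,
)


def hits_rule(uri: str) -> bool:
    """Hypothetical helper: True if the URI matches the quoted pattern."""
    return URI_NO_WWW_INFO_CGI.search(uri) is not None


samples = [
    # URL from the original message: non-www host, long domain, query string.
    "https://clientservices.ogletreedeakins.info/rs/vm.ashx"
    "?ct=24F76A1AD5E20AEDC1D180ACD125901ADFBE7BB3D38714D4CF371647BF8D90DDD78032",
    # Invented variants, each changing one part of the pattern.
    "https://www.ogletreedeakins.info/rs/vm.ashx?ct=24F76A1AD5E20A",   # 'www' host
    "https://mail.short.info/rs/vm.ashx?ct=24F76A1AD5E20A",            # domain under 7 chars
    "https://clientservices.ogletreedeakins.info/rs/vm.ashx",          # no '?'
    "https://clientservices.ogletreedeakins.info/a?b=1",               # path under 15 chars
]

for uri in samples:
    print("hit " if hits_rule(uri) else "miss", uri)
```

Only the first sample should report a hit; each of the others fails
one of the conditions Bill spelled out.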

The part I was unsure of was whether those 2.1 points were warranted,
because I've only ever seen the rule hit on ham. Now I understand that
they are.
