spamassassin-users mailing list archives

From Alexandre Boyer <bigg...@gmail.com>
Subject Re: HTML link regex
Date Thu, 27 Sep 2012 14:41:34 GMT
Hello all,

Here is a small ruleset that I'm working with. I added it to our local
ruleset in prod:

    # BAD LINKS N-NG ;-)

    # Canada Post

    uri_detail   AJB_CANPOST_BADLINK             raw !~ /canadapost\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)canadapost\./ type =~ /^a$/
    describe     AJB_CANPOST_BADLINK             Found a mismatch between href and anchor text pretending to link to www.canadapost.ca
    score        AJB_CANPOST_BADLINK             1.0
    # Z_CANPOST_TRACKNUM is not shown in this excerpt
    meta         AJB_CANPOST_PHISH_BADTRACKNUM   AJB_CANPOST_BADLINK && !Z_CANPOST_TRACKNUM
    describe     AJB_CANPOST_PHISH_BADTRACKNUM   Mismatch between href and anchor text + unofficial tracking number from CanadaPost
    score        AJB_CANPOST_PHISH_BADTRACKNUM   2.0

    # youtube

    uri_detail   AJB_UTUBE_BADLINK   raw !~ /youtube\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)youtube\./ type =~ /^a$/
    describe     AJB_UTUBE_BADLINK   Found a mismatch between href and anchor text pretending to link to www.youtube.com
    score        AJB_UTUBE_BADLINK   0.5
    # Because of link trackers (from mass mailers, for example), we must
    # meta this with other rules to be sure we are facing the fake YouTube botnet.
    # Z_EMPTY_SUBJ is not shown in this excerpt.

    meta      AJB_FK_UTUBE_BOTNET     AJB_UTUBE_BADLINK && Z_EMPTY_SUBJ && MIME_HTML_ONLY
    describe  AJB_FK_UTUBE_BOTNET     Mismatch between href and anchor text + empty subject = botnet
    score     AJB_FK_UTUBE_BOTNET     5.5

    # TODO: check if we could work with DKIM, exists:List-Unsubscribe,
    #    SPF_PASS, RCVD_IN_RP_SAFE, RCVD_IN_RP_CERTIFIED and others
    #    in order to avoid FPs from mass mailers.

Note the TODO ;-)

I will work on this later on.
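
As a rough illustration of what the Canada Post uri_detail rule checks, here
is a minimal Python sketch. It is not how SpamAssassin evaluates the rule
internally; it only applies the same three regexes to one link at a time. The
function name is_bad_link and the sample URLs are made up for the example, and
Python's re module stands in for Perl regexes.

```python
import re

# Patterns copied from the AJB_CANPOST_BADLINK rule (slashes need no escaping here).
RAW_RE = re.compile(r"canadapost\.")
TEXT_RE = re.compile(r"(?:https?://(?:www\.)?|www\.)canadapost\.")
TYPE_RE = re.compile(r"^a$")


def is_bad_link(raw: str, text: str, link_type: str) -> bool:
    """Hypothetical helper: True when all three rule conditions hold for one link."""
    return (
        RAW_RE.search(raw) is None              # raw  !~ /canadapost\./
        and TEXT_RE.search(text) is not None    # text =~ /(?:https?:...)canadapost\./
        and TYPE_RE.search(link_type) is not None  # type =~ /^a$/
    )


# Anchor text looks like a Canada Post URL, but the href goes elsewhere.
print(is_bad_link("http://evil.example/track", "www.canadapost.ca", "a"))
# Legitimate link: href and anchor text agree.
print(is_bad_link("http://www.canadapost.ca/track", "www.canadapost.ca", "a"))
# Anchor text that does not look like a URL is ignored by the text pattern.
print(is_bad_link("http://evil.example/track", "Canada Post tracking", "a"))
```

One thing the sketch makes visible: the raw pattern is not anchored to the
host part, so an href such as http://evil.example/canadapost.ca/ contains
"canadapost." and the rule would not fire on it.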

A question here: as per
http://wiki.apache.org/spamassassin/RuleSandboxes#Editing_Another_Developer.27s_Sandbox,
one can create a sandbox to submit rules.

Is it open to anybody? Should I do this to submit those rules (and more,
some I already have on my side and future ones)?

Could anyone on the list give me a primer on this process? I'm
interested (with my boss's approval) in becoming a regular contributor, but
this part of the SA project is a little cryptic to me right now.

Do not hesitate to contact me off-list if necessary.

Alex, from prypiat.
Yes, I recycle.


On 12-09-26 11:03 AM, Bowie Bailey wrote:
> On 9/26/2012 10:45 AM, Alexandre Boyer wrote:
>> Hi all,
>>
>> Me happy :-D
>>
>> It works as expected for simple rules.
>>
>> For example, to get rid of my problem with youtube links I had this
>> simple rule:
>>
>>     uri_detail   Z_URIDETAIL_UTUBE_SPOOF   raw !~ /youtube\./ text =~ /(https?://)?(www\.)?youtube\./ type =~ /^a$/
>>     score        Z_URIDETAIL_UTUBE_SPOOF   10.0
>>
>
> The alternatives on the text regexp are irrelevant.  An equivalent
> simpler regexp would be:
>
> text =~ /youtube\./
>
> Any optional text at the beginning or end of a (non-anchored) regexp
> should be left off unless you are trying to capture it for later use.
>
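
The point in the quoted reply can be checked with a small Python sketch (again
using Python's re module as a stand-in for Perl regexes; the sample strings and
the names LONG, SHORT, STRICT and same_verdict are made up for the example).
With search-style, non-anchored matching, optional groups at the start of a
pattern never change whether it matches.

```python
import re

# Pattern from the quoted rule, with optional prefixes.
LONG = re.compile(r"(https?://)?(www\.)?youtube\.")
# Simplified pattern suggested in the reply.
SHORT = re.compile(r"youtube\.")
# Pattern from the revised rule, where one of the prefixes is mandatory.
STRICT = re.compile(r"(?:https?://(?:www\.)?|www\.)youtube\.")

SAMPLES = [
    "http://www.youtube.com/watch",
    "www.youtube.com",
    "youtube.com",
    "see youtube. today",
    "example.org",
    "youtub.com",
]


def same_verdict(text: str) -> bool:
    """True when LONG and SHORT agree on whether the text matches."""
    return (LONG.search(text) is not None) == (SHORT.search(text) is not None)


for sample in SAMPLES:
    print(sample, same_verdict(sample))
```

The simplification applies only while the prefix is optional. The revised
pattern earlier in this message requires "http://", "https://" or "www."
before the name, so it is not equivalent to /youtube\./: a bare "youtube.com"
matches SHORT but not STRICT.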
