spamassassin-users mailing list archives

From Daniel Lemke <le...@jam-software.com>
Subject Re: Help with blocking Chinese Spam
Date Tue, 13 Mar 2012 14:24:52 GMT


Henrik K wrote:
> 
> On Tue, Mar 13, 2012 at 06:17:53AM -0700, Daniel Lemke wrote:
>> 
>> 
>> 
>> Jenny Lee-2 wrote:
>> > 
>> > 
>> >> Date: Tue, 13 Mar 2012 05:47:03 -0700
>> >> From: lemke@jam-software.com
>> >> To: users@spamassassin.apache.org
>> >> Subject: RE: Help with blocking Chinese Spam
>> >> 
>> >> 
>> >> 
>> >> Jenny Lee-2 wrote:
>> >> > 
>> >> > I did turn it on in the .pre. It is also supposed to add a header, but it
>> >> > does not. How can I check if it is working or not?
>> >> > 
>> >> > I have:
>> >> > 
>> >> > ok_locales en
>> >> > ok_languages en
>> >> > 
>> >> > Jenny 
>> >> > 
>> >> 
>> >> 
>> >> Add this to your config file:
>> >> 
>> >> add_header all Language _LANGUAGES_ 
>> >  
>> > This adds the header. Thank you.
>> >  
>> > However, running: spamassassin -D < chinesespam
>> >  
>> > Does not catch this.
>> >  
>> > Jenny
>> >  
>> > Mar 13 17:06:36.294 [27011] dbg: plugin: Mail::SpamAssassin::Plugin::TextCat=HASH(0x1d50bc8) implements 'extract_metadata', priority 0
>> > Mar 13 17:06:36.294 [27011] dbg: message: ---- MIME PARSER START ----
>> > Mar 13 17:06:36.295 [27011] dbg: message: parsing multipart, got boundary: ----=_NextPart_000_004F_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.295 [27011] dbg: message: found part of type multipart/alternative, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.296 [27011] dbg: message: added part, type: multipart/alternative
>> > Mar 13 17:06:36.299 [27011] dbg: message: found part of type application/vndms-excel, boundary: ----=_NextPart_000_004F_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.299 [27011] dbg: message: added part, type: application/vndms-excel
>> > Mar 13 17:06:36.299 [27011] dbg: message: parsing multipart, got boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.300 [27011] dbg: message: found part of type text/plain, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.300 [27011] dbg: message: added part, type: text/plain
>> > Mar 13 17:06:36.301 [27011] dbg: message: found part of type text/html, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.301 [27011] dbg: message: added part, type: text/html
>> > Mar 13 17:06:36.301 [27011] dbg: message: parsing normal part
>> > Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
>> > Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
>> > Mar 13 17:06:36.302 [27011] dbg: message: ---- MIME PARSER END ----
>> > Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
>> > Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
>> > Mar 13 17:06:36.310 [27011] dbg: textcat: classifying, skipping: yi sco lv is bs sl la ga sa eu et rm cy eo fy gd lt
>> > Mar 13 17:06:36.328 [27011] dbg: textcat: can't determine language uniquely enough
>> > Mar 13 17:06:36.328 [27011] dbg: textcat: X-Languages: "", X-Languages-Length: 671
>> > 
>> 
>> 
>> 
>> Looks like textcat is not working properly if the message is encoded. For
>> the mail you posted on pastebin, textcat guessed "ja.shift-jis", which then
>> triggered UNWANTED_LANGUAGE_BODY.
>> 
>> However, for other Chinese spam that got through these days, it was either
>> not able to guess the language or it even guessed "en" as the language.
>> 
>> Is this a general problem, with SpamAssassin not really being able to decode
>> that sort of mail?
> 
> 
> At least try 3.3.2, since it has textcat fixes.
> (That pastebin shows 3.3.1 as the version.)
> 
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229
> 
> 
> 


We are already on 3.3.2. I've posted the sample mail on pastebin:
http://pastebin.com/mcNFUrEs

debug info:
Tue Mar 13 15:09:08 2012 [-6864] dbg: textcat: classifying, skipping: yi sco lv is bs sl la ga sa eu et rm cy fy eo lt gd
Tue Mar 13 15:09:08 2012 [-6864] dbg: textcat: language possibly: en
Tue Mar 13 15:09:08 2012 [-6864] dbg: textcat: X-Languages: "en", X-Languages-Length: 3131
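
For anyone comparing textcat results across several sample mails, here is a minimal Python sketch that pulls the guessed languages out of saved `spamassassin -D` output. It only assumes the "textcat: X-Languages" debug line looks like the one above; the function name textcat_result is made up for illustration and is not part of SpamAssassin.

```python
import re

# Matches the textcat summary line from the debug output, e.g.
#   dbg: textcat: X-Languages: "en", X-Languages-Length: 3131
# \s* after the comma also tolerates a plain line wrap at that point.
TEXTCAT_RE = re.compile(
    r'textcat: X-Languages: "([^"]*)",\s*X-Languages-Length: (\d+)'
)

def textcat_result(debug_output):
    """Return (languages, length) from debug text, or None if no such line.

    Hypothetical helper: it only reads the log text, it does not call
    SpamAssassin itself.
    """
    match = TEXTCAT_RE.search(debug_output)
    if match is None:
        return None
    # An empty list means textcat could not settle on any language.
    languages = match.group(1).split()
    return languages, int(match.group(2))

sample = (
    'Tue Mar 13 15:09:08 2012 [-6864] dbg: textcat: language possibly: en\n'
    'Tue Mar 13 15:09:08 2012 [-6864] dbg: textcat: X-Languages: "en",\n'
    'X-Languages-Length: 3131\n'
)
print(textcat_result(sample))
```

An empty language list corresponds to the "can't determine language uniquely enough" case, while a result of "en" on a Chinese mail is the misdetection described here.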

Can you also verify that David's #Chinese spams rule doesn't trigger on that
one?