spamassassin-users mailing list archives

From Matt Kettler <mkettler...@verizon.net>
Subject Re: Error while sa-learning
Date Tue, 10 Jun 2008 12:38:13 GMT
Diego Pomatta wrote:
> (again as new mail)
> Hey list,
>
> I get lots of these errors while passing a mbox file to sa-learn for 
> spam learning:
>
> Malformed UTF-8 character (unexpected non-continuation byte 0x72, 
> immediately after start byte 0xf3) in transliteration (tr///) at 
> /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1049.
> Malformed UTF-8 character (unexpected non-continuation byte 0x20, 
> immediately after start byte 0xe1) in transliteration (tr///) at 
> /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1050.
>
> with variations in non-continuation byte and start byte, but all in 
> lines 1049 and 1059 of Message.pm
> The process finishes well and tokens are learned, so I assume it's 
> some of the messages within the mbox file that are somehow corrupted.
> It started today after I added a bunch of new spammy msgs I collected.
> What does the error mean and how can I identify the mails with the 
> problem?
What perl version are you running? I suspect this is related to a 
common bug in perl 5.8.6.

It can be kludged by adding "use bytes" to Message.pm, but that hurts 
performance a bit.

See also:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3787

(note: that bug is actually about it cropping up in rules, but it is 
likely the same root cause unless you're running perl 5.8.8)
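
As for finding the mails that trigger the warning: the byte pairs in the 
error (0xf3 followed by 0x72, 0xe1 followed by 0x20) look like 8-bit text 
that is not valid UTF-8, so one rough approach is to scan the mbox for 
messages whose raw bytes fail to decode as UTF-8. The sketch below is only 
a heuristic and is not part of SpamAssassin; the function names 
first_bad_utf8 and suspect_messages are made up for illustration, and it 
will also flag perfectly legitimate Latin-1 mail.

```python
import mailbox


def first_bad_utf8(data: bytes):
    """Return (offset, byte) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the byte where decoding failed,
        # e.g. a start byte not followed by a continuation byte.
        return err.start, data[err.start]
    return None


def suspect_messages(mbox_path):
    """Return a list of (key, offset, byte) for messages containing malformed UTF-8."""
    box = mailbox.mbox(mbox_path)
    found = []
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # raw message bytes, no header decoding
            bad = first_bad_utf8(raw)
            if bad is not None:
                found.append((key, bad[0], bad[1]))
    finally:
        box.close()
    return found


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_bad_utf8.py spam.mbox
    for key, offset, byte in suspect_messages(sys.argv[1]):
        print(f"message {key}: byte 0x{byte:02x} at offset {offset}")
```

The message numbers it prints are positions within the mbox (starting at 
0), which should be enough to pull the suspect mails out and rerun 
sa-learn on them individually.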
