spamassassin-users mailing list archives

From RW <rwmailli...@googlemail.com>
Subject Re: normalize_charset effects
Date Thu, 15 Nov 2018 00:31:45 GMT
On Wed, 14 Nov 2018 19:32:00 +0100
Matus UHLAR - fantomas wrote:

> >On Wed, 14 Nov 2018 09:43:25 +0100
> >Matus UHLAR - fantomas wrote:  
> >> what are direct effects of normalize_charset?  
> 
> On 14.11.18 14:37, RW wrote:
> >It causes mime text parts that aren't UTF-8 to be translated into
> >UTF-8.  
> 
> does this apply only for rules or even for things like bayes?
> 
> I mean, when a iso-8859-* word is already tokenized in bayes, will it
> be missed?

It only makes a difference for words that contain non-ASCII
characters.  Without normalize_charset, such words are treated as
separate tokens in UTF-8 and iso-8859, each with its own counts.
With 'normalize_charset 1' only the UTF-8 version is used, and the
iso-8859 tokens should age out.
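
To illustrate why the two encodings end up as separate tokens, here is
a minimal Python sketch. It is not SpamAssassin's tokenizer; the
normalize() helper and the sample word are assumptions made up for the
example. It only shows that the same word has different bytes in UTF-8
and iso-8859-1, and the same bytes once both are converted to UTF-8.

```python
# Illustration only: a hypothetical stand-in, not SpamAssassin's Bayes code.
word = "caf\u00e9"  # "café", a word with one non-ASCII character

utf8_bytes = word.encode("utf-8")         # b'caf\xc3\xa9'
latin1_bytes = word.encode("iso-8859-1")  # b'caf\xe9'

# Without normalization the raw bytes differ, so a byte-based
# token store would see two distinct tokens for the same word.
raw_tokens = {utf8_bytes, latin1_bytes}


def normalize(raw: bytes, charset: str) -> bytes:
    """Decode from the declared charset and re-encode as UTF-8 (hypothetical helper)."""
    return raw.decode(charset).encode("utf-8")


# With normalization both variants collapse to the UTF-8 form.
normalized_tokens = {
    normalize(utf8_bytes, "utf-8"),
    normalize(latin1_bytes, "iso-8859-1"),
}

print(len(raw_tokens))         # 2
print(len(normalized_tokens))  # 1
```

Pure-ASCII words encode to the same bytes in both charsets, which is
why only words with non-ASCII characters are affected.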


> now a question raised if non-UTF8 spam will be caught more likely when
> normalizing...

It's hard to say. I doubt it makes much difference unless you are
using third-party or local rules that are written for UTF-8. The two
best reasons for setting it are that it simplifies local rule writing
and it increases Bayes retention.
