spamassassin-dev mailing list archives

From Motoharu Kubo
Subject Re: I18n and l10n
Date Tue, 17 Jan 2006 14:33:59 GMT
> There are two possibilities.
> (1) rewrite from BODY to RAWBODY as Matsuda-san says.
> (2) invent NBODY (or something else) apart from BODY.  NBODY contains
>      normalized and tokenized version of body.  I once thought of this
>      idea but did not propose because BODY has problems I mentioned
>      above and overhead of executing nbody_test increases.

There is a third method.

rawbody  SJIS_BODY  eval:check_charset("Shift_JIS")
describe SJIS_BODY  Mail text is encoded with Shift JIS
score    SJIS_BODY 1.4

rawbody  JIS_BODY   eval:check_charset("ISO-2022-JP")
describe JIS_BODY   Mail text is encoded with JIS
score    JIS_BODY   -0.5

check_charset would be a function that detects the charset of the raw body
using Encode::Detect::Encoder::detect.  I have not written it yet, though.
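
A minimal sketch of what the matching logic might look like, since the
function is not written yet.  Everything here is an assumption: the helper
name body_charset_is is made up, the detector is taken to be the CPAN module
Encode::Detect::Detector (whose detect function returns a charset name or
undef), and the comparison is done case-insensitively.  The detector is
passed as an optional code ref so the logic can be tried without the module
installed.

```perl
use strict;
use warnings;

# Hypothetical helper behind a check_charset eval rule.
#   $lines  : array ref of raw body lines (octets, no charset decoding)
#   $wanted : charset name given in the rule, e.g. "Shift_JIS"
#   $detect : optional code ref mapping octets to a charset name or undef;
#             defaults to Encode::Detect::Detector::detect (assumed to be
#             installed from CPAN)
sub body_charset_is {
    my ($lines, $wanted, $detect) = @_;

    # Load the real detector only when the caller did not supply one.
    $detect ||= do {
        require Encode::Detect::Detector;
        \&Encode::Detect::Detector::detect;
    };

    # Join the lines so the detector sees the whole body at once.
    my $octets = join('', @{ $lines || [] });
    return 0 unless length $octets;

    my $found = $detect->($octets);
    return 0 unless defined $found;    # nothing recognizable

    # Charset names are compared without regard to case.
    return lc($found) eq lc($wanted) ? 1 : 0;
}
```

In a plugin, check_charset would presumably be registered with
register_eval_rule and would pass the raw body lines it receives, plus the
charset argument from the rule, to a helper like this one.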

Motoharu Kubo
