james-mime4j-dev mailing list archives

From Tze-Kei Lee <chi...@gmail.com>
Subject Re: Character corruption with Traditional chinese
Date Mon, 13 Feb 2012 12:15:03 GMT
Hi,

It looks like the email client that composed the email made a mistake
when picking the charset.

GB 2312 contains only Simplified Chinese, while CP 936 (GBK) and
GB 18030 extend it to include Traditional Chinese (as well as Japanese
and Korean) characters, and the first sentence in the email uses those
extended code points.
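
As for substituting charsets, here is a minimal sketch using only the
JDK, not the mime4j API. It assumes you can get at the raw body bytes
and the declared charset label yourself. The class and method names
(CharsetFallback, substituteCharset, decode) are hypothetical and do
not exist in mime4j. The idea is to decode anything labelled gb2312 or
gbk as GB18030, which should be safe because GB 18030 is backward
compatible with both.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetFallback {

    // Hypothetical helper (not part of mime4j): map a declared charset
    // label to the superset charset that should be used for decoding.
    static String substituteCharset(String declared) {
        if (declared == null) {
            return "US-ASCII";
        }
        String label = declared.trim().toLowerCase(Locale.ROOT);
        if (label.equals("gb2312") || label.equals("gbk")) {
            // GB18030 decodes every valid GB2312/GBK byte sequence the same way.
            return "GB18030";
        }
        return declared.trim();
    }

    // Decode raw body bytes using the substituted charset. Falls back to
    // ISO-8859-1 when the JVM does not support the requested charset.
    static String decode(byte[] body, String declaredCharset) {
        String effective = substituteCharset(declaredCharset);
        Charset cs = Charset.isSupported(effective)
                ? Charset.forName(effective)
                : StandardCharsets.ISO_8859_1;
        return new String(body, cs);
    }

    public static void main(String[] args) {
        // Traditional Chinese sample text, written with Unicode escapes so the
        // source file encoding does not matter.
        String sample = "\u7e41\u9ad4\u4e2d\u6587";
        byte[] body = sample.getBytes(Charset.forName("GB18030"));

        // The message claims gb2312, but the bytes use extended code points.
        String decoded = decode(body, "gb2312");
        System.out.println(decoded.equals(sample));
    }
}
```

How you hook something like this into the parser depends on how you
read the message bodies, so treat it as a starting point only.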

Best Regards

Tze-Kei

On Mon, Feb 13, 2012 at 7:32 PM, Sharma, Ashish <ashish.sharma3@hp.com> wrote:
> Hi,
>
> I use mime4j 0.7.2 for email parsing.
>
> I am seeing character corruption with Traditional Chinese characters.
>
> A sample email that causes the problem is at:
>
> http://pastebin.com/Q38VXsLb
>
> Here I noticed that when the email is parsed with the default charset encoding (the charset
> encoding that was received from the email server) of:
>
> charset="gb2312"
>
> I get the character set corruption, while if I manually change this charset encoding
> in the email stream to:
>
> charset="gb18030"
>
> and then parse it via mime4j, there is no character corruption.
>
> Can somebody please explain why I am getting this behavior?
>
> Moreover, is there a way in mime4j to substitute character sets for specific cases
> like the one above?
>
> Thanks
> Ashish
>
>
>
