james-mime4j-dev mailing list archives

From "Sharma, Ashish" <ashish.shar...@hp.com>
Subject Character corruption with Traditional chinese
Date Mon, 13 Feb 2012 11:32:20 GMT
Hi,

I use mime4j 0.7.2 for email parsing.

I am seeing corrupted characters when parsing email that contains Traditional Chinese text.

A sample email that reproduces the problem is at:

http://pastebin.com/Q38VXsLb

I noticed that when the email is parsed with the charset declared in the message (the
charset that was received from the email server):

charset="gb2312"

the characters are corrupted, whereas if I manually change the charset in the email
stream to:

charset="gb18030"

and then parse it via mime4j, there is no character corruption.
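
The difference can be reproduced outside mime4j. The following is only a minimal sketch, assuming the JVM ships the GB2312 and GB18030 charsets (the class and method names are made up for illustration). It takes one Traditional Chinese character, U+9AD4, encodes it as GB18030 bytes, and decodes those bytes under each charset name. GB2312 covers only a limited repertoire of mostly Simplified characters, while GB18030 is a superset of it, so a message labelled gb2312 may in fact carry bytes that only the larger charset can decode.

```java
import java.nio.charset.Charset;

public class Gb2312VersusGb18030 {

    /** Encodes text as GB18030 bytes, then decodes those bytes with the named charset. */
    static String decodeAs(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName("GB18030"));
        return new String(bytes, Charset.forName(charsetName));
    }

    /** Reports whether the named charset has a mapping for the character. */
    static boolean canEncode(char c, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char traditional = '\u9AD4'; // a Traditional Chinese character
        String text = String.valueOf(traditional);

        System.out.println("GB2312 can encode it:  " + canEncode(traditional, "GB2312"));
        System.out.println("GB18030 can encode it: " + canEncode(traditional, "GB18030"));

        // Decoding the same bytes under each label shows which one round-trips.
        System.out.println("Round trip via GB2312:  " + text.equals(decodeAs(text, "GB2312")));
        System.out.println("Round trip via GB18030: " + text.equals(decodeAs(text, "GB18030")));
    }
}
```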

Can somebody please explain why I am getting this behavior?

Also, is there a way in mime4j to substitute one character set for another in specific
cases like this one?
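
The kind of substitution meant here could look roughly like the sketch below. It is plain Java and does not use any mime4j API; `CharsetSubstitution`, `SUBSTITUTES` and `resolve` are hypothetical names. It also assumes that decoding gb2312-labelled content as GB18030 is acceptable, which relies on GB18030 being a superset of GB2312.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CharsetSubstitution {

    // Hypothetical substitution table: declared charset name -> charset to decode with.
    private static final Map<String, String> SUBSTITUTES = new HashMap<>();
    static {
        SUBSTITUTES.put("gb2312", "GB18030");
    }

    /** Returns the charset to decode with, applying a substitution if one is registered. */
    static Charset resolve(String declaredName) {
        String trimmed = declaredName.trim();
        String key = trimmed.toLowerCase(Locale.ROOT);
        String actual = SUBSTITUTES.containsKey(key) ? SUBSTITUTES.get(key) : trimmed;
        return Charset.forName(actual);
    }

    public static void main(String[] args) {
        System.out.println(resolve("gb2312")); // substituted
        System.out.println(resolve("UTF-8"));  // left unchanged
    }
}
```

The open question is where such a lookup could be plugged into mime4j so that it is applied when a body part is decoded.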

Thanks
Ashish
 


