james-mime4j-dev mailing list archives

From "Sharma, Ashish" <ashish.shar...@hp.com>
Subject Mime word not getting decoded using mime4j
Date Tue, 02 Aug 2011 09:12:09 GMT
Hi,

I am trying to decode MIME encoded words (the original string contains Chinese characters) using DecoderUtil.decodeEncodedWords().

The following is the sample code:

@Test
	public void testEncoding() throws IOException {
		// Two encoded words labelled gb2312, separated by folding whitespace
		String str = "=?gb2312?B?ztKyu8rH1tCH+LmyrmEudHh0?=";
		str = str + "\r\n ";
		str = str + "=?gb2312?B?ztLKx9bQufrIyy50eHQ=?=";
		str = DecoderUtil.decodeEncodedWords(str);
		// Write the decoded text to a file so it can be inspected
		File file = new File("C://chinese2.txt");
		FileOutputStream fileOut = new FileOutputStream(file);
		fileOut.write(str.getBytes("gb2312"));
		fileOut.flush();
		fileOut.close();
	}

With the above code, the decoded characters come out corrupted.

The problem here is the character set. Most mail clients declare the charset as
GB2312, but to decode the characters correctly I had to use GB18030 in the above code.
(See this for more info: http://stackoverflow.com/questions/3856920/character-corruption-for-chinese-simple-and-traditional-and-korean-texts)
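To illustrate the point outside of mime4j, here is a minimal JDK-only sketch that takes the Base64 payload of the first encoded word above and decodes the raw bytes once as GB2312 and once as GB18030. The class and method names are made up for this example, and it assumes a JDK (8 or later) that ships both charsets. It only checks whether the decoder had to substitute U+FFFD for bytes it could not map.

```java
import java.nio.charset.Charset;
import java.util.Base64;

class Gb2312VersusGb18030 {
    // Base64 payload of the first encoded word in the sample code
    static final String PAYLOAD = "ztKyu8rH1tCH+LmyrmEudHh0";

    // Decode the raw payload bytes with the named charset
    static String decode(String charsetName) {
        byte[] raw = Base64.getDecoder().decode(PAYLOAD);
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String asDeclared = decode("GB2312");
        String asSuperset = decode("GB18030");
        // U+FFFD marks bytes that the decoder could not map to a character
        System.out.println("GB2312  has U+FFFD: " + (asDeclared.indexOf('\uFFFD') >= 0));
        System.out.println("GB18030 has U+FFFD: " + (asSuperset.indexOf('\uFFFD') >= 0));
    }
}
```

The payload contains lead bytes (such as 0x87) that fall outside the GB2312 range but are valid in GBK and GB18030, which would explain why the declared charset is not enough.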

The following is the generalization I made: I replace the character sets declared by mail clients
with a superset so that the characters decode correctly:

1. For any of the following Chinese charsets:

	iso-ir-58,chinese,gbk,cn-gb,csgb2312,csiso58gb231280,euc-cn,euc_cn,euccn,gb2312,gb_2312-80,x-EUC-CN,gb2312-1980,gb2312-80

	replace it with: GB18030

2. For any of the following Korean charsets:

	5601,ksc5601-1987,ksc5601_1987,euckr,ksc5601,ksc_5601,euc_kr,csEUCKR,ks_c_5601-1987

	replace it with: EUC-KR

3. For any of the following Thai charsets:

	ms-874,ms874,windows-874,cp874,874,cs874,ibm874

	replace it with: TIS-620
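The three replacement rules above could be captured in a small lookup table. This is only a sketch: the class name CharsetFallback and the method resolve() are hypothetical and not part of mime4j. Lookups are case-insensitive, and a charset that is not listed is returned unchanged.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CharsetFallback {
    // Maps a lower-cased declared charset label to the charset to decode with
    private static final Map<String, String> FALLBACK = new HashMap<>();

    private static void register(String target, String... aliases) {
        for (String alias : aliases) {
            FALLBACK.put(alias.toLowerCase(Locale.ROOT), target);
        }
    }

    static {
        register("GB18030", "iso-ir-58", "chinese", "gbk", "cn-gb", "csgb2312",
                "csiso58gb231280", "euc-cn", "euc_cn", "euccn", "gb2312",
                "gb_2312-80", "x-EUC-CN", "gb2312-1980", "gb2312-80");
        register("EUC-KR", "5601", "ksc5601-1987", "ksc5601_1987", "euckr",
                "ksc5601", "ksc_5601", "euc_kr", "csEUCKR", "ks_c_5601-1987");
        register("TIS-620", "ms-874", "ms874", "windows-874", "cp874", "874",
                "cs874", "ibm874");
    }

    // Returns the replacement charset, or the declared one if no rule applies
    static String resolve(String declared) {
        if (declared == null) {
            return null;
        }
        String target = FALLBACK.get(declared.trim().toLowerCase(Locale.ROOT));
        return target != null ? target : declared;
    }
}
```

For example, CharsetFallback.resolve("gb2312") returns "GB18030", while a label such as "utf-8" passes through untouched.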
	

I suggest that the "DecoderUtil.decodeEncodedWords()" method itself should provide
this kind of charset fallback.
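Until something like that exists in the library, one possible workaround is to rewrite the charset label of each encoded word before handing the string to DecoderUtil.decodeEncodedWords(). The sketch below does only the rewriting step with plain JDK classes. The class name, the rewrite() method and the map passed in are assumptions made for this example, and the map keys are expected to be lower case.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodedWordCharsetRewriter {
    // Matches the "=?charset?B?" or "=?charset?Q?" prefix of an encoded word
    private static final Pattern PREFIX =
            Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?");

    // Replaces declared charsets found in the fallback map; others are kept
    static String rewrite(String header, Map<String, String> fallback) {
        Matcher m = PREFIX.matcher(header);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String declared = m.group(1);
            String target = fallback.getOrDefault(
                    declared.toLowerCase(Locale.ROOT), declared);
            String prefix = "=?" + target + "?" + m.group(2) + "?";
            m.appendReplacement(out, Matcher.quoteReplacement(prefix));
        }
        m.appendTail(out);
        // The result would then be passed to DecoderUtil.decodeEncodedWords()
        return out.toString();
    }
}
```

The payload itself is left untouched, so only the charset used for decoding changes.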

For more info, see also http://wiki.whatwg.org/wiki/Web_Encodings.

Please reply with your comments.

Thanks
Ashish Sharma
