james-mime4j-dev mailing list archives

From Alejandro Valdez <alejandro.val...@gmail.com>
Subject Reading iso-8859-1 TextPart content
Date Thu, 14 May 2009 22:02:15 GMT
Hi list,

I'm using mime4j to extract the text content from the text/html parts
of e-mails. I found that some non-standard MIME parts contain
iso-8859-1 characters (e.g. accented vowels) but don't declare any
charset in the part's MIME header.

In those cases mime4j creates a Reader that uses us-ascii as the
charset (which is the correct default when the header declares no
charset). Reading the content from that Reader produces a char[] in
which every non-us-ascii character has been replaced by the Unicode
replacement character U+FFFD.
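
To illustrate the symptom outside mime4j, here is a minimal sketch that uses only the JDK. It assumes the Reader's decoding behaves like the standard JDK us-ascii decoder; the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDecodeDemo {
    // Decodes raw bytes as us-ascii, as a reader defaulting to us-ascii would.
    // Bytes above 0x7F are not valid us-ascii and come out as U+FFFD.
    static String decodeAsAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Decodes the same bytes as iso-8859-1, where every byte maps to a character.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "café" encoded as iso-8859-1: the accented e is the single byte 0xE9.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(decodeAsAscii(raw).equals("caf\uFFFD"));
        System.out.println(decodeAsLatin1(raw).equals("caf\u00E9"));
    }
}
```

Once the bytes have been decoded as us-ascii the original character is lost, so the fix has to happen before decoding, on the raw bytes.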

Does anyone know a way to make the mime4j API return a Reader that
uses iso-8859-1 as the charset, or some other solution to this
(maybe common) problem?


This is how I'm reading a TextPart's content:

TextBody textBody = (TextBody) part.getBody();
Reader reader = textBody.getReader();
char[] buffer = new char[16000];
StringBuilder sb = new StringBuilder();

// Reader.read returns the number of chars read, or -1 at end of stream.
int charsRead;
while ((charsRead = reader.read(buffer, 0, buffer.length)) != -1) {
  sb.append(buffer, 0, charsRead);
}
return sb.toString();
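
One possible workaround, sketched with the JDK only, is to decode the part's raw bytes yourself and fall back to iso-8859-1 when no charset was declared. This is an assumption-laden sketch, not a mime4j API: `CharsetFallback`, `openReader` and `readAll` are hypothetical names, and how the transfer-decoded bytes and the declared charset are obtained from mime4j is deliberately left open.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallback {
    // Hypothetical helper: declaredCharset is the charset named in the
    // part's header, or null/empty when the header declares none.
    // Charset.forName throws if the declared name is illegal or unsupported.
    static Reader openReader(InputStream rawBytes, String declaredCharset) {
        Charset cs = (declaredCharset == null || declaredCharset.isEmpty())
                ? StandardCharsets.ISO_8859_1   // assumed fallback, not the MIME default
                : Charset.forName(declaredCharset);
        return new InputStreamReader(rawBytes, cs);
    }

    // Drains the reader into a String, using the same loop as above.
    static String readAll(Reader reader) {
        char[] buffer = new char[16000];
        StringBuilder sb = new StringBuilder();
        try {
            int charsRead;
            while ((charsRead = reader.read(buffer, 0, buffer.length)) != -1) {
                sb.append(buffer, 0, charsRead);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the raw bytes of a part with no declared charset.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};
        String text = readAll(openReader(new ByteArrayInputStream(raw), null));
        System.out.println(text.equals("caf\u00E9"));
    }
}
```

Falling back to iso-8859-1 is a guess about the sender; it never fails to decode, but it will produce wrong characters if the part was actually written in another 8-bit charset or in UTF-8.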
