james-mime4j-dev mailing list archives

From Markus Wiederkehr <markus.wiederk...@gmail.com>
Subject Re: Reading iso-8859-1 TextPart content
Date Fri, 15 May 2009 15:35:25 GMT
On Fri, May 15, 2009 at 12:02 AM, Alejandro Valdez
<alejandro.valdez@gmail.com> wrote:
> Hi list, I'm using mime4j to extract the text content from an
> e-mail's text/html parts. I found that sometimes there are
> non-standard MIME parts that use iso-8859-1 characters (e.g.
> accented vowels) but don't declare any charset in the part's MIME
> header. In those cases mime4j creates a Reader that uses us-ascii
> as the charset (which is what should be done when there is no
> charset declaration in the header). Reading the content from that
> Reader produces a char[] with the Unicode U+FFFD symbol in place of
> the non-us-ascii characters.
> Does anyone know a way to use the mime4j API to return a Reader
> with the iso-8859-1 charset set, or some other solution to this
> (maybe common) problem?

It looks indeed like this is not possible.

For Mime4j 0.7 I would propose that we pull up getInputStream() from
BinaryBody to SingleBody so that TextBody gets this method too.

If that's okay I can open a JIRA and fix the issue.
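
As a rough sketch of what such a change would enable: once the raw byte stream is reachable, the caller can pick the charset instead of relying on the (missing) header. The class and method names below are made up for illustration, and a ByteArrayInputStream stands in for the stream that a future textBody.getInputStream() would return; that method does not exist on TextBody at the time of writing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Mime4j.
final class RawStreamDecoding {

    /** Reads the whole stream and decodes it with the caller-chosen charset. */
    static String readText(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int charsRead;
            while ((charsRead = reader.read(buffer, 0, buffer.length)) != -1) {
                sb.append(buffer, 0, charsRead);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the body's raw bytes: "café" in iso-8859-1 (0xE9 is 'é').
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};
        String text = readText(new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1);
        System.out.println(text);
    }
}
```

The charset is a parameter so that the caller can fall back to iso-8859-1 only when the part declares no charset of its own.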

> This is the way I'm reading a TextPart's content:
> TextBody textBody = (TextBody) part.getBody();
> Reader reader = textBody.getReader();
> char[] buffer = new char[16000];
> StringBuilder sb = new StringBuilder();
> int charsRead;
> while ((charsRead = reader.read(buffer, 0, buffer.length)) != -1) {
>     sb.append(buffer, 0, charsRead);
> }
> return sb.toString();

Looks like you want to convert the TextBody to a String. How about this:

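        TextBody textBody = (TextBody) part.getBody();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Copy the body's raw bytes into the buffer (assumes the
        // writeTo(OutputStream) method of Mime4j 0.6 bodies).
        textBody.writeTo(baos);
        return new String(baos.toByteArray(), "iso-8859-1");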
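
For reference, here is a small stand-alone sketch of why the us-ascii Reader yields U+FFFD while decoding the same bytes as iso-8859-1 works. It uses only the JDK; the class and method names are invented for this example and have nothing to do with Mime4j.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows the effect of decoding with the wrong charset.
final class CharsetMismatchDemo {

    /** Decodes as us-ascii; bytes above 0x7F become the replacement char U+FFFD. */
    static String decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    /** Decodes as iso-8859-1; every byte maps directly to the same code point. */
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 0xE1 is 'á' in iso-8859-1 and is not a valid us-ascii byte.
        byte[] raw = {(byte) 0xE1};
        System.out.printf("us-ascii:   U+%04X%n", (int) decodeAscii(raw).charAt(0));
        System.out.printf("iso-8859-1: U+%04X%n", (int) decodeLatin1(raw).charAt(0));
    }
}
```

Since the replacement happens during decoding, the original byte cannot be recovered from the Reader's output afterwards; the conversion has to start from the raw bytes.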

