james-mime4j-dev mailing list archives

From Alejandro Valdez <alejandro.val...@gmail.com>
Subject Re: Reading iso-8859-1 TextPart content
Date Fri, 15 May 2009 18:54:00 GMT
On 5/15/09, Markus Wiederkehr <markus.wiederkehr@gmail.com> wrote:
> On Fri, May 15, 2009 at 12:02 AM, Alejandro Valdez
> <alejandro.valdez@gmail.com> wrote:
>> Hi list, I'm using mime4j to extract the text content from an
>> e-mail's text/html parts. I found that sometimes there are
>> non-standard MIME parts that use iso-8859-1 characters (e.g.
>> accented vowels) but don't declare any charset in the part's MIME header.
>>
>> In those cases mime4j creates a Reader that uses us-ascii
>> as the charset (which is what should be done when there is no
>> charset declaration in the header). Reading the content from that
>> Reader produces a char[] with the Unicode U+FFFD replacement character
>> in place of the non-us-ascii characters.
>>
>> Does anyone know a way to use the mime4j API to return a Reader with
>> the iso-8859-1 charset set, or some other solution to this (maybe
>> common) problem?
>
> It looks indeed like this is not possible.
>
> For Mime4j 0.7 I would propose that we pull up getInputStream() from
> BinaryBody to SingleBody so that TextBody gets this method too.
>
> If that's okay I can open a JIRA and fix the issue.
>
>> This is how I'm reading a TextPart's content:
>>
>> TextBody textBody = (TextBody) part.getBody();
>> Reader reader = textBody.getReader();
>> char[] buffer = new char[16000];
>> StringBuilder sb = new StringBuilder();
>>
>> // Reader.read() returns the number of chars read, or -1 at end of stream.
>> int charsRead;
>> while ((charsRead = reader.read(buffer, 0, buffer.length)) != -1) {
>>     sb.append(buffer, 0, charsRead);
>> }
>> return sb.toString();
>
> Looks like you want to convert the TextBody to a String. How about this:
>
>         TextBody textBody = (TextBody) part.getBody();
>         ByteArrayOutputStream baos = new ByteArrayOutputStream();
>         textBody.writeTo(baos);
>         return new String(baos.toByteArray(), "iso-8859-1");
>
> hth
> Markus
>

Hello Markus, thank you very much for your help. Your snippet works
great: it creates a String with all the characters (bytes) in the MIME
TextPart.
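
For the archives, here is a minimal sketch of why the charset matters
when the bytes become a String. It uses only JDK classes, no mime4j; the
class name CharsetDecodeSketch, the decode() helper and the sample bytes
are made up for illustration and are not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeSketch {

    // Hypothetical helper: decodes raw part bytes with the given charset.
    // The String constructor replaces bytes the charset cannot decode
    // with the replacement character U+FFFD.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Sample data (an assumption): "dia" with an accented i,
        // encoded as iso-8859-1, where the accented i is the byte 0xED.
        byte[] raw = {'d', (byte) 0xED, 'a'};

        String asAscii = decode(raw, StandardCharsets.US_ASCII);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // us-ascii cannot represent 0xED, so the char becomes U+FFFD.
        System.out.println(Integer.toHexString(asAscii.charAt(1)));  // fffd
        // iso-8859-1 maps every byte to the code point of the same value.
        System.out.println(Integer.toHexString(asLatin1.charAt(1))); // ed
    }
}
```

This only shows the decoding step; it assumes the bytes written by
writeTo() are the part's already transfer-decoded content.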

I'm curious about how the writeTo() method actually works. I looked at
SingleBody.java and TextPart.java in the mime4j 0.6 source code (under
src\main\java\org\apache\james\mime4j\message) but I couldn't find the
implementation of this method. Could you please point me in the right
direction?
