james-mime4j-dev mailing list archives

From Markus Wiederkehr <markus.wiederk...@gmail.com>
Subject Re: Reading iso-8859-1 TextPart content
Date Sun, 17 May 2009 19:46:09 GMT
On Fri, May 15, 2009 at 8:54 PM, Alejandro Valdez
<alejandro.valdez@gmail.com> wrote:
> On 5/15/09, Markus Wiederkehr <markus.wiederkehr@gmail.com> wrote:
>> On Fri, May 15, 2009 at 12:02 AM, Alejandro Valdez
>> <alejandro.valdez@gmail.com> wrote:
>>> Hi list, I'm using mime4j to extract the text content from an
>>> e-mail's text/html parts. I found that sometimes there are
>>> non-standard MIME parts that use iso-8859-1 characters (e.g.
>>> accented vowels) but don't declare any charset in the part's MIME header.
>>>
>>> In those cases I found that mime4j creates a Reader that uses us-ascii
>>> as the charset (which is what should be done when there is no charset
>>> declaration in the header). Reading the content from that Reader
>>> produces a char[] with the Unicode U+FFFD replacement character in
>>> place of the non-us-ascii characters.
>>>
>>> Does anyone know a way to use the mime4j API to return a Reader with
>>> the iso-8859-1 charset set, or some other solution to this (maybe
>>> common) problem?
>>
>> It does indeed look like this is not possible.
>>
>> For Mime4j 0.7 I would propose that we pull up getInputStream() from
>> BinaryBody to SingleBody so that TextBody gets this method too.
>>
>> If that's okay I can open a JIRA and fix the issue.
>>
>>> This is the way I'm reading a TextPart content:
>>>
>>> TextBody textBody = (TextBody) part.getBody();
>>> Reader reader = textBody.getReader();
>>> char[] buffer = new char[16000];
>>> StringBuilder sb = new StringBuilder();
>>>
>>> // read() returns the number of chars read, or -1 at end of stream
>>> int charsRead;
>>> while ((charsRead = reader.read(buffer, 0, buffer.length)) != -1) {
>>>     sb.append(buffer, 0, charsRead);
>>> }
>>> return sb.toString();
>>
>> Looks like you want to convert the TextBody to a String.. How about this:
>>
>>         TextBody textBody = (TextBody) part.getBody();
>>         ByteArrayOutputStream baos = new ByteArrayOutputStream();
>>         textBody.writeTo(baos);
>>         return new String(baos.toByteArray(), "iso-8859-1");
>>
>> hth
>> Markus
>>
>
> Hello Markus, thank you (very much) for your help, your snippet works
> great: it creates a String with all the characters (bytes) in the MIME
> TextPart.
>
> I'm curious about how the writeTo() method actually works. I looked at
> the mime4j 0.6 source code, SingleBody.java and TextPart.java (at
> src\main\java\org\apache\james\mime4j\message), but I couldn't find the
> implementation of this method. Could you please point me in the right
> direction?

The method is implemented in StorageTextBody and StringTextBody. Your
TextBody is probably an instance of StorageTextBody, so that is where
you want to look.

Cheers,
Markus

PS: If you work with Eclipse you can open the Type Hierarchy (F4) to
figure out things like that..
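
PPS: For the archive, here is a minimal sketch of why the Reader gave you
U+FFFD and why decoding the raw bytes yourself works. It uses only the JDK,
not mime4j: the ByteArrayOutputStream stands in for what
textBody.writeTo(baos) would fill, and the class name CharsetFallbackDemo
and the helper decode() are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackDemo {

    // Decodes raw body bytes with the given charset. new String(byte[], Charset)
    // replaces bytes the charset cannot map with the replacement char U+FFFD.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: stand-in for textBody.writeTo(baos). The bytes are
        // "caf" followed by 0xE9, which is e-acute in iso-8859-1.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(new byte[] { 'c', 'a', 'f', (byte) 0xE9 });
        byte[] raw = baos.toByteArray();

        // us-ascii cannot represent 0xE9, so the last char becomes U+FFFD.
        String asAscii = decode(raw, StandardCharsets.US_ASCII);
        // iso-8859-1 maps every byte to a char, so nothing is lost.
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println((int) asAscii.charAt(3));   // code point of last char
        System.out.println((int) asLatin1.charAt(3));
    }
}
```

Hard-coding iso-8859-1 is only a guess at what the sender meant; it never
fails because every byte maps to some character, but a part that was really
encoded in another charset would still come out wrong.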
