james-mime4j-dev mailing list archives

From Ioan Eugen STAN <stan.ieu...@gmail.com>
Subject Re: NIO Iterator over messages in mbox file
Date Sat, 11 Feb 2012 14:05:53 GMT
On Sat, 11 Feb 2012 00:15:57 +0200, Markus Wiederkehr wrote:
> Hi Ioan,
>
> Mime4j's BufferedLineReaderInputStream bridges the gap between byte and
> character streams. It lets you read lines of text from a byte stream into a
> ByteArrayBuffer. Then you can use class ContentUtil to decode the
> ByteArrayBuffer into a String. You can also push back (unread) content.
> Maybe that helps with your project.
>
> Cheers,
> Markus

Thanks for clarifying, Markus.

The only thing I'm not sure of right now is whether an mbox file has
a single charset. It should, because multi-charset text files are
weird and would be very problematic (I have never heard of one). But I
am uncertain because each message can declare its own charset in the
charset parameter of its Content-Type header.

From what you said, mime4j uses a charset per message because it doesn't
assume that all messages are part of a single file with one encoding.

I will update the code so that you can create an iterator and specify:
- the file charset
- the From_ line regex
with sensible defaults for whatever you leave out.
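
A minimal sketch of what such an iterator could look like, using only the
JDK. The class name MboxIterator, its constructors and both default values
are assumptions made for illustration; none of this is existing mime4j API.
For brevity it decodes the whole file into memory instead of working on NIO
buffers, and it does not undo ">From" quoting inside message bodies.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: iterates over the messages of an mbox file held in memory. */
final class MboxIterator implements Iterator<String> {
    /** Assumed default: any line starting with "From " begins a new message. */
    static final Pattern DEFAULT_FROM_LINE = Pattern.compile("^From .*$", Pattern.MULTILINE);
    /** Assumed default: ISO-8859-1 maps every byte to one char, so decoding never fails. */
    static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

    private final String text;
    private final Matcher matcher;
    private int start; // offset of the next message's From_ line, or -1 if none is left

    /** Uses the assumed defaults for charset and From_ line pattern. */
    MboxIterator(byte[] mbox) {
        this(mbox, DEFAULT_CHARSET, DEFAULT_FROM_LINE);
    }

    MboxIterator(byte[] mbox, Charset charset, Pattern fromLine) {
        this.text = new String(mbox, charset);
        this.matcher = fromLine.matcher(text);
        this.start = matcher.find() ? matcher.start() : -1;
    }

    @Override
    public boolean hasNext() {
        return start >= 0;
    }

    @Override
    public String next() {
        if (start < 0) {
            throw new NoSuchElementException();
        }
        int begin = start;
        // The message runs up to the next From_ line, or to the end of the file.
        start = matcher.find() ? matcher.start() : -1;
        int end = start >= 0 ? start : text.length();
        return text.substring(begin, end);
    }
}
```

Calling new MboxIterator(bytes) picks the defaults, while
new MboxIterator(bytes, StandardCharsets.UTF_8, Pattern.compile(...)) overrides
both. Each returned String starts with its From_ line, so the caller can strip
that line before handing the rest to a parser.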

After this, I'll find a place to plug it into mime4j.

Thanks,
--
Ioan Eugen Stan
