james-mime4j-dev mailing list archives

From Johannes Zillmann <jzillm...@googlemail.com>
Subject parsing mbox files with mime4j
Date Thu, 03 Jun 2010 11:38:25 GMT
Hi,

I'm trying to parse this mbox file http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200602
with mime4j version 0.6.
The parsing code looks like this:
--------------------------
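import java.io.BufferedInputStream;
import java.io.FileInputStream;
import org.apache.james.mime4j.parser.MimeTokenStream;

MimeTokenStream stream = new MimeTokenStream();
BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream("/Users/jz/Documents/workspace/ms/dap/modules/dap-conductor/src/data/mbox/200602"));
// Start a new parse for as long as the file still has unread bytes;
// handleParse(...) walks the parser's tokens (its code is not shown here).
while (bufferedInputStream.available() > 0) {
     stream.parse(bufferedInputStream);
     handleParse(stream);
     System.out.println("---------------------------------------------");
}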
--------------------------
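One plausible explanation for messages ending mid-body is read-ahead buffering: if the parser wraps the caller's stream in an internal buffer of its own (an assumption here, not checked against the mime4j 0.6 sources), the bytes it pulls in but never uses are gone from `bufferedInputStream`, so the next `parse()` call begins wherever that internal buffer stopped. The sketch below reproduces the effect using only JDK classes. An inner `BufferedInputStream` with an 8-byte buffer stands in for the parser, and `ReadAheadDemo`, `readLine` and `demo` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadAheadDemo {

    // Reads single bytes from `in` up to and including the first '\n'.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b);
            if (b == '\n') {
                break;
            }
        }
        return sb.toString();
    }

    // Returns {line the stand-in "parser" used, next line read from the outer stream}.
    static String[] demo() {
        byte[] data = "line one\nline two\nline three\n".getBytes(StandardCharsets.US_ASCII);
        try (InputStream file = new BufferedInputStream(new ByteArrayInputStream(data))) {
            // Hypothetical stand-in for a parser that buffers the caller's stream
            // internally: its 8-byte buffer reads ahead of what it actually uses.
            InputStream parserView = new BufferedInputStream(file, 8);
            String consumed = readLine(parserView);
            // Bytes already buffered by parserView are no longer available from
            // `file`, so this read resumes in the middle of "line two".
            String next = readLine(file);
            return new String[] {consumed, next};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("parser used: " + r[0].trim());
        System.out.println("next read:   " + r[1].trim());
    }
}
```

Run as-is, it prints `parser used: line one` followed by `next read:   o`. That is the same kind of truncated start as the `FIELD: ainer.start(...)` output below.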

Some messages seem to be parsed correctly, but sometimes the parser ends a message in the
middle of its body and starts the next one.

From the middle of a message body:
--------------------------
Context.java:266)
	at
org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContex
t.java:449)
	at org.mortbay.util.Container.start(Container.java:72)
	at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
	at org.mortbay.util.Container.start(Container.java:72)
	at
org.apache.hadoop.mapred.JobTrackerInfoServer$HTTPStarter.run(JobTrackerInfo
Server.java:101)
--------------------------

The next field reported by the parser:
--------------------------
FIELD: ainer.start(Container.java:	72)
	at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
	at org.mortbay.util.Container.start(Container.java:72)
	at
--------------------------

Is mime4j appropriate for parsing the mbox format? Is there any configuration option or trick
that can help me here?
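A common workaround is to split the file yourself on lines beginning with `From ` and give each chunk its own `parse()` call. This sketch assumes that mime4j 0.6 parses one RFC 822 message per input stream and does not recognize mbox separator lines on its own. `MboxSplitter` and `split` are hypothetical names. The sketch does not undo `>From ` quoting (mboxrd), it normalizes line endings to `\n`, and it will mis-split files whose bodies contain unquoted `From ` lines.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MboxSplitter {

    // Splits an mbox stream into raw messages. Every line starting with "From "
    // begins a new message; the separator line itself is dropped so the MIME
    // parser only sees headers and body. Text before the first separator is
    // ignored. ISO-8859-1 maps each byte to one char, so message bytes
    // round-trip unchanged apart from line endings (rejoined with "\n").
    static List<byte[]> split(InputStream mbox) {
        List<byte[]> messages = new ArrayList<>();
        StringBuilder current = null;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(mbox, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("From ")) {
                    if (current != null) {
                        messages.add(current.toString().getBytes(StandardCharsets.ISO_8859_1));
                    }
                    current = new StringBuilder();
                } else if (current != null) {
                    current.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            messages.add(current.toString().getBytes(StandardCharsets.ISO_8859_1));
        }
        return messages;
    }

    public static void main(String[] args) {
        String mbox = "From alice@example.com Thu Feb  2 10:00:00 2006\n"
                + "Subject: first\n\nbody one\n>From here on, quoted\n\n"
                + "From bob@example.com Thu Feb  2 11:00:00 2006\n"
                + "Subject: second\n\nbody two\n";
        List<byte[]> messages = split(new ByteArrayInputStream(
                mbox.getBytes(StandardCharsets.ISO_8859_1)));
        for (byte[] m : messages) {
            // Each chunk could then get a fresh parse of its own, for example
            // stream.parse(new ByteArrayInputStream(m)) with MimeTokenStream.
            System.out.println("--- message (" + m.length + " bytes) ---");
            System.out.print(new String(m, StandardCharsets.ISO_8859_1));
        }
    }
}
```

With the inline sample it prints two messages. Because every message has its own byte array, no parser buffer can read ahead into the next message.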

best regards
Johannes

