james-mime4j-dev mailing list archives

From Stefano Bagnara <apa...@bago.org>
Subject Re: Headless mail renderer
Date Mon, 24 Jan 2011 10:57:02 GMT
2011/1/24 Noss Benoit <benoit.noss@secu.lu>:
> Hi Stefano,
> thanks for your answer. In the past, I already tried to do this with the
> javax.mail.Message class.
> It was not a big success: I found lots of issues due to the variety of
> incoming mails, so it never got into production.

You can tweak JavaMail with some system properties to let it parse more
malformed messages.
I mention this because I think JavaMail is fine for this work, too.
Mime4j may be a little simpler, but I'm not sure it's worth porting your
code if you already have JavaMail code ready.
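The system properties meant here are presumably the `mail.mime.*` parsing switches documented in the `javax.mail.internet` package javadoc. A minimal sketch that turns a few of them on, assuming a JavaMail version that honors them (check the javadoc of the version you ship); the selection of properties and the `LenientJavaMail` class are illustrative assumptions, not something prescribed in this thread:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sets JavaMail system properties that relax parsing of malformed messages (illustrative selection). */
public class LenientJavaMail {
    static final Map<String, String> LENIENT = new LinkedHashMap<>();
    static {
        // Try to decode RFC 2047 encoded words even when they appear mid-word.
        LENIENT.put("mail.mime.decodetext.strict", "false");
        // Let MimeBodyPart.getFileName() decode encoded (non-ASCII) file names.
        LENIENT.put("mail.mime.decodefilename", "true");
        // Accept unquoted whitespace/specials in Content-Type/Content-Disposition parameters.
        LENIENT.put("mail.mime.parameters.strict", "false");
        // On corrupt base64 data, stop decoding (EOF) instead of throwing.
        LENIENT.put("mail.mime.base64.ignoreerrors", "true");
        // Treat an unknown Content-Transfer-Encoding as 8bit instead of failing.
        LENIENT.put("mail.mime.ignoreunknownencoding", "true");
    }

    /**
     * Call once at startup, before any JavaMail class is loaded: several of these
     * values may be read only once, when the classes initialize.
     * Values already set (e.g. with -D on the command line) are left alone.
     */
    public static void configure() {
        for (Map.Entry<String, String> e : LENIENT.entrySet()) {
            if (System.getProperty(e.getKey()) == null) {
                System.setProperty(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        configure();
        // From here on, parse as usual, e.g. new MimeMessage(session, inputStream).
        for (String key : LENIENT.keySet()) {
            System.out.println(key + "=" + System.getProperty(key));
        }
    }
}
```

Passing the same values as `-D` options on the JVM command line avoids the class-loading order question entirely.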

With either one you will still have to deal with the MIME parts manually
and decide what to do with each part (mime4j removes the complexity of the
activation framework and the automatic object decoding done by JavaMail).

> With each parsed Message, I tried to build in parallel an XHTML page
> representing its content (From:, To:, Subject:, Date: and body content).
> When the attachment was a message, I recursively went into it and appended
> the info found to the XHTML I had previously created.
> When I found HTML, I tried to transform it to XHTML with Tidy, then to PDF
> with iText; when the XHTML transformation failed and there was a
> multipart/alternative, I rendered the text part to PDF instead.
> When I found attached images, I rendered them to PDF.
> When I found Office documents, I didn't transform them.
> After that I merged all created PDFs into one big PDF and checked it in to
> the Documentum DB (one PDF per message).
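The per-part decisions described above (the manual part handling that both mime4j and JavaMail leave to you) can be sketched as a walk over the part tree. This is a minimal, library-neutral sketch: `RenderPlanner` and its nested `Part` type are hypothetical stand-ins that you would fill from mime4j or JavaMail, and the sketch only returns the planned steps as strings; PDF generation and the final merge are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Sketch of the per-part rendering decisions, over a hypothetical part tree. */
public class RenderPlanner {

    /** Hypothetical, library-neutral view of a MIME part. */
    public static final class Part {
        final String mimeType;      // e.g. "text/html", without parameters
        final List<Part> children;  // parts of a multipart/*, or the body of a message/rfc822

        public Part(String mimeType, Part... children) {
            this.mimeType = mimeType.toLowerCase(Locale.ROOT);
            this.children = Arrays.asList(children);
        }
    }

    /** Returns the rendering steps, in order, for a top-level message body. */
    public static List<String> plan(Part body) {
        List<String> steps = new ArrayList<>();
        steps.add("headers->pdf");          // From:, To:, Subject:, Date: of the top-level message
        walk(body, steps);
        return steps;
    }

    private static void walk(Part p, List<String> steps) {
        String t = p.mimeType;
        if (t.equals("multipart/alternative")) {
            // Try the HTML alternative first; keep text/plain as the fallback if HTML->XHTML fails.
            Part html = child(p, "text/html");
            Part text = child(p, "text/plain");
            if (html != null) {
                steps.add("html->pdf" + (text != null ? " (fallback: text->pdf)" : ""));
            } else if (text != null) {
                steps.add("text->pdf");
            }
        } else if (t.equals("message/rfc822")) {
            steps.add("headers->pdf");      // attached message: render its headers, then recurse
            for (Part c : p.children) walk(c, steps);
        } else if (t.startsWith("multipart/")) {
            for (Part c : p.children) walk(c, steps);
        } else if (t.equals("text/html")) {
            steps.add("html->pdf");
        } else if (t.startsWith("text/")) {
            steps.add("text->pdf");
        } else if (t.startsWith("image/")) {
            steps.add("image->pdf");
        } else {
            steps.add("keep " + t);         // e.g. Office documents: not transformed
        }
    }

    /** First direct child with the given type (a nested multipart/related is not searched). */
    private static Part child(Part p, String type) {
        for (Part c : p.children) {
            if (c.mimeType.equals(type)) return c;
        }
        return null;
    }
}
```

For a multipart/mixed holding an HTML/text alternative, a PNG, an attached message and a Word file, `plan` yields: headers, HTML with text fallback, image, the attached message's headers and text, and "keep application/msword".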

For XHTML-to-PDF rendering you may want to evaluate xhtmlrenderer (aka
Flying Saucer).
It is the best pure-Java XHTML renderer out there: it is nowhere near
real web browsers, but much better than the other Java renderers I tested.
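Flying Saucer expects well-formed XHTML (it parses its input as XML), so a Tidy-style cleanup step is still needed, and the text/plain fallback from the multipart/alternative still applies. A minimal JDK-only sketch of that gate, assuming you already have the HTML part and, optionally, its text/plain alternative as strings; `XhtmlGate` and its methods are hypothetical names:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Checks that markup is well-formed XML before handing it to an XHTML renderer. */
public class XhtmlGate {

    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Resolve every external entity (e.g. the XHTML DTD) to an empty document so the
            // check never goes to the network. Named entities such as &nbsp; are then undeclared
            // and may be rejected, so this gate can be stricter than the renderer itself.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            // Fail quietly on fatal errors instead of printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {   // SAXException, IOException, ParserConfigurationException
            return false;
        }
    }

    /** Wraps text/plain content in a minimal XHTML page, escaping markup characters. */
    public static String textToXhtml(String text) {
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><pre>" + escaped + "</pre></body></html>";
    }

    /** Returns the HTML if it is well-formed, else the wrapped text alternative, else null. */
    public static String choose(String html, String plainAlternative) {
        if (html != null && isWellFormed(html)) return html;
        return plainAlternative != null ? textToXhtml(plainAlternative) : null;
    }
}
```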

> The aim of the project is not to have a pretty rendering of all mail; it's
> just to keep track of the messages our client sent.
> I faced three big issues:
> **************************
> 0/ multipart/mixed with inline image content in "cid:...."

Sure, you have to do manual work for this. Look for parts with a
Content-ID and alter the cid: references in the HTML URLs to link to them.
Depending on your rendering engine, you should be able to plug in your own
URL resolver and intercept cid: URLs to provide the streams from the
appropriate MIME parts (I do that with Flying Saucer).
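The first approach (rewriting the references) can be sketched with the JDK alone. This minimal sketch assumes you have already stored each part that carries a Content-ID header somewhere your renderer can load it, and built a map from the normalized Content-ID to that location; `CidRewriter`, its methods and the `file:` URL in the example are hypothetical. The resolver-based alternative depends on the rendering engine's API and is not shown.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rewrites cid: references in an HTML body to URLs the renderer can load. */
public class CidRewriter {
    // A cid: URL inside an attribute value or CSS url(...), up to a quote, space, '>' or ')'.
    private static final Pattern CID = Pattern.compile("(?i)\\bcid:([^\"'\\s>)]+)");

    /** Turns a Content-ID header value such as " <img1@example.com> " into "img1@example.com". */
    public static String normalizeContentId(String headerValue) {
        String id = headerValue.trim();
        if (id.startsWith("<") && id.endsWith(">")) {
            id = id.substring(1, id.length() - 1);
        }
        return id;
    }

    /**
     * Replaces each cid:ID whose ID is a key of urlByContentId; unknown IDs stay untouched.
     * RFC 2392 allows %-escaping inside cid: URLs, which this sketch does not undo.
     */
    public static String rewrite(String html, Map<String, String> urlByContentId) {
        Matcher m = CID.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String url = urlByContentId.get(m.group(1));
            out.append(html, last, m.start());
            out.append(url != null ? url : m.group());
            last = m.end();
        }
        out.append(html.substring(last));
        return out.toString();
    }
}
```

For example, with the map `{img1@example.com -> file:/tmp/part2.png}`, `<img src="cid:img1@example.com">` becomes `<img src="file:/tmp/part2.png">`, while a reference to an unknown ID is left as it was.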

> 1/ Like you said, HTML-to-PDF rendering is difficult, and (Tidy+iText or
> the multipart/alternative fallback) was not always working.
>    If only I could use the Mozilla components to render it, but my
> understanding of them is not good enough.

You can use Mozilla components or even WebKit: just google it and you
will find information. I preferred Flying Saucer because I don't want
to run X (even Xvfb) on my servers for this task.

> 2/ Special characters and encoding problems in headers and attached file names

I've only had issues with oriental encodings: they are difficult to
support in Flying Saucer. No problems with European encodings.
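For the header and file-name half of problem 2/, the usual source of garbled text is RFC 2047 encoded words such as `=?ISO-8859-1?Q?Andr=E9?=`. JavaMail already ships a decoder (`MimeUtility.decodeText`); the JDK-only sketch below only illustrates what such decoding involves. It assumes the header value has already been unfolded, uses the hypothetical name `EncodedWords.decode`, and ignores edge cases such as RFC 2231 `filename*=` parameters or a multi-byte character split across two encoded words (which real-world mail sometimes does).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal RFC 2047 encoded-word decoder for already-unfolded header values. */
public class EncodedWords {
    // =?charset?encoding?encoded-text?=  with encoding B (base64) or Q (quoted-printable-like)
    private static final Pattern WORD = Pattern.compile("=\\?([^?]+)\\?([BbQq])\\?([^?]*)\\?=");
    // Whitespace between two adjacent encoded words is not displayed (RFC 2047, section 6.2).
    private static final Pattern GAP = Pattern.compile("(\\?=)\\s+(=\\?)");

    public static String decode(String header) {
        String s = GAP.matcher(header).replaceAll("$1$2");
        Matcher m = WORD.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            try {
                // RFC 2231 allows "charset*language" here; keep only the charset.
                Charset cs = Charset.forName(m.group(1).split("\\*", 2)[0]);
                byte[] bytes = m.group(2).equalsIgnoreCase("B")
                        ? Base64.getDecoder().decode(m.group(3))
                        : decodeQ(m.group(3));
                out.append(new String(bytes, cs));
            } catch (IllegalArgumentException e) {
                // Unknown charset or corrupt payload: keep the raw encoded word.
                out.append(m.group());
            }
            last = m.end();
        }
        return out.append(s.substring(last)).toString();
    }

    /** Decodes the "Q" encoding: '_' is a space, "=XX" is a hex byte, other chars are literal. */
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }
}
```

Decoding is only half of the problem described above: once the text is correct, rendering oriental scripts to PDF also needs fonts that cover them.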

