james-mime4j-dev mailing list archives

From: Noss Benoit <benoit.n...@secu.lu>
Subject: Re: Headless mail renderer
Date: Tue, 25 Jan 2011 06:34:01 GMT
Hi, after your comments, I now think I have to split my project in two:

1/ The first part has to parse the message and write an HTML or XHTML
page representing the output I want for the message
2/ The second part has to render the HTML I previously generated to PDF
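
A minimal sketch of what step 1/ could produce, assuming the headers and a plain-text body have already been extracted by the parser. The class name `MailPage` and its methods are hypothetical (they are not part of mime4j or JavaMail); the point is only that every value has to be XML-escaped so the page stays well-formed for a strict XHTML renderer.

```java
import java.util.Map;

// Hypothetical helper, for illustration only: builds a small XHTML page
// from already-decoded header values and a plain-text body.
final class MailPage {

    /** Escapes the XML special characters so arbitrary mail text cannot break the markup. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        // Note: control characters that are illegal in XML are not filtered here.
        return out.toString();
    }

    /** Renders one table row per header, followed by the body in a <pre> block. */
    static String render(Map<String, String> headers, String plainBody) {
        StringBuilder page = new StringBuilder();
        page.append("<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>")
            .append(escape(headers.getOrDefault("Subject", "")))
            .append("</title></head><body><table>");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            page.append("<tr><th>").append(escape(h.getKey())).append(":</th><td>")
                .append(escape(h.getValue())).append("</td></tr>");
        }
        page.append("</table><pre>").append(escape(plainBody))
            .append("</pre></body></html>");
        return page.toString();
    }
}
```

Passing a `LinkedHashMap` to `MailPage.render` keeps the header rows in insertion order (From, To, Subject, Date).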

I tried Flying Saucer in the past. It can generate PDF, but it needs
strict XHTML as input, and lots of mails are not strict XHTML.
On the one hand, I think I can improve my parser to get the HTML I want
for most of the mails I have to transform.
On the other hand, I don't know the OpenOffice SDK, WebKit or Mozilla,
and HTML rendering will be the hardest part...
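
Since Flying Saucer wants strict XHTML, one option is to test each HTML part for XML well-formedness before handing it to the renderer, and fall back to the text alternative otherwise. A rough sketch using only the JDK's built-in XML parser; `XhtmlCheck` is a hypothetical helper name, and "well-formed" is a weaker condition than "valid XHTML", so this is only a first gate.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser
            // from printing diagnostics to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Never fetch external DTDs over the network; named entities
            // defined only in such DTDs are out of scope for this sketch.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed: caller should fall back
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller could try `isWellFormed` on the raw HTML first, then on the Tidy output, and only use the text/plain alternative when both fail.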


On 24.01.2011 18:16, Eric Charles wrote:
> Hi,
> fyi
> I also used Java/Mozilla integration via JavaXPCOM, which needs
> investment from the developer (API changes, ...). An alternative is to
> use an HTML-to-PDF add-on and call it from XUL with a Java/XULRunner
> integration.
> I also used Flying Saucer but didn't know it was able to generate PDF.
> For your use case, there's also the OpenOffice SDK, which is really
> well documented and supports a wide range of input/output document
> formats (HTML, PDF, ...).
> Tks,
> Eric
> On 24/01/2011 15:09, Noss Benoit wrote:
>> thanks for your comments Stefano, I will look in the directions you 
>> suggested and keep you informed (if you want to)
>> Benoît
>> On 24.01.2011 11:57, Stefano Bagnara wrote:
>>> 2011/1/24 Noss Benoit<benoit.noss@secu.lu>:
>>>> Hi Stefano,
>>>> thanks for your answer. In the past, I already tried to do this with
>>>> the javax.mail.Message class.
>>>> It was not a big success: I found lots of issues due to the variety
>>>> of incoming mails, so it couldn't go into production.
>>> You can tweak JavaMail with some system properties to let it parse
>>> more malformed messages.
>>> I say this because I think JavaMail is OK for this work, too.
>>> Mime4j may be a little simpler, but I'm not sure it is worth porting
>>> your code if you already have JavaMail code ready.
>>> With both you will anyway have to deal manually with MIME parts and
>>> decide what to do with each part (mime4j removes the complexity of the
>>> activation framework and automatic object decoding done by JavaMail).
>>>> With each parsed Message, I tried to build in parallel an XHTML page
>>>> representing its content (From:, To:, Subject:, Date: and body content).
>>>> When the attachment was a message, I recursively went into it and
>>>> appended the info I found to the XHTML I had previously created.
>>>> When I found HTML, I tried to transform it to XHTML with Tidy, then
>>>> to PDF with iText.
>>>>     When the XHTML transformation failed and I had a
>>>>     multipart/alternative, I rendered the text part to PDF instead.
>>>> When I found attached images, I rendered them to PDF.
>>>> When I found office documents, I didn't transform them.
>>>> After that I merged all the created PDFs into one big PDF and checked
>>>> it in to the Documentum DB (one PDF per message).
>>> For XHTML to PDF rendering you may want to evaluate xhtmlrenderer (aka
>>> Flying Saucer).
>>> It is the best pure-Java XHTML renderer out there: it is not close to
>>> real web browsers, but much better than the other Java renderers I tested.
>>>> The aim of the project is not to have a pretty rendering of all mail,
>>>> it's just to keep track of messages our client sent.
>>>> I faced three big issues:
>>>> **************************
>>>> 0/ multipart/mixed with inline image content in "cid:...."
>>> Sure, you have to do manual work with this. Look for parts with a
>>> Content-ID and alter the references in the HTML URLs to link to these
>>> objects.
>>> Depending on your rendering engine you should be able to plug in your
>>> own URL resolver and intercept cid: URLs to provide the streams from
>>> the appropriate MIME parts (I do that using Flying Saucer).
>>>> 1/ Like you said, HTML to PDF rendering is difficult, and (Tidy+iText or
>>>> multipart/alternative) did not always work.
>>>>     If only I could use the Mozilla components to render it, but my
>>>> understanding of them is not good enough.
>>> You can use Mozilla components or even WebKit: just google and you
>>> will find information. I preferred Flying Saucer because I don't want
>>> to run X (even Xvfb) on my servers for this task.
>>>> 2/ Special characters and encoding problems in headers and attached
>>>> file names
>>> I've had issues only with oriental encodings: they are difficult to
>>> support in Flying Saucer. No problems with European encodings.
>>> Stefano
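
For issue 0/ in the thread above (inline images referenced as cid: URLs), Stefano describes plugging a URL resolver into the renderer. A simpler variant of the same idea, shown here as a sketch only, is to rewrite the cid: references in the HTML before rendering so that they point at files extracted from the matching MIME parts. `CidRewriter` and the map layout are assumptions made for this example; the attribute list is deliberately small, and percent-encoded cid: values are not handled.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class CidRewriter {

    // group 1 = attribute name, group 2 = quote character, group 3 = content id
    private static final Pattern CID_REF = Pattern.compile(
        "(?i)\\b(src|href|background)\\s*=\\s*([\"'])cid:([^\"'>\\s]+)\\2");

    /**
     * Replaces cid: references whose id is a key of partsByContentId with the
     * mapped URL. References to unknown ids are left untouched.
     */
    static String rewrite(String html, Map<String, String> partsByContentId) {
        Matcher m = CID_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = partsByContentId.get(m.group(3));
            String replacement = (target == null)
                ? m.group()
                : m.group(1) + "=" + m.group(2) + target + m.group(2);
            // quoteReplacement protects '$' and '\' in URLs from being
            // interpreted as regex replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Content-ID header values are wrapped in angle brackets; cid: URLs are not. */
    static String normalizeContentId(String headerValue) {
        String v = headerValue.trim();
        if (v.length() >= 2 && v.startsWith("<") && v.endsWith(">")) {
            v = v.substring(1, v.length() - 1);
        }
        return v;
    }
}
```

The map would be filled while walking the MIME tree: for each part carrying a Content-ID header, store `normalizeContentId(value)` as the key and the location of the extracted file as the value.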

