james-mime4j-dev mailing list archives

From Noss Benoit <benoit.n...@secu.lu>
Subject Re: Headless mail renderer
Date Fri, 06 May 2011 11:29:58 GMT
Hi Eric,

    * OpenOffice is used to render Microsoft Office and OpenOffice
      attachments into PDF.
      OpenOffice renders HTML into PDF badly.
    * iText is used to render XHTML to PDF.
      As Stefano proposed, convert the HTML into XHTML with validator.nu
      (or with JTidy in my case) and then use Flying Saucer to make a
      PDF out of the XHTML.
      The Flying Saucer project internally uses iText 2.x (2.0.8 in my
      case) + iText 5.0.6
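
A minimal sketch (not taken from the mailtopdf sources) of why the clean-up step comes before Flying Saucer: the renderer wants well-formed XHTML, and typical mail HTML is not. The check below uses only the JDK's XML parser; the class name XhtmlCheck and the method isWellFormed are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: decides whether markup can go straight to an
// XHTML renderer or needs a tidy pass (JTidy, validator.nu, ...) first.
class XhtmlCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve every external entity (e.g. the XHTML DTD) to an empty
            // stream so the check never goes to the network.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler rethrows fatal errors and keeps stderr quiet.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Not well-formed XML: unclosed tags, unquoted attributes, etc.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }
}
```

A mail body for which isWellFormed returns false would be passed through the tidy step before rendering; one that already parses can skip it.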


On 06.05.2011 11:54, Eric Charles wrote:
> Hi Benoît,
> Many thanks for the feedback and contribution.
> I just downloaded your zip and saw jodconverter (and associated 
> uno..., ju.. jars from openoffice sdk) and itext libs.
> You also import jodconverter and itext classes in PDFConverterJAVA.
> What would you advise for HTML/text to PDF conversion, based on your 
> experience?
> Thanks,
> - Eric
> On 6/05/2011 10:35, Noss Benoit wrote:
>> Hello Stefano and all the others who helped me,
>> I worked with two students on a headless mail renderer (written in Java).
>> I recently opened a project on SourceForge to share this experience
>> (http://sourceforge.net/projects/mailtopdf/)
>> The purpose is to render almost all mails (body + attachments) into one
>> or more PDFs. The focus was not on a "sexy" rendition but on getting a
>> rendition at all. Mails are read through IMAP or from a directory,
>> rendered, and saved as PDF in an output directory. It uses OpenOffice
>> and JAI in the background (for the attachments).
>> I'm quite happy with the first results: it renders 98% of the mails
>> with their attachments (mean PDF rendering time per mail = 300 ms on a
>> normal machine).
>> Just to let you know, and to thank you again.
>> Benoît NOSS
>> On 25.01.2011 10:30, Stefano Bagnara wrote:
>>> 2011/1/25 Noss Benoit<benoit.noss@secu.lu>:
>>>> Hi, after your comments, I now think I have to split my project in
>>>> two parts
>>>> 1/ The first part has to parse the message and write an html or xhtml
>>>> page representing the output I want for the message
>>>> 2/ The second part has to render the html I previously generated
>>>> to PDF
>>> I do that in a single step because of the content-id "cid:" image
>>> references.
>>> BTW logically you need to separate components: parser and renderer.
>>>> I tried flying saucer in the past, it can generate PDF, but it needed
>>>> strict XHTML for the input, and lots of mails are not strict XHTML
>>> I've had very good results parsing the html with validator.nu parser:
>>> http://about.validator.nu/htmlparser/
>>> I parsed thousands of HTML emails and tested most html parsers out
>>> there, and validator.nu was the only one parsing them all.
>>>> On the one hand, I think I can improve my parser to get the html I
>>>> want for most of the mails I have to transform.
>>>> On the other hand, I don't know the openoffice SDK, webkit and
>>>> Mozilla, and html rendering will be the hardest part....
>>> If you used flying saucer in the past then go ahead with that.
>>> Stefano
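
On Stefano's point about content-id "cid:" image references: a renderer cannot resolve them by itself, so the parser has to hand over the mapping from Content-ID to the extracted attachment. A rough sketch of one way to do that, assuming the inline images have already been saved to disk; the class CidRewriter, its rewrite method and the shape of the map are hypothetical, and a regex is only a stand-in for doing this on the parsed document.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces src="cid:..." with the URI of the
// attachment that was extracted for that Content-ID.
class CidRewriter {

    // Matches src="cid:ID" or src='cid:ID', case-insensitively.
    private static final Pattern CID_SRC = Pattern.compile(
            "(src\\s*=\\s*)([\"'])cid:([^\"']+)\\2", Pattern.CASE_INSENSITIVE);

    static String rewrite(String html, Map<String, String> cidToUri) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = cidToUri.get(m.group(3));
            // Leave references to unknown Content-IDs untouched.
            String replacement = (target == null)
                    ? m.group()
                    : m.group(1) + m.group(2) + target + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The map would be filled while walking the MIME parts, which is one reason the parsing and rendering steps are hard to run in complete isolation.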

