james-mime4j-dev mailing list archives

From Tony Zakula <tonyzak...@gmail.com>
Subject Re: Headless mail renderer
Date Fri, 06 May 2011 11:05:56 GMT
Hey,

That is a cool project!  Congratulations!  I have one that I am
still polishing for release that transforms messages into JSON format
and then stores the JSON.  My initial benchmarks on non-optimized code
show an average of 25,000 messages an hour, with the main bottleneck
being the IO.  Cool to see what other people are doing.

Tony Z


On Fri, May 6, 2011 at 3:35 AM, Noss Benoit <benoit.noss@secu.lu> wrote:
> Hello Stefano and all the others who helped me,
>
> I worked with two students on a headless mail renderer (written in Java).
> I recently opened a project on SourceForge to share this experience
> (http://sourceforge.net/projects/mailtopdf/).
>
> The purpose is to render almost all mails (body + attachments) into one or more
> PDFs. The focus was not on a "sexy" rendition but on getting a rendition at all.
> Mails are read through IMAP or from a directory, rendered, and saved as PDF
> in an output directory. It uses OpenOffice and JAI in the background (for the
> attachments).
> I'm quite happy with the first results: it renders 98% of the mails with
> their attachments (mean PDF rendition time per mail = 300 ms on a normal
> machine).
>
> Just to let you know, and to thank you again.
>
>
> Benoît NOSS
>
>
>
>
> On 25.01.2011 10:30, Stefano Bagnara wrote:
>>
>> 2011/1/25 Noss Benoit<benoit.noss@secu.lu>:
>>>
>>> Hi, after your comments, I now think I have to split my project into two
>>> parts:
>>>
>>> 1/ The first part has to parse the message and write an HTML or XHTML page
>>> representing the output I want for the message
>>> 2/ The second part has to render the previously generated HTML to PDF
>>
>> I do that in a single step because of the content-id "cid:" image
>> references.
>> BTW logically you need to separate components: parser and renderer.
>>
>>> I tried Flying Saucer in the past; it can generate PDF, but it needed strict
>>> XHTML for the input, and lots of mails are not strict XHTML
>>
>> I've had very good results parsing the HTML with the validator.nu parser:
>> http://about.validator.nu/htmlparser/
>>
>> I parsed thousands of HTML emails and tested most HTML parsers out there,
>> and validator.nu was the only one that parsed them all.
>>
>>> On the one hand, I think I can improve my parser to get the HTML I want for
>>> most of the mails I have to transform.
>>> On the other hand, I don't know the OpenOffice SDK, WebKit, or Mozilla, and
>>> HTML rendering will be the hardest part....
>>
>> If you used Flying Saucer in the past then go ahead with that.
>>
>> Stefano
>>
>
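
The "cid:" image references mentioned in the quoted thread are the reason the parse and render steps are hard to fully decouple: the HTML body points at sibling MIME parts by Content-ID, so the renderer cannot resolve them on its own. One possible way to bridge the two steps, sketched below, is to rewrite each cid: reference into a data: URI before handing the HTML to a renderer. This is only an illustration and not how any poster in the thread did it. The class CidInliner, its nested InlinePart type, and the method inlineCidImages are hypothetical names; the sketch uses only the Java standard library, assumes the parts have already been extracted and decoded by a MIME parser, and assumes the map keys are Content-ID values without the surrounding angle brackets. A regular expression is used for brevity; real-world mail HTML is messy enough that a proper HTML parser would be more robust.

```java
import java.util.Base64;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of mime4j or any library named in this thread. */
final class CidInliner {

    /** One inline MIME part: its media type and its already-decoded body bytes. */
    static final class InlinePart {
        final String mimeType;
        final byte[] data;

        InlinePart(String mimeType, byte[] data) {
            this.mimeType = mimeType;
            this.data = data;
        }
    }

    // Matches src="cid:..." or src='cid:...', case-insensitively.
    // Group 1 is the quote character, group 2 is the content-id.
    private static final Pattern CID_SRC =
            Pattern.compile("src\\s*=\\s*([\"'])cid:([^\"']+)\\1", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the HTML with every resolvable cid: image source replaced by a
     * base64 data: URI. References with no matching part are left unchanged.
     */
    static String inlineCidImages(String html, Map<String, InlinePart> partsByContentId) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            InlinePart part = partsByContentId.get(m.group(2));
            String replacement;
            if (part == null) {
                // Unknown content-id: keep the original attribute as it was.
                replacement = m.group();
            } else {
                String quote = m.group(1);
                String base64 = Base64.getEncoder().encodeToString(part.data);
                replacement = "src=" + quote + "data:" + part.mimeType
                        + ";base64," + base64 + quote;
            }
            // quoteReplacement stops '$' and '\' from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With a map containing an image/png part under the key logo@example, an img tag whose source is cid:logo@example comes back with a data:image/png;base64,... source, while a reference to an id that is not in the map is passed through untouched. The trade-off is size: base64 inflates each image by about a third, which matters little when the HTML is only an intermediate step on the way to PDF.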
