cocoon-dev mailing list archives

From Conal Tuohy <con...@paradise.net.nz>
Subject RE: Custom extensions - to be made available if possible
Date Fri, 10 Sep 2004 08:07:16 GMT
Antonio Fiol Bonnin wrote:

> Thank you, Con, for your very interesting point of view. We were
> working on (a) but I have told my team that we will be changing
> approach in one hour if they do not see a clear end.
>
> Other than that, I will look into pdftohtml (is it really html?).

http://pdftohtml.sourceforge.net/

It can produce HTML or XML. The XML is closer in form to the content of the
PDF: it has pages containing text with typographic and positional
formatting. The HTML has some of the formatting information removed (I
think), and some kind of guesswork is used to stick lines of text back into
paragraphs.
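
To make the difference concrete, here is a rough Python sketch of the kind of
guesswork involved in turning positioned lines back into paragraphs. It is
not pdftohtml's actual algorithm. The sample document is hypothetical, and the
element and attribute names (page, text, top, height) are assumptions about
what the XML output looks like, so check them against real output before
relying on this. The paragraphs() function and its max_gap threshold are made
up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped the way pdftohtml's XML output is assumed to
# look: pages holding lines of text with positional attributes.
SAMPLE = """<pdf2xml>
  <page number="1" top="0" left="0" height="1188" width="918">
    <text top="100" left="90" width="300" height="14" font="0">First line of a</text>
    <text top="116" left="90" width="280" height="14" font="0">paragraph.</text>
    <text top="160" left="90" width="310" height="14" font="0">A second paragraph.</text>
  </page>
</pdf2xml>"""


def paragraphs(xml_text, max_gap=1.5):
    """Join positioned lines into paragraphs using a vertical-gap guess.

    A new paragraph starts when the distance between the tops of two
    consecutive lines exceeds max_gap times the previous line's height.
    The threshold is an arbitrary illustrative choice.
    """
    root = ET.fromstring(xml_text)
    result = []
    for page in root.iter("page"):
        current, prev = [], None
        for line in page.iter("text"):
            top = int(line.get("top"))
            height = int(line.get("height"))
            # itertext() also picks up text nested in inline markup.
            words = "".join(line.itertext()).strip()
            if prev is not None and top - prev[0] > max_gap * prev[1]:
                result.append(" ".join(current))
                current = []
            current.append(words)
            prev = (top, height)
        # Paragraphs are never carried across a page boundary here.
        if current:
            result.append(" ".join(current))
    return result


if __name__ == "__main__":
    for para in paragraphs(SAMPLE):
        print(para)
```

Working from the XML means you control this heuristic yourself (in a
stylesheet or a script), rather than accepting whatever the HTML output has
already decided.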

