cocoon-dev mailing list archives

From: Conal Tuohy
Subject: RE: Custom extensions - to be made available if possible
Date: Fri, 10 Sep 2004 08:13:34 GMT

Stefano Mazzocchi wrote:

> What about using XSL:FO? Would be pretty cool to have the ability to
> transform PDF into FO, basically reversing what FOP does. I know it
> would be pretty painful to make it work with all kinds of PDF, but for
> reasonable ones it shouldn't be that hard (PDF is sort of a markup
> language already).

It would be cool, but sadly I think the PDF format usually throws away too
much information: there's no concept of a "flow" of text, or even of a
paragraph! I think SVG (or a subset, of course) would be a better match than
FO. "Tagged" PDF carries more information, but most PDF files have a much
simpler structure: disconnected lines of text, positioned at particular
locations on a page. I think the DTD I quoted actually covers most of what
you could extract from most PDF files. :-(
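
To illustrate why positioned lines of text map more naturally onto SVG than
onto FO, here is a minimal Python sketch. It assumes the lines have already
been extracted from a page by some other tool; the TextLine class and the
lines_to_svg function are hypothetical names made up for this example, not
part of Cocoon, FOP, or the DTD mentioned above. The only conversion it needs
is a flip of the y axis, since PDF's default coordinate origin is the
bottom-left corner of the page while SVG's is the top-left.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class TextLine:
    """One extracted line of text (hypothetical structure for this sketch)."""
    x: float   # horizontal position in PDF user space
    y: float   # vertical position in PDF user space (origin at bottom-left)
    text: str


def lines_to_svg(lines, page_width, page_height):
    """Render positioned text lines as SVG <text> elements, one per line."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{page_width}" height="{page_height}">'
    ]
    for line in lines:
        # Flip the y axis: PDF measures up from the bottom, SVG down from the top.
        svg_y = page_height - line.y
        parts.append(
            f'  <text x="{line.x}" y="{svg_y}">{escape(line.text)}</text>'
        )
    parts.append('</svg>')
    return "\n".join(parts)
```

Note that nothing here recovers paragraphs or text flow: each line stays an
independent, absolutely positioned element, which is about all that an
untagged PDF offers.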
