corinthia-dev mailing list archives

From Peter Kelly <>
Subject Re: Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association : The Apache Software Foundation Blog
Date Wed, 04 Feb 2015 16:40:28 GMT
> On 4 Feb 2015, at 5:47 pm, Edward Zimmermann <> wrote:
> Does this have anything to do with Corinthia? No. Corinthia is about content and especially
> word processing formats (OOXML, ODF etc.). Corinthia is at its core about pragmatic fidelity.
> The point of the bidirectional transformation model is to be able to reduce fidelity demands.
> Unless the project wants to get sidetracked into HiFi rendering (of DOCX or ODT) it's completely
> outside of the scope….

I think of PDF in the same way as I do PNG. It’s intended as an output format, not an input
format. I know there are tools out there which are effectively half of an OCR system which
can reconstruct a source document by inferring the logical structure from the layout (e.g.
where a paragraph begins and ends), though this is quite a difficult problem and I’m not
sure that it’d be within the scope of Corinthia (though if someone has ideas on this and
wants to work on it, I’m all for it - it’s just a very difficult and very different task
to writing filters for all the other formats we’ve discussed).

On the other side is output to PDF - that is, typesetting. This is something I also think
would be outside the scope of the project (at least based on my understanding of people’s
interests to date). We basically rely on separate programs to do the typesetting of a document
produced by the library, e.g. LaTeX, WebKit/other browser engines.

Dr. Peter M. Kelly

PGP key: <>
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
