corinthia-dev mailing list archives

From Louis S <>
Subject Re: Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association : The Apache Software Foundation Blog
Date Wed, 04 Feb 2015 18:51:21 GMT
I posted on this to see if pdfbox could offer insight as it is taken up. Dave pointed out that
the functionality of pdfbox was interesting to his company.


> On 4 Feb 2015, at 12:03, jan i <> wrote:
> On Wednesday, February 4, 2015, Peter Kelly <> wrote:
>>> On 4 Feb 2015, at 5:47 pm, Edward Zimmermann <> wrote:
>>> Does this have anything to do with Corinthia? No. Corinthia is about
>>> content and especially word processing formats (OOXML, ODF, etc.).
>>> Corinthia is at its core about pragmatic fidelity. The point of the
>>> bidirectional transformation model is to be able to reduce fidelity
>>> demands. Unless the project wants to get sidetracked into HiFi rendering
>>> (of DOCX or ODT) it's completely outside of the scope….
>> I think of PDF in the same way as I do PNG. It’s intended as an output
>> format, not an input format. I know there are tools out there which are
>> effectively half of an OCR system which can reconstruct a source document
>> by inferring the logical structure from the layout (e.g. where a paragraph
>> begins and ends), though this is quite a difficult problem and I’m not sure
>> that it’d be within the scope of Corinthia (though if someone has ideas on
>> this and wants to work on it, I’m all for it - it’s just a very difficult
>> and very different task to writing filters for all the other formats we’ve
>> discussed).
> +1 I think we currently have other more important tasks in corinthia.
> rgds
> jan i
>> On the other side is output to PDF - that is, typesetting. This is
>> something I also think would be outside the scope of the project (at least
>> based on my understanding of people’s interests to date). We basically rely
>> on separate programs to do the typesetting of a document produced by the
>> library, e.g. LaTeX, WebKit/other browser engines.
>> --
>> Dr. Peter M. Kelly
>> PGP key: <>
>> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
> -- 
> Sent from My iPad, sorry for any misspellings.
