Date: Thu, 5 Feb 2015 12:14:05 +0100
From: jan i
To: "dev@corinthia.incubator.apache.org"
Subject: Re: Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association : The Apache Software Foundation Blog

On Wednesday, February 4, 2015, Dave Fisher wrote:

> Yes, it is interesting to me. I know that PDF is a markup that is based on
> a set of PostScript functions and an object layout specification. It is not
> like PNG - that's a raster bitmap. It is a vector drawing spec. My interest
> is pulling out the content - both text and shapes into a useful set of
> objects. I am not so interested at this time in other features like forms,
> embedded files, and output.
>
> I can read the PDF into an object structure and output HTML5. I can also
> output the objects into roughly equivalent PPTX slides using Apache POI.
>
> Corinthia comes in two ways for me.
>
> (1) An HTML5 format that is targeting interchange with Office Document
> formats.
>
> (2) An intermediate format that may be exported in any format that makes
> sense.
>
> So I am looking for Corinthia to allow pluggable DocFormats.

Pluggable filters are something I tried to persuade Peter of earlier; maybe it will be easier when the new core API is ready.
rgds
jan i

>
> Regards,
> Dave
>
> On Feb 4, 2015, at 11:13 AM, Louis S wrote:
>
> >
> > Louis
> >
> >> On 4 Feb 2015, at 13:55, jan i wrote:
> >>
> >>> On 4 February 2015 at 19:51, Louis S wrote:
> >>>
> >>> I posted on this to see if pdfbox could offer insight as it is taken up.
> >>> Dave pointed out that the functionality of pdfbox was interesting to his
> >>> company.
> >>>
> >>
> >> And I think your posting was interesting information (such information is
> >> needed to see what moves out there). But I do not think we currently should
> >> think about putting it into Corinthia.
> >>
> > No objections.
> >
> >> rgds
> >> jan i.
> >>
> >>
> >>> Louis
> >>>
> >>>> On 4 Feb 2015, at 12:03, jan i wrote:
> >>>>
> >>>> On Wednesday, February 4, 2015, Peter Kelly wrote:
> >>>>
> >>>>>> On 4 Feb 2015, at 5:47 pm, Edward Zimmermann <edward.zimmermann@cib.de> wrote:
> >>>>>>
> >>>>>> Does this have anything to do with Corinthia? No. Corinthia is about
> >>>>>> content and especially word processing formats (OOXML, ODF, etc.).
> >>>>>> Corinthia is at its core about pragmatic fidelity. The point of the
> >>>>>> bidirectional transformation model is to be able to reduce fidelity
> >>>>>> demands. Unless the project wants to get sidetracked into HiFi rendering
> >>>>>> (of DOCX or ODT) it's completely outside of the scope…
> >>>>>
> >>>>> I think of PDF in the same way as I do PNG. It's intended as an output
> >>>>> format, not an input format. I know there are tools out there which are
> >>>>> effectively half of an OCR system which can reconstruct a source document
> >>>>> by inferring the logical structure from the layout (e.g. where a paragraph
> >>>>> begins and ends), though this is quite a difficult problem and I'm not sure
> >>>>> that it'd be within the scope of Corinthia (though if someone has ideas on
> >>>>> this and wants to work on it, I'm all for it - it's just a very difficult
> >>>>> and very different task to writing filters for all the other formats
> >>>>> we've discussed).
> >>>>
> >>>> +1 I think we currently have other more important tasks in Corinthia.
> >>>>
> >>>>
> >>>> rgds
> >>>> jan i
> >>>>
> >>>>>
> >>>>> On the other side is output to PDF - that is, typesetting. This is
> >>>>> something I also think would be outside the scope of the project (at least
> >>>>> based on my understanding of people's interests to date). We basically rely
> >>>>> on separate programs to do the typesetting of a document produced by the
> >>>>> library, e.g. LaTeX, WebKit/other browser engines.
> >>>>>
> >>>>> --
> >>>>> Dr. Peter M. Kelly
> >>>>> kellypmk@gmail.com
> >>>>> http://www.kellypmk.net/
> >>>>>
> >>>>> PGP key: http://www.kellypmk.net/pgp-key
> >>>>> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
> >>>>
> >>>> --
> >>>> Sent from My iPad, sorry for any misspellings.
> >>>
>