incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "PDFBoxProposal" by JeremiasMaerki
Date Thu, 15 Nov 2007 06:44:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The following page has been changed by JeremiasMaerki:
http://wiki.apache.org/incubator/PDFBoxProposal

The comment on the change is:
Some input from my side. Thanks for starting the proposal.

------------------------------------------------------------------------------
  
  The PDF document format is a common format found on the internet and across industries as a
way of sharing documents.  Several Apache projects utilize PDF technologies, but there is not
a single independent PDF library within the Apache organization.  
  
- The Apache FOP project has many features that overlap those of PDFBox and is currently a
duplication of effort, bringing PDFBox into Apache and combining our efforts will result in
a more robust PDF library that will be able to support many more use cases for working with
PDF technologies.
+ The Apache XML Graphics project (FOP/Batik) has a write-only PDF library and needs
PDF parsing functionality. Many of its features overlap those of PDFBox. This is currently a duplication
of effort; bringing PDFBox into Apache and combining our efforts will result in a more robust
PDF library that will be able to support many more use cases for working with PDF technologies.
  
  
  === Initial Goals ===
@@ -35, +35 @@

  
    * Advanced text extraction techniques
    * Increase community involvement
-   * Cooperation with existing Apache projects such as FOP
+   * Cooperation with existing Apache projects such as XML Graphics
    * Increasing support for PDF document features
    * Adding a high level API for document creation
    * Adding a streaming API for document creation
+   * PDF/A creation and validation functionality
  
  == Current Status ==
  
@@ -84, +85 @@

    * [http://lucene.apache.org/nutch/ Lucene Nutch] Nutch currently utilizes PDFBox to index
PDF documents.
    * [http://incubator.apache.org/tika/ Tika] Tika currently utilizes PDFBox for extracting
PDF content.
    * [http://incubator.apache.org/uima/ Apache UIMA] UIMA analyzes unstructured content and
would benefit from PDF content.
+   * [http://xmlgraphics.apache.org/fop/ Apache FOP] There's an experimental plug-in (currently
hosted outside of the project) for FOP that uses PDFBox to support embedding of existing PDFs
in XSL-FO documents for PDF output.
  
  === An Excessive Fascination with the Apache Brand ===
  
@@ -153, +155 @@

  || '''Name'''        || '''Affiliation'''                         ||
  ||Ben Litchfield   || Independent ||
  ||Jukka Zitting || Day Software ||
- ||Jeremias Maerki || - ||
+ ||Jeremias Maerki || Independent ||
  
  == Sponsors ==
  

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

