incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "PDFBoxProposal" by JukkaZitting
Date Tue, 29 Jan 2008 15:56:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The following page has been changed by JukkaZitting:
http://wiki.apache.org/incubator/PDFBoxProposal

The comment on the change is:
Added FontBox and JempBox

------------------------------------------------------------------------------
  
  === Proposal ===
  
- The PDFBox library allows creation of new PDF documents, manipulation of existing documents
and the ability to extract content from documents. PDFBox also includes several command line
utilities.
+ The PDFBox library allows creation of new PDF documents, manipulation of existing documents
and the ability to extract content from documents. PDFBox also includes several command line
utilities. Future development plans include extending PDFBox with advanced data extraction
and high level PDF creation functionality.
  
- There is demand for advanced data extraction and high level PDF creation functionality to
be developed and included in future PDFBox releases.
+ In addition to PDFBox, this proposal also covers the !FontBox and !JempBox companion libraries.
!FontBox is a Java font library used to obtain low level information from font files. !JempBox
is a Java library that implements Adobe's XMP specification. All these components would be
incubated as a single Apache PDFBox podling project.
  
  === Background ===
  
@@ -22, +22 @@

  
  Recently, Tika also expressed interest in advancing the content extraction capabilities
of PDFBox.
  
+ The !FontBox and !JempBox libraries have no dependencies on PDFBox, but their primary purpose
is to support PDFBox and their development communities largely overlap. It therefore makes sense
to include all three libraries in a single project.
+ 
  === Rationale ===
  
  The PDF document format is a common format found on the internet and across industries as a
way of sharing documents.  Several Apache projects utilize PDF technologies but there is not
a single independent PDF library within the Apache organization.
  
  The Apache XML Graphics project (FOP/Batik) has a write-only PDF library and is in need
of PDF parsing functionality. Many features overlap those of PDFBox. This is currently a duplication
of effort; bringing PDFBox into Apache and combining our efforts will result in a more robust
PDF library that will be able to support many more use cases for working with PDF technologies.
  
+ !FontBox, FOP and Batik all contain font loading/handling code that could likely be merged
into a single common library either within the PDFBox podling or outside it.
  
  === Initial Goals ===
  
@@ -42, +45 @@

    * PDF/A creation and validation functionality
    * Review licensing of both bundled and external dependencies
    * Manage export control notices for cryptographic features
+   * Decide how to consolidate the overlapping font handling code in !FontBox, FOP, and Batik
+   * Replace !JempBox with Adobe's XMP library
  
  == Current Status ==
  
@@ -59, +64 @@

  
  === Alignment ===
  
- The ability to search PDF documents is a basic requirement for any enterprise search solution.
 PDFBox provides the basic content that is needed for content indexing.  This functionality
aligns with the those of Lucene, Nutch, Tika and UIMA and all users of these projects will
benefit from continued development of PDFBox.  
+ The ability to search PDF documents is a basic requirement for any enterprise search solution.
 PDFBox provides the basic content that is needed for content indexing.  This functionality
aligns with those of Lucene, Nutch, Tika and UIMA, and all users of these projects will
benefit from continued development of PDFBox.
+ 
+ PDFBox has font loading and handling needs similar to those of FOP and Batik, and the code in
the !FontBox companion library could well be merged with similar code in the other projects.
  
  == Known Risks ==
  
@@ -94, +101 @@

    * [http://lucene.apache.org/nutch/ Lucene Nutch] Nutch currently utilizes PDFBox to index
PDF documents.
    * [http://incubator.apache.org/tika/ Tika] Tika currently utilizes PDFBox for extracting
PDF content.
    * [http://incubator.apache.org/uima/ Apache UIMA] UIMA analyzes unstructured content and
would benefit from PDF content.
-   * [http://xmlgraphics.apache.org/fop/ Apache FOP] There's an experimental plug-in (currently
hosted outside of the project) for FOP that uses PDFBox to support embedding of existing PDFs
in XSL-FO documents for PDF output. Both Batik and FOP have code to parse fonts which PDFBox
needs to do, too.
+   * [http://xmlgraphics.apache.org/fop/ Apache FOP] and [http://xmlgraphics.apache.org/batik/
Apache Batik] There's an experimental plug-in (currently hosted outside of the project) for
FOP that uses PDFBox to support embedding of existing PDFs in XSL-FO documents for PDF output.
Both Batik and FOP contain font parsing code, which is functionality that !FontBox also needs.
  
  === An Excessive Fascination with the Apache Brand ===
  
@@ -103, +110 @@

  == Documentation ==
  
    * PDFBox ([http://www.pdfbox.org/])
+   * !FontBox ([http://www.fontbox.org/])
+   * !JempBox ([http://www.jempbox.org/])
  
  == Initial Source ==
  
- Initial source will come from the existing SourceForge repository.
+ Initial source will come from the existing SourceForge repositories of the PDFBox, !FontBox,
and !JempBox projects.
  
  == Source and Intellectual Property Submission Plan ==
  
@@ -119, +128 @@

  ||'''Library'''||'''License'''||'''Description'''||
  ||Adobe AFM||Adobe AFM License||Resources for extracting font encoding. Bundled inside PDFBox
jar file.||
  ||Bouncycastle||BSD Variant||Support for encrypting/decrypting PDF documents.||
- ||FontBox||BSD||Sub project of PDFBox to support font functionality||
  ||IKVM||BSD Variant [1]||Support for PDFBox on the .NET platform||
  ||junit||CPL||Unit Testing Framework||
- ||JempBox||BSD||Sub project of PDFBox to support XMP functionality||
  ||Lucene||ASL||Provide classes for easy Lucene integration||
  ||JAI-CMM||Sun JAI||Provides support for color spaces||
  


