jackrabbit-dev mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (JCR-2388) Upgrade PDFBox to version 0.8.0
Date Mon, 09 Nov 2009 08:26:32 GMT

     [ https://issues.apache.org/jira/browse/JCR-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger resolved JCR-2388.
-----------------------------------

       Resolution: Invalid
    Fix Version/s:     (was: 2.0-beta2)

As of Jackrabbit 2.0, the jackrabbit-text-extractors module has been replaced with a dependency
on Apache Tika 0.4, which includes PDFBox 0.7.3.

If you are using Jackrabbit 1.x, I suggest you write your own text extractor that uses
PDFBox 0.8.0 and configure it accordingly in workspace.xml.
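
As a rough illustration of that suggestion, here is a minimal sketch of what such an extractor
could look like. Everything in it is an assumption rather than Jackrabbit code: the local
TextExtractor interface is a stand-in assumed to mirror the Jackrabbit 1.x interface
org.apache.jackrabbit.extractor.TextExtractor, and PdfTextSource and CustomPdfTextExtractor are
hypothetical names. The PDFBox 0.8.0 calls appear only in a comment so that the sketch compiles
without external jars.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

/** Local stand-in, assumed to mirror org.apache.jackrabbit.extractor.TextExtractor (1.x). */
interface TextExtractor {
    String[] getContentTypes();
    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

/**
 * Hypothetical seam for the PDF parsing step. With PDFBox 0.8.0 on the classpath,
 * an implementation would roughly be (new org.apache.pdfbox package names):
 *
 *   PDDocument document = PDDocument.load(pdf);           // org.apache.pdfbox.pdmodel
 *   try {
 *       return new PDFTextStripper().getText(document);   // org.apache.pdfbox.util
 *   } finally {
 *       document.close();
 *   }
 */
interface PdfTextSource {
    String readText(InputStream pdf) throws IOException;
}

/** Hypothetical extractor that handles application/pdf only. */
class CustomPdfTextExtractor implements TextExtractor {

    private final PdfTextSource source;

    // Assumption: a class named in workspace.xml needs a public no-argument
    // constructor, so a real extractor would wire PDFBox in directly there.
    CustomPdfTextExtractor(PdfTextSource source) {
        this.source = source;
    }

    public String[] getContentTypes() {
        return new String[] {"application/pdf"};
    }

    public Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        try {
            return new StringReader(source.readText(stream));
        } catch (IOException | RuntimeException e) {
            // Design choice of this sketch: a document that cannot be parsed
            // (for example, one that triggers a NullPointerException in the
            // parser) yields empty text instead of failing the caller.
            return new StringReader("");
        } finally {
            stream.close();
        }
    }
}
```

In Jackrabbit 1.x the extractor class is then registered, if I recall the parameter name
correctly, in the comma-separated textFilterClasses parameter of the SearchIndex element in
workspace.xml, in place of the bundled PDF extractor.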

For Jackrabbit 2.0 we will have to wait for Tika 0.5, which will include PDFBox 0.8.0 (http://issues.apache.org/jira/browse/TIKA-158).

> Upgrade PDFBox to version 0.8.0
> -------------------------------
>
>                 Key: JCR-2388
>                 URL: https://issues.apache.org/jira/browse/JCR-2388
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-text-extractors
>    Affects Versions: 2.0-beta1
>            Reporter: William Woodward
>
> The most recent version of PDFBox fixes a bug in its PDFParser class that caused a
> null pointer exception when attempting to extract text from documents created with Acrobat Pro
> version 9. See: https://issues.apache.org/jira/browse/PDFBOX-361. Since this is the first Apache
> Incubator release, the package names have also changed. Therefore, simply dropping in the new
> PDFBox is not an option, because the Jackrabbit text extractor references the old package names.
> This is a MAJOR problem for us since our user community recently updated to Acrobat 9
> (and we have no control over this decision). Our users produce time-sensitive reports. Without
> an updated Jackrabbit (with an updated PDFBox) we can no longer extract and index text from the
> users' PDFs.
> Thank you for your consideration in this matter,
> Bill Woodward
> Developer

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

