jackrabbit-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Created: (JCR-2502) Upgrade to Tika 0.6 and PDFBox 1.0.0
Date Thu, 18 Feb 2010 10:37:27 GMT
Upgrade to Tika 0.6 and PDFBox 1.0.0

                 Key: JCR-2502
                 URL: https://issues.apache.org/jira/browse/JCR-2502
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-core, jackrabbit-jcr-server
            Reporter: Jukka Zitting
            Assignee: Jukka Zitting
            Priority: Minor

Tika 0.6 uses POI 3.6, which is notably smaller (10MB less!) than previous versions. There
are also a number of other improvements in Tika 0.6 since the 0.5 release.

While doing the upgrade, we should also force the PDFBox version to 1.0.0 instead of the
0.8.0-incubating version that Tika 0.6 depends on. PDFBox 1.0.0 brings some nice performance
gains in text extraction (around 30% faster), along with other improvements.
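
As a rough sketch of what forcing the PDFBox version could look like in a Maven POM: declaring PDFBox as a direct dependency makes Maven prefer that version over the one pulled in transitively through Tika. The coordinates below (tika-parsers as the Tika artifact, org.apache.pdfbox:pdfbox for PDFBox) are assumptions for illustration; the actual change in the Jackrabbit build may be structured differently.

```xml
<!-- Sketch only: assumes the module depends on Tika through tika-parsers. -->
<dependencies>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>0.6</version>
  </dependency>
  <!-- Declared directly so that Maven's "nearest wins" dependency mediation
       selects 1.0.0 instead of the transitive 0.8.0-incubating version. -->
  <dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
```

Running mvn dependency:tree afterwards is one way to check which PDFBox version actually ends up on the classpath.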

