jackrabbit-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (JCR-1567) Upgrade to PDFBox 0.7.3
Date Fri, 30 May 2008 15:14:45 GMT

     [ https://issues.apache.org/jira/browse/JCR-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved JCR-1567.
--------------------------------

    Resolution: Fixed

Upgraded the dependency in revision 661756.

I explicitly excluded the transitive Bouncy Castle dependencies that would have forced us
to deal with the crypto export stuff (http://www.apache.org/dev/crypto.html).
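
For readers unfamiliar with Maven exclusions, the change could look roughly like the
fragment below. This is only a sketch, not the actual contents of revision 661756: the
coordinates (pdfbox:pdfbox, and bouncycastle:bcprov-jdk14 / bcmail-jdk14 as the
transitive Bouncy Castle artifacts) are assumptions and should be checked against the
PDFBox 0.7.3 POM.

```xml
<!-- Sketch only: coordinates are assumed, verify them against the PDFBox 0.7.3 POM -->
<dependency>
  <groupId>pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>0.7.3</version>
  <exclusions>
    <!-- Keep the Bouncy Castle crypto jars out of the dependency tree -->
    <exclusion>
      <groupId>bouncycastle</groupId>
      <artifactId>bcprov-jdk14</artifactId>
    </exclusion>
    <exclusion>
      <groupId>bouncycastle</groupId>
      <artifactId>bcmail-jdk14</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With exclusions like these in place, running mvn dependency:tree on the module should
no longer list any Bouncy Castle artifacts.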

> Upgrade to PDFBox 0.7.3
> -----------------------
>
>                 Key: JCR-1567
>                 URL: https://issues.apache.org/jira/browse/JCR-1567
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: indexing, jackrabbit-text-extractors
>    Affects Versions: 1.4
>         Environment: Tomcat 6; JDK 1.6; Windows 2003;
>            Reporter: Julio Castillo
>            Assignee: Jukka Zitting
>             Fix For: 1.5
>
>
> While trying to upload a PDF document (which I can view fine with Acrobat Reader once it is loaded), I get the following exception:
> 01.05.2008 12:24:44 *WARN * PdfTextExtractor: Failed to extract PDF text content (PdfTextExtractor.java, line 91)
> java.io.IOException: Error: Expected an integer type, actual='%%EOF'
>         at org.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1159)
>         at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:349)
>         at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:132)
>         at org.apache.jackrabbit.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:69)
>         at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
>         at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
>         at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addBinaryValue(NodeIndexer.java:393)
>  ....
> I replaced the version of PDFBox (0.6.4) that is bundled with the Jackrabbit war file with a more recent version (0.7.3 and FontBox 0.1) and it worked fine. The bundled versions should be upgraded.
> On the other hand, this software appears to be inactive. Probably a different package should be selected in the long run, but for now, a simple upgrade will do the trick.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

