lucene-dev mailing list archives

From "Steven Rowe (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-1786) Solr (trunk rev. 912116) suffers from PDFBOX-537 [Endless loop in org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary()] fixed in PDFbox 1.0?
Date Wed, 26 Oct 2011 00:20:32 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135582#comment-13135582 ]

Steven Rowe commented on SOLR-1786:
-----------------------------------

Solr Cell upgraded to Tika 0.8, which included PDFbox 1.1.0, in the Solr 3.1 release.

The Solr 3.5 release will include Tika 0.10, which includes PDFbox 1.6.0.

This problem has likely been addressed by these upgrades.

Jan, can you test Solr 3.1+ to confirm?
                
> Solr (trunk rev. 912116) suffers from PDFBOX-537 [Endless loop in org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary()] fixed in PDFbox 1.0?
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1786
>                 URL: https://issues.apache.org/jira/browse/SOLR-1786
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.5
>         Environment: Ubuntu 9.10, 32bit
>            Reporter: Jan Iwaszkiewicz
>            Priority: Critical
>              Labels: PDFbox
>             Fix For: 3.5, 4.0
>
>
> I tried indexing several thousand PDF documents but could not finish as Solr was falling
> into an endless loop for some of them, for instance: http://cdsweb.cern.ch/record/702585/files/sl-note-2000-019.pdf
> (the PDF seems OK).
> Can Solr start using PDFbox 1.0?


