lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SOLR-1786) Solr (trunk rev. 912116) suffers from PDFBOX-537 [Endless loop in org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary()] fixed in PDFbox 1.0?
Date Thu, 07 Jun 2012 18:13:23 GMT

     [ https://issues.apache.org/jira/browse/SOLR-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hoss Man resolved SOLR-1786.
----------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 4.0)
                   3.1

the initial problem report was specifically about an endless loop that could be avoided by
rolling back tika -- but that endless loop was already fixed by upgrading Tika in Solr 3.1
-- which causes a hard error instead.

(There are always going to be some files that can't be parsed, and since we're delegating
to PDFBox (via Tika) it's not really something we can worry too much about).


                
> Solr (trunk rev. 912116) suffers from PDFBOX-537 [Endless loop in org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary()]
 fixed in PDFbox 1.0?
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1786
>                 URL: https://issues.apache.org/jira/browse/SOLR-1786
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.5
>         Environment: Ubuntu 9.10, 32bit
>            Reporter: Jan Iwaszkiewicz
>            Priority: Critical
>              Labels: PDFbox
>             Fix For: 3.1
>
>
> I tried indexing several thousand PDF documents but could not finish as Solr was falling
into an endless loop for some of them, for instance: http://cdsweb.cern.ch/record/702585/files/sl-note-2000-019.pdf
(the PDF seems OK).
> Can Solr start using PDFbox 1.0?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message