lucene-dev mailing list archives

From "Surendranadh Puranam (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-2550) Apache Solr needs an updated TIKA version in its extraction libraries
Date Fri, 27 May 2011 12:22:47 GMT
Apache Solr needs an updated TIKA version in its extraction libraries
---------------------------------------------------------------------

                 Key: SOLR-2550
                 URL: https://issues.apache.org/jira/browse/SOLR-2550
             Project: Solr
          Issue Type: Bug
          Components: contrib - Solr Cell (Tika extraction)
    Affects Versions: 1.4.1
            Reporter: Surendranadh Puranam
            Priority: Critical
             Fix For: 1.4.2


Some PDF documents fail when they are indexed (extracted?). The underlying issue was fixed
in PDFBox 1.1.0, but Apache Solr 1.4.1 does not ship the latest version of these jars, which
causes the failures. The Solr 1.4.1 distribution includes tika-parsers 0.4, which needs to be
updated to version 0.9.

Reference for the issue and the solution: https://issues.apache.org/jira/browse/PDFBOX-617



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

