lucene-dev mailing list archives

From "Jayendra Patil (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2416) Solr Cell fails to index Zip file contents
Date Fri, 18 Mar 2011 02:13:29 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008284#comment-13008284 ]

Jayendra Patil commented on SOLR-2416:
--------------------------------------

This issue existed in Solr 1.4, which was packaged with Tika 0.4, and it prevented us from
using the stable version.

Thread: http://lucene.472066.n3.nabble.com/Issue-Indexing-zip-file-content-in-Solr-1-4-td504914.html
The issue was resolved by the Tika 0.5 upgrade: https://issues.apache.org/jira/browse/SOLR-1567

We are working with a snapshot of Solr trunk (4.x) taken around 15 August 2010, which uses the
Tika 0.8 snapshot jars, and extraction works fine for us.
However, the latest trunk, which was upgraded to the stable release of Tika 0.8, does not
behave the same way.

> Solr Cell fails to index Zip file contents
> ------------------------------------------
>
>                 Key: SOLR-2416
>                 URL: https://issues.apache.org/jira/browse/SOLR-2416
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.4.1
>            Reporter: Jayendra Patil
>             Fix For: 3.2
>
>         Attachments: SOLR-2416_ExtractingDocumentLoader.patch
>
>
> Working with the latest Solr trunk code, it seems that the Tika handlers for Solr Cell (ExtractingDocumentLoader.java)
> and the Data Import Handler (TikaEntityProcessor.java) again fail to index the contents of zip files.
> Only the file names are indexed.
> This issue was addressed late last year but seems to have reappeared with the latest code.
> The JIRA issue for the Data Import Handler part, with a patch and a test case: https://issues.apache.org/jira/browse/SOLR-2332

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


