lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-2416) Solr Cell fails to index Zip file contents
Date Fri, 18 Mar 2011 01:28:29 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2416:
---------------------------

    Affects Version/s:     (was: 4.0)
                       1.4.1
        Fix Version/s: 3.2
              Summary: Solr Cell fails to index Zip file contents  (was: Solr Cell & DataImport Tika handler broken - fails to index Zip file contents)

I'm not sure exactly what Jayendra is referring to by "was addressed some time back ... seems
to have reappeared" (I couldn't find any issues that looked similar), but I just tested and
confirmed that in 1.4.1 SolrCell only indexes the metadata about *.zip files, not the contents
of the zip.

The behavior in the 3.1rc1 Solr release candidate is consistent with 1.4.1: only info about
the zip file itself is extracted, not the contents (although in 3.1 we actually extract more
metadata than we did in 1.4.1). So this definitely isn't a 3.1 blocker (some people were wondering
on IRC).

I'm not personally even clear whether this is really a bug, or whether it should be driven by a
request option. Perhaps some users only want the data about the zip file, not its contents. And what
should the behavior be if the zip file contains multiple files and the request specifies a literal id?
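
To make the literal-id question concrete, here is a minimal sketch of one possible answer. It is
not how Solr Cell or Tika behaves; the function name entry_ids and the "<literal id>!<entry name>"
id scheme are assumptions made up for illustration. It only shows that a single request-supplied
id has to be expanded somehow once an archive yields more than one document.

```python
import io
import zipfile


def entry_ids(zip_bytes, literal_id):
    """Yield (doc_id, entry_name, size) for each file inside a zip archive.

    Hypothetical scheme: each entry gets the id '<literal_id>!<entry_name>',
    so one request-supplied id fans out into one id per archive member.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content to index
            yield f"{literal_id}!{info.filename}", info.filename, info.file_size


def build_sample_zip():
    """Build a small two-file archive in memory for the demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("a.txt", "alpha")
        archive.writestr("docs/b.txt", "bravo!")
    return buf.getvalue()


if __name__ == "__main__":
    for doc_id, name, size in entry_ids(build_sample_zip(), "doc1"):
        print(doc_id, name, size)
```

Under this scheme a request with a literal id of "doc1" would produce one id per archive member
rather than one document for the whole zip. Whether that, a single merged document, or
metadata-only indexing is the right default is exactly the open question above.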

> Solr Cell fails to index Zip file contents
> ------------------------------------------
>
>                 Key: SOLR-2416
>                 URL: https://issues.apache.org/jira/browse/SOLR-2416
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.4.1
>            Reporter: Jayendra Patil
>             Fix For: 3.2
>
>         Attachments: SOLR-2416_ExtractingDocumentLoader.patch
>
>
> Working with the latest Solr Trunk code and seems the Tika handlers for Solr Cell (ExtractingDocumentLoader.java)
> and Data Import handler (TikaEntityProcessor.java) fails to index the zip file contents again.
> It just indexes the file names again.
> This issue was addressed some time back, late last year, but seems to have reappeared
> with the latest code.
> Jira for the Data Import handler part with the patch and the testcase - https://issues.apache.org/jira/browse/SOLR-2332.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

