lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-2332) TikaEntityProcessor retrieves only File Names from Zip extraction
Date Fri, 18 Mar 2011 01:36:29 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2332:
---------------------------

    Affects Version/s:     (was: 4.0)
        Fix Version/s: 3.2

I can't find any docs suggesting how exactly TikaEntityProcessor should be expected to deal
with zip files, particularly what to expect if a zip file contains multiple documents.

FWIW: TikaEntityProcessor did not exist in Solr 1.4.1, so the behavior currently seen in the
3x branch (and the 3.1rc1 artifacts) is not a regression.

> TikaEntityProcessor retrieves only File Names from Zip extraction
> -----------------------------------------------------------------
>
>                 Key: SOLR-2332
>                 URL: https://issues.apache.org/jira/browse/SOLR-2332
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>            Reporter: Jayendra Patil
>             Fix For: 3.2
>
>         Attachments: SOLR-2332.patch, solr-word.zip
>
>
> Extraction of Zip files using TikaEntityProcessor results in only the names of the files.
> It does not extract the contents of the files in the Zip.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

