lucene-dev mailing list archives

From "Lance Norskog (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2332) TikaEntityProcessor retrieves only File Names from Zip extraction
Date Fri, 17 Feb 2012 03:38:05 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210033#comment-13210033 ]

Lance Norskog commented on SOLR-2332:
-------------------------------------

Unpacking a zip file is a narrow, focused operation. It could also be done by a separate
UpdateRequestHandler that does nothing but unpack zip files, using the basic JDK zip
file code rather than Tika. You would then configure the Tika handler beneath it.

Another use case is a zip file full of Solr update XML files, which Tika does not know about.
For that, you want an UpdateRequestHandler stack like this: zip unpacker -> XmlUpdateRequestHandler
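
A minimal sketch of the zip-unpacking stage, assuming plain JDK classes (java.util.zip) and
no Solr or Tika APIs. The names ZipUnpacker and EntryHandler are hypothetical; EntryHandler
only stands in for whatever handler is configured beneath the unpacker (the Tika handler or
XmlUpdateRequestHandler), and a real handler would need to be wired into Solr's request
handling.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipUnpacker {

    /** Hypothetical stand-in for the handler configured beneath the unpacker. */
    public interface EntryHandler {
        void handle(String name, byte[] content) throws IOException;
    }

    /**
     * Reads a zip stream with the JDK zip code and passes each file entry
     * (name plus raw bytes) to the next stage. Returns the number of files passed on.
     */
    public static int unpack(InputStream in, EntryHandler next) throws IOException {
        int count = 0;
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content to index
                }
                // Buffer the current entry; read() returns -1 at the end of the entry.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                next.handle(entry.getName(), buf.toByteArray());
                count++;
            }
        }
        return count;
    }
}
```

Buffering each entry in memory keeps the sketch short; for large archives, the entry
stream could be handed to the next stage directly instead.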

                
> TikaEntityProcessor retrieves only File Names from Zip extraction
> -----------------------------------------------------------------
>
>                 Key: SOLR-2332
>                 URL: https://issues.apache.org/jira/browse/SOLR-2332
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>            Reporter: Jayendra Patil
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-2332.patch, solr-word.zip
>
>
> Extracting a Zip file with TikaEntityProcessor returns only the names of the files.
> It does not extract the contents of the files in the Zip.

