lucene-solr-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: Disabling Zip bomb detection in Tika
Date Thu, 22 Sep 2016 14:48:25 GMT
Yes, it looks like Nick (gagravarr) has answered on SO: this can't be done in Tika currently.

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org] 
Sent: Thursday, September 22, 2016 10:42 AM
To: solr-user@lucene.apache.org
Cc: 'user@tika.apache.org' <user@tika.apache.org>
Subject: RE: Disabling Zip bomb detection in Tika

I don't think that's configurable at the moment.  

Tika-colleagues, any recommendations?

If you're able to share the file on Tika's JIRA, we'd be happy to take a look.  You shouldn't
be getting the zip bomb exception unless there is a mismatch between opening and closing tags
(which could point to a bug in Tika).
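
To illustrate why unbalanced tags could trip the check: my understanding is that one of the guards behind this exception limits how deeply elements may nest, so start events that never get a matching end event keep pushing the depth up. The following is only a minimal sketch of that idea and is not Tika's actual code; the class name DepthGuardSketch, the helper trips(), and the limit of 100 are assumptions made for illustration.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical, simplified nesting-depth guard (not Tika's SecureContentHandler). */
public class DepthGuardSketch extends DefaultHandler {
    private final int maxDepth; // assumed limit, supplied by the caller
    private int depth = 0;      // number of currently open elements

    public DepthGuardSketch(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        depth++;
        if (depth > maxDepth) {
            throw new SAXException("Suspected zip bomb: " + depth + " levels of nesting");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    /**
     * Feeds `count` start events into a fresh guard. Each start event is
     * followed by an end event only when `balanced` is true.
     * Returns true if the guard threw, false otherwise.
     */
    public static boolean trips(int count, boolean balanced, int maxDepth) {
        DepthGuardSketch guard = new DepthGuardSketch(maxDepth);
        Attributes none = new AttributesImpl();
        try {
            for (int i = 0; i < count; i++) {
                guard.startElement("", "div", "div", none);
                if (balanced) {
                    guard.endElement("", "div", "div");
                }
            }
        } catch (SAXException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(trips(10_000, true, 100));  // prints false
        System.out.println(trips(10_000, false, 100)); // prints true
    }
}
```

In this sketch a large but well-balanced document never gets past depth 1, while the same number of elements without closing events exceeds the limit, which is the kind of opening/closing mismatch described above.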

-----Original Message-----
From: Rodrigo Rosenfeld Rosas [mailto:rr_rosas@yahoo.com.br.INVALID] 
Sent: Thursday, September 22, 2016 10:06 AM
To: solr-user@lucene.apache.org
Subject: Disabling Zip bomb detection in Tika

Hi, this is my first message on this list.

Is it possible to disable Zip bomb detection in the Tika handler?

I've also described the problem here:

http://stackoverflow.com/questions/39628519/how-to-disable-or-increase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#comment66575342_39628519

Basically, I get this error when trying to process some big valid HTML
documents:

RSolr::Error::Http - 500 Internal Server Error
Error:
{'responseHeader'=>{'status'=>500,'QTime'=>76},'error'=>{'metadata'=>['error-class','org.apache.solr.common.SolrException','root-error-class','org.apache.tika.sax.SecureContentHandler$SecureSAXException'],'msg'=>'org.apache.tika.exception.TikaException: Zip bomb detected!','trace'=>'org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Zip bomb detected!
         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)

I need to index those documents. Is it possible to disable Zip bomb detection or to increase
the limit using configuration files? I noticed it's possible to add a tika.config file, but
I have no idea how to specify what I want in such a Tika configuration file.

Any help is appreciated!

Thanks in advance,
Rodrigo.