lucene-solr-user mailing list archives

From "Allison, Timothy B." <>
Subject RE: Zip Bomb Exception in HTML File
Date Wed, 04 Jan 2017 19:23:53 GMT
This came up back in September [1] and [2].  Same trigger: a crazy number of nested divs.

I think we could modify the AutoDetectParser to enable configuration of maximum zip-bomb depth
via tika-config.

If there's any interest in this, re-open TIKA-2091, and I'll take a look.
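
To illustrate the kind of check involved, here is a minimal sketch of a configurable nesting-depth guard. This is not Tika's code: the class and method names (DepthGuardSketch, DepthLimitHandler, exceedsDepth, nestedMarkup) are hypothetical, and it uses only the JDK's SAX parser on well-formed markup. As far as I know, Tika's real check lives in SecureContentHandler, but treat that as a pointer rather than a description of its internals. The depths are kept small on purpose, since some JDK versions enforce their own element-depth limit.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DepthGuardSketch {

    /** Hypothetical marker exception so a depth violation is distinguishable from a parse error. */
    static class DepthExceededException extends SAXException {
        DepthExceededException(String message) {
            super(message);
        }
    }

    /** Hypothetical handler: tracks element nesting and fails once it exceeds maxDepth. */
    static class DepthLimitHandler extends DefaultHandler {
        private final int maxDepth;
        private int depth = 0;

        DepthLimitHandler(int maxDepth) {
            this.maxDepth = maxDepth;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            depth++;
            if (depth > maxDepth) {
                throw new DepthExceededException(
                        "Nesting depth " + depth + " exceeds limit " + maxDepth);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }
    }

    /** Builds well-formed markup with the given number of nested div/span pairs (depth = 2 * pairs). */
    static String nestedMarkup(int pairs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pairs; i++) {
            sb.append("<div><span>");
        }
        sb.append("text");
        for (int i = 0; i < pairs; i++) {
            sb.append("</span></div>");
        }
        return sb.toString();
    }

    /** Returns true if the markup nests deeper than maxDepth; other failures are rethrown unchecked. */
    static boolean exceedsDepth(String markup, int maxDepth) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DepthLimitHandler(maxDepth));
            return false;
        } catch (DepthExceededException e) {
            return true;
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parse failure", e);
        }
    }

    public static void main(String[] args) {
        int limit = 10; // the configurable part; a real setting would come from a config file
        System.out.println(exceedsDepth(nestedMarkup(3), limit));
        System.out.println(exceedsDepth(nestedMarkup(20), limit));
    }
}
```

The point of the sketch is only that the limit is a constructor argument rather than a constant, which is the sort of knob a tika-config setting would need to reach.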




-----Original Message-----
From: Erick Erickson [] 
Sent: Wednesday, January 4, 2017 12:20 PM
To: solr-user <>
Subject: Re: Zip Bomb Exception in HTML File

You might get a more knowledgeable response from the Tika folks; that's really not something
Solr controls.


On Wed, Jan 4, 2017 at 8:50 AM,  <> wrote:
> I get an exception "<str name="msg">org.apache.tika.exception.TikaException:
> Zip bomb detected!</str>"
> when I try to parse an HTML file, and I think I know why:
> the file contains a long cascade of nested <div><span> elements, more than 200 divs,
> each with a span inside.
> Is it correct that there is such a limit for HTML files?
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
