lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Memory Leak in 7.3 to 7.4
Date Sun, 05 Aug 2018 17:22:17 GMT
On 8/2/2018 5:30 AM, Thomas Scheffler wrote:
> my final verdict is the upgrade to Tika 1.17. If I downgrade the libraries just for tika
> back to 1.16 and keep the rest of SOLR 7.4.0 the heap usage after about 85 % of the index
> process and manual trigger of the garbage collector is about 60-70 MB (That low!!!)
>
> My problem now is that we have several setups that triggers this reliably but there is
> no simple test case that „fails“ if Tika 1.17 or 1.18 is used. I also do not know if the
> error is inside Tika or inside the glue code that makes Tika usable in SOLR.

If downgrading Tika fixes the issue, then it doesn't seem (to me) very 
likely that Solr's glue code for the Extracting Request Handler (ERH) 
has a problem.  If it's not Solr's code that has the problem, there will 
be nothing we can do about it other than change the Tika library 
included with Solr.

Before filing an issue, you should discuss this with the Tika project on 
their mailing list.  They'll want to make sure that they can fix the 
problem in a future version.  It might not be an actual memory leak ... 
it could just be that one of the documents you're trying to index is one 
that Tika requires a huge amount of memory to handle.  But it could be a 
memory leak.

If you know which document is being worked on when it runs out of 
memory, can you try not including that document in your indexing, to see 
if it still has a problem?

Please note that it is strongly recommended that you do not use the 
Extracting Request Handler in production.  Tika is prone to problems 
such as excessive memory use or crashes on particular documents, and 
those problems will generally affect Solr if Tika is being run inside 
Solr.  Because of this, it is recommended that you write a 
separate program using Tika that handles extracting information from 
documents and sending that data to Solr.  If that program crashes, Solr 
remains operational.
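A minimal sketch of such a separate program, in Python, is shown here.  It assumes the standalone tika-app jar is available and uses its --text option in a child JVM with a capped heap, so a document that makes Tika blow up only kills that child process.  The jar path, the Solr URL and collection name, and the "content" field are hypothetical placeholders for your own setup; the JSON array body is the format Solr's /update handler accepts.

```python
"""Minimal sketch: extract text with Tika in a separate JVM, then send it to Solr.

Assumptions (hypothetical, adjust for your setup): the tika-app jar path,
the Solr URL and collection name, and the "content" field in the schema.
"""
import json
import subprocess
import urllib.request
from pathlib import Path

TIKA_APP_JAR = "tika-app.jar"  # hypothetical path to the standalone Tika app jar
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commit=true"  # hypothetical collection


def extract_text(path, heap="512m", timeout_s=120):
    """Run Tika in its own JVM with a capped heap so a bad document cannot hurt Solr.

    Returns the extracted text, or None if the Tika process could not start,
    exited with an error (for example an OutOfMemoryError), or timed out.
    """
    cmd = ["java", f"-Xmx{heap}", "-jar", TIKA_APP_JAR, "--text", str(path)]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        return None
    if result.returncode != 0:
        return None
    # Output encoding depends on the JVM default; replace undecodable bytes.
    return result.stdout.decode("utf-8", errors="replace")


def build_update_payload(docs):
    """Turn (doc_id, text) pairs into the JSON array accepted by Solr's /update handler."""
    return json.dumps([{"id": doc_id, "content": text} for doc_id, text in docs]).encode("utf-8")


def post_to_solr(payload, url=SOLR_UPDATE_URL):
    """POST the JSON documents to Solr and return the HTTP status code."""
    req = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_directory(directory):
    """Extract every file, skip (and report) the ones Tika cannot handle, then index the rest."""
    docs, failed = [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        text = extract_text(path)
        if text is None:
            failed.append(path.name)  # candidate for the "leave this document out" test
        else:
            docs.append((path.name, text))
    if docs:
        post_to_solr(build_update_payload(docs))
    return failed
```

Because each document gets its own short-lived JVM, the list returned by index_directory also tells you which documents Tika could not handle, which is exactly the information needed to try indexing without them.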

There is already an issue to upgrade Tika to the latest version in Solr, 
but you've said that you tried 1.18 already with no change to the 
problem.  So whatever the problem is, it will need to be solved in 1.19 
or later.

Thanks,
Shawn

