lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject 7.3 appears to leak
Date Thu, 26 Apr 2018 16:43:50 GMT
Hello,

We just finished upgrading our three separate clusters from 7.2.1 to 7.3. That went fine,
except for our main text search collection, which appears to leak memory on commit!

After the initial upgrade we saw the cluster slowly run out of memory within about
an hour and a half. We increased the heap in case 7.3 just requires more of it, but the heap
consumption graph still grows on each commit. Heap space cannot be reclaimed by forcing the
garbage collector to run; everything just piles up in the OldGen. Running with this slightly
larger heap, the first nodes run out of memory about two and a half hours after a cluster restart.
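
For anyone who wants to watch the same symptom without a profiler attached, here is a minimal sketch that reads how much of the old generation is still in use after the most recent collection. It uses only the standard java.lang.management API. The class name OldGenProbe and the name matching are illustrative assumptions: pool names vary per collector, and the code has to run inside the JVM being observed (or be adapted to a remote JMX connection).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Solr.
final class OldGenProbe {

    // Assumption: old generation pools are named e.g. "PS Old Gen",
    // "G1 Old Gen", "CMS Old Gen" or "Tenured Gen", depending on the collector.
    static boolean isOldGenName(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    // Bytes still used in the old generation after the most recent collection
    // of that pool, or -1 if no matching pool (or no collection data) exists.
    static long oldGenUsedAfterLastGc() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && isOldGenName(pool.getName())) {
                MemoryUsage afterGc = pool.getCollectionUsage();
                if (afterGc != null) {
                    return afterGc.getUsed();
                }
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("OldGen used after last GC (bytes): " + oldGenUsedAfterLastGc());
    }
}
```

If this value keeps rising from one commit to the next, that would be consistent with objects being retained rather than the heap simply being too small.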

The heap-eating cluster is a 2-shard/3-replica system on separate nodes. Each replica is about
50 GB in size and holds about 8.5 million documents. On 7.2.1 it ran fine with just a 2 GB heap.
With 7.3 and a 2.5 GB heap, it just takes a little longer to run out of memory.

I inspected the reports shown by the VisualVM sampler and spotted one peculiarity: the number
of SortedIntDocSet instances keeps growing on each commit by about the same amount as the
number of cached filter queries. This doesn't happen on the logs cluster, where SortedIntDocSet
instances are neatly collected. The number of instances also matches the number
of commits since startup multiplied by the cache sizes.
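
The relation described above (instances roughly equal to commits since startup times the number of cached filter queries) can be written down as a small check. This is only a sketch of the arithmetic: the class name, the tolerance, and the numbers in main are made up for illustration and are not measurements from our cluster.

```java
// Hypothetical helper, not part of Solr.
final class DocSetLeakEstimate {

    // Instance count expected if every commit leaves one full set of
    // cached filter query results behind.
    static long expectedRetained(long commitsSinceStart, long cachedFilterQueries) {
        return Math.multiplyExact(commitsSinceStart, cachedFilterQueries);
    }

    // True when the observed count is within the given relative tolerance
    // of the expected count.
    static boolean matchesLeakPattern(long observed, long commitsSinceStart,
                                      long cachedFilterQueries, double tolerance) {
        long expected = expectedRetained(commitsSinceStart, cachedFilterQueries);
        return Math.abs(observed - expected) <= tolerance * expected;
    }

    public static void main(String[] args) {
        // Illustrative values only: 40 commits, 512 cached filter queries.
        System.out.println(expectedRetained(40, 512));
        System.out.println(matchesLeakPattern(20000, 40, 512, 0.05));
    }
}
```

On a healthy node the observed count should stay near a single cache's worth of entries instead of scaling with the number of commits.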

Our other two clusters don't have this problem. One of them receives very few commits per
day, but the other receives data all the time: it logs user interactions, so a large amount
of data is coming in continuously. I cannot reproduce it locally by just indexing data and
committing all the time; the peak usage in OldGen stays about the same. But I can reproduce it
locally when I introduce queries and filter queries while indexing pieces of data and
committing them.

So, what is the problem? I dug through the CHANGES.txt of both Lucene and Solr, but nothing really
caught my attention. Does anyone here have an idea where to look?

Many thanks,
Markus
