lucene-solr-user mailing list archives

From Toke Eskildsen <t...@kb.dk>
Subject Re: Solr Index size keeps fluctuating, becomes ~4x normal size.
Date Thu, 06 Apr 2017 12:42:11 GMT
On Thu, 2017-04-06 at 16:30 +0530, Himanshu Sachdeva wrote:
> We monitored the index size for a few days and found that it varies
> widely from 11GB to 43GB. 

Lucene/Solr indexes consist of segments, each holding a number of
documents. When a document is deleted, its bytes are not removed
immediately, only marked. When a document is updated, it is effectively
a delete and an add.

If you have an index with 3 documents
  segment-0 (live docs [0, 1, 2], deleted docs [])
and update document 0 and 1, you will have
  segment-0 (live docs [2], deleted docs [0, 1])
  segment-1 (live docs [0, 1], deleted docs [])
if you then update document 1 again, you will have
  segment-0 (live docs [2], deleted docs [0, 1])
  segment-1 (live docs [0], deleted docs [1])
  segment-2 (live docs [1], deleted docs [])

for a total of ([2] + [0, 1]) + ([0] + [1]) + ([1] + []) = 6 documents.
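
The bookkeeping above can be reproduced with a small toy model. This is
only a sketch of the idea, not how Lucene stores segments: the Segment
class and the update() helper are made up for illustration, and it
counts documents rather than bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical stand-in for a segment: ids of live and deleted docs.
    live: set = field(default_factory=set)
    deleted: set = field(default_factory=set)


def update(segments, doc_ids):
    """Model an update as a delete (mark only) plus an add in a new segment."""
    ids = set(doc_ids)
    for seg in segments:
        hit = seg.live & ids
        seg.live -= hit      # old copy is no longer live...
        seg.deleted |= hit   # ...but it still occupies space in the segment
    segments.append(Segment(live=ids))


def stored_docs(segments):
    """Documents physically held in the index, live or deleted."""
    return sum(len(s.live) + len(s.deleted) for s in segments)


def live_docs(segments):
    """Documents a search can actually see."""
    return sum(len(s.live) for s in segments)


segments = [Segment(live={0, 1, 2})]
update(segments, [0, 1])
update(segments, [1])
print(stored_docs(segments), live_docs(segments))  # prints: 6 3
```

Three live documents end up occupying the space of six, which is the
2x overhead from the example.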

The space is reclaimed when segments are merged, but depending on your
setup and update pattern that may take some time. Furthermore, there is
a temporary overhead while merging: the merged segment is being written
while the old segments are still on disk. 4x the minimum size is fairly
large, but not unrealistic with enough index updates.
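
As a rough illustration of the merge overhead, here is a toy calculation
using the (live, deleted) counts of the three segments from the example
above. It assumes all segments are merged into one in a single step and
counts documents instead of bytes; a real merge policy typically picks a
subset of segments, so treat the numbers as a sketch only.

```python
def merge_footprint(segments):
    """segments: list of (live, deleted) document counts per segment.

    Returns (before, peak, after) document counts on disk, assuming all
    segments are merged into a single new segment in one step.
    """
    before = sum(live + deleted for live, deleted in segments)
    after = sum(live for live, _ in segments)  # deleted docs are dropped
    peak = before + after  # old segments and the new one coexist briefly
    return before, peak, after


# (live, deleted) counts for segment-0, segment-1 and segment-2
before, peak, after = merge_footprint([(1, 2), (1, 1), (1, 0)])
print(before, peak, after)  # prints: 6 9 3
```

In this model the index briefly holds 9 documents' worth of data, 3x the
3 live documents, before dropping back to the minimum.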

> Recently, we started getting a lot of out of memory errors on the
> master. Everytime, solr becomes unresponsive and we need to restart
> jetty to bring it back up. At the same we observed the variation in
> index size. We are suspecting that these two problems may be linked.

Quick sanity check: Look for "Overlapping onDeckSearchers" in your
solr.log to see if your memory problems are caused by multiple open
searchers:
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
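
If grep is not at hand, the check can be sketched in a few lines of
Python. The path "solr.log" below is an assumption; point it at wherever
your installation writes its log. The script only looks for the phrase
mentioned above and makes no assumption about the rest of the log line.

```python
from pathlib import Path

NEEDLE = "Overlapping onDeckSearchers"


def find_overlapping(lines):
    """Return (line_number, text) for every line containing the phrase."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if NEEDLE in line
    ]


log_path = Path("solr.log")  # assumption: adjust to your log location
if log_path.exists():
    with log_path.open(encoding="utf-8", errors="replace") as fh:
        hits = find_overlapping(fh)
    print(f"{len(hits)} matching line(s) in {log_path}")
    for n, text in hits:
        print(f"{n}: {text}")
```

No matches makes overlapping searchers an unlikely cause; many matches
around the time of the out of memory errors would point in that
direction.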
-- 
Toke Eskildsen, Royal Danish Library