lucene-solr-user mailing list archives

From: Erick Erickson <>
Subject: Re: Solr Heap Usage
Date: Sun, 02 Jun 2019 14:44:30 GMT
Oh, there are about a zillion reasons ;).

First of all, most tools that show heap usage also count uncollected garbage, so your 10G
could actually contain much less “live” data. A quick way to test is to attach jconsole to the
running Solr and hit the button that forces a full GC.

Another way is to reduce the heap when you start Solr (on a test system, of course) until
bad stuff happens. If you reduce it to very close to what Solr needs, things get slower
as more and more cycles are spent on GC; if you reduce it a little further, you’ll get OOMs.

You can take heap dumps of course to see where all the memory is being used, but that’s
tricky as it also includes garbage.

I’ve seen cache sizes (filterCache in particular) be something that uses lots of memory,
but that requires queries to be fired. Each filterCache entry can take up to roughly maxDoc/8
bytes + overhead….
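That maxDoc/8 rule of thumb is easy to turn into a worst-case estimate. A minimal sketch (the 20M maxDoc and the default 512-entry cache size are illustration values, not your numbers):

```python
# Worst-case filterCache sizing, per the maxDoc/8 rule of thumb:
# each entry can be a bitset of maxDoc bits, i.e. maxDoc/8 bytes,
# plus per-entry overhead that is ignored here.

def filter_cache_bytes(max_doc: int, entries: int) -> int:
    """Worst case: one full bitset (maxDoc/8 bytes) per cache entry."""
    return (max_doc // 8) * entries

# e.g. a 20M-doc core with a 512-entry filterCache:
worst_case = filter_cache_bytes(20_000_000, 512)
print(f"{worst_case / 1e9:.2f} GB")  # 1.28 GB
```

So a large index plus a generous filterCache can, by itself, account for a gigabyte or more of heap once enough distinct filter queries have been fired.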

A classic error is to sort, group, or facet on a docValues=false field. Starting with Solr
7.6, you can add an option to fields that throws an error if you do this; see:

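For reference, a hedged sketch of what that looks like in the schema (the field name is made up; `uninvertible` is the attribute added in Solr 7.6, and I believe it defaults to true):

```xml
<!-- Sketch only: with docValues="false" and uninvertible="false",
     sorting/grouping/faceting on this field throws an error instead of
     silently building a large on-heap uninverted index. -->
<field name="category" type="string" indexed="true" stored="true"
       docValues="false" uninvertible="false"/>
```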
In short, there’s not enough information to say; you have to dive in and test bunches of stuff to narrow it down.


> On Jun 2, 2019, at 2:22 AM, John Davis <> wrote:
> This makes sense, any ideas why lucene/solr will use 10g heap for a 20g
> index. My hypothesis was merging segments was trying to read it all but if
> that's not the case I am out of ideas. The one caveat is we are trying to
> add the documents quickly (~1g an hour) but if lucene does write 100m
> segments and does streaming merge it shouldn't matter?
> On Sat, Jun 1, 2019 at 9:24 AM Walter Underwood <>
> wrote:
>>> On May 31, 2019, at 11:27 PM, John Davis <>
>> wrote:
>>> 2. Merging segments - does solr load the entire segment in memory or
>> chunks
>>> of it? if later how large are these chunks
>> No, it does not read the entire segment into memory.
>> A fundamental part of the Lucene design is streaming posting lists into
>> memory and processing them sequentially. The same amount of memory is
>> needed for small or large segments. Each posting list is in document-id
>> order. The merge is a merge of sorted lists, writing a new posting list in
>> document-id order.
>> wunder
>> Walter Underwood
>>  (my blog)
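The streaming merge Walter describes can be sketched with a toy k-way merge of sorted lists (the doc ids are made up, and real Lucene also remaps ids during a merge; this only illustrates why memory stays constant regardless of segment size):

```python
# Toy sketch of a segment merge: each input is a posting list in
# document-id order, consumed as a lazy iterator. heapq.merge walks
# all inputs sequentially, holding only one head element per list in
# memory, and emits a single merged, still-sorted posting list.
import heapq

seg1 = iter([1, 4, 7])   # doc ids for one term in segment 1
seg2 = iter([2, 4, 9])   # same term in segment 2
merged = list(heapq.merge(seg1, seg2))
print(merged)  # [1, 2, 4, 4, 7, 9]
```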
