lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: Solr Caching - how to tune, how much to increase, and any tips on using Solr with JDK7 and G1 GC?
Date Sat, 29 Sep 2012 20:58:00 GMT
Well, I haven't had experience with JDK7, so I'll skip that part...

But about caches. First, as far as memory is concerned, be
sure to read Uwe's blog about MMapDirectory here:

As to the caches.

Be a little careful here. Getting high hit rates on _all_ your caches
is a waste.

filterCache. This is the exception: you want as high a hit ratio as you can
get for this one. It's where the results of all the &fq= clauses go, and it is a
major factor in speeding up QPS.

queryResultCache. Hmmm, given the lack of updates to your index, this one
may actually get more hits than I'd expect. But it's a very cheap cache
memory-wise. Think of it as a map where the key is the query and the value is an
array of <queryResultWindowSize> longs (document IDs). It's really intended
mostly for paging. It's also often the case that the chance of the exact
same query (except for &start and &rows) being issued again is actually
relatively small. As always, YMMV. I usually see hit rates on this cache < 10%.
Evictions merely mean it's been around a long time; bumping the size of this
cache probably won't affect the hit rate unless your app somehow submits just
a few queries.
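
To put a rough number on "very cheap", here is a back-of-the-envelope sketch that takes the description above literally: each entry holds <queryResultWindowSize> 8-byte document IDs. The function name and the sample settings are made up for illustration, and the query keys and per-object overhead are ignored, so treat the result as a lower bound rather than a measurement.

```python
def query_result_cache_bytes(entries, window_size, bytes_per_id=8):
    """Rough lower bound on the payload held by the queryResultCache.

    Assumes each entry stores `window_size` document IDs of
    `bytes_per_id` bytes each (8 if they are longs, as described above).
    Query keys and per-object overhead are deliberately ignored.
    """
    return entries * window_size * bytes_per_id


# Hypothetical settings: 16384 entries, queryResultWindowSize of 50.
estimate = query_result_cache_bytes(entries=16384, window_size=50)
print(f"{estimate / 2**20:.2f} MiB")  # prints "6.25 MiB"
```

Even with a fairly large entry count, the payload stays in the low megabytes, which is why growing this cache rarely matters for heap usage.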

documentCache. Again, this often doesn't have a great hit ratio. Its main
use, as I understand it, is to keep various parts of a query component chain
from having to re-access the disk. Each element in a query component chain is
completely separate from the others, so if two or more components want
values from the doc, having them cached is useful. The usual recommendation
for its size is (# docs returned to user) * (expected simultaneous queries),
where "# docs returned to user" is really the &rows value.

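The documentCache sizing rule of thumb is just a multiplication; a minimal sketch follows. The function name and the sample values (20 rows, 50 simultaneous queries) are assumptions for illustration, not numbers from this thread.

```python
def document_cache_size(rows, simultaneous_queries):
    """Rule-of-thumb documentCache size, in entries.

    rows: the &rows value, i.e. docs returned to the user per query.
    simultaneous_queries: expected number of queries in flight at once.
    """
    return rows * simultaneous_queries


# Hypothetical workload: &rows=20 with about 50 concurrent queries.
print(document_cache_size(rows=20, simultaneous_queries=50))  # prints 1000
```

By this rule a cache of tens of thousands of entries is only justified by very large &rows values or very high concurrency.
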
One of the consequences of having huge amounts of memory allocated to
the JVM can be really long garbage collections. They happen less frequently
but have more work to do when they happen.

Oh, and when you start using 4.0, the memory patterns are much different...

Finally, here's a great post on Solr memory tuning; too bad the image links
are broken...


On Sat, Sep 29, 2012 at 3:08 PM, Aaron Daubman <> wrote:
> Greetings,
> I've recently moved to running some of our Solr (3.6.1) instances
> using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to
> 100ms range). By and large, it has been working well (or, perhaps I
> should say that without requiring much tuning it works much better in
> general than my haphazard attempts to tune CMS).
> I have two instances in particular, one with a heap size of 14G and
> one with a heap size of 60G. I'm attempting to squeeze out additional
> performance by increasing Solr's cache sizes (I am still seeing the
> hit ratio go up as I increase max size and decrease the number of
> evictions), and am guessing this is the cause of some recent
> situations where the 14G instance especially eventually (12-24 hrs
> later under 100s of queries per minute) makes it to 80%-90% of the
> heap and then spirals into major GC with long-pause territory.
> I am wondering:
> 1) if anybody has experience tuning the G1 GC, especially for use with
> Solr (what are decent max-pause times to use?)
> 2) how to better tune Solr's cache sizes - e.g. how to even tell the
> actual amount of memory used by each cache (not # entries as the stats
> show, but # bits)
> 3) if there are any guidelines on when increasing a cache's size (even
> if it does continue to increase the hit ratio) runs into the law of
> diminishing returns or even starts to hurt - e.g. if the document
> cache has a current maxSize of 65536 and has seen 4409275 evictions,
> and currently has a hit ratio of 0.74, should the max be increased
> further? If so, how much ram needs to be added to the heap, and how
> much larger should its max size be made?
> I should mention that these solr instances are read-only (so cache is
> probably more valuable than in other scenarios - we only invalidate
> the searcher every 20-24hrs or so) and are also backed with indexes
> (6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not as
> concerned about leaving RAM for linux to cache the index files (I'd
> much rather actually cache the post-transformed values).
> Thanks as always,
>      Aaron
