lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr caching clarifications
Date Fri, 12 Jul 2013 13:20:08 GMT

On Thu, Jul 11, 2013 at 8:36 AM, Manuel Le Normand
<manuel.lenormand@gmail.com> wrote:
> Hello,
> As a result of frequent Java OOM exceptions, I am trying to investigate
> the Solr JVM heap usage more closely.
> Please correct me if I am mistaken; this is my understanding of what uses
> the heap (per replica on a Solr instance):
> 1. Buffers for indexing - bounded by ramBufferSize
> 2. Solr caches
> 3. Segment merge
> 4. Miscellaneous- buffers for Tlogs, servlet overhead etc.
>
> Particularly I'm concerned by Solr caches and segment merges.
> 1. How much memory consuming (bytes per doc) are FilterCaches (bitDocSet)
> and queryResultCaches (DocList)? I understand it is related to the skip
> spaces between doc id's that match (so it's not saved as a bitmap). But
> basically, is every id saved as a java int?

Different beasts. The filterCache consumes, essentially, maxDoc/8 bytes
per entry (you can get the maxDoc number from your Solr admin page), plus
some overhead for storing the fq text, but that's usually not much. That
cost applies to each entry, up to the cache's configured "size".
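
A rough sketch of that arithmetic, assuming every filterCache entry is a
full bitset of maxDoc bits and ignoring the fq text overhead. The function
name and the example numbers are made up for illustration; they are not
part of Solr.

```python
def filter_cache_bytes(max_doc: int, size: int) -> int:
    """Upper-bound estimate: one bit per document, per cached filter."""
    bytes_per_entry = (max_doc + 7) // 8  # maxDoc bits, rounded up to bytes
    return bytes_per_entry * size


# Hypothetical index: 100 million docs, filterCache size of 512 entries.
estimate = filter_cache_bytes(100_000_000, 512)
print(f"{estimate / 1024**3:.2f} GiB")
```

With those assumed numbers a completely full cache would hold roughly
6 GiB of bitsets, which is why a large filterCache on a large index is a
common source of heap pressure.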

queryResultCache is usually trivial unless you've configured it extravagantly.
It's the query string length + queryResultWindowSize integers per entry
(queryResultWindowSize is from solrconfig.xml).
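
The same kind of back-of-the-envelope estimate for the queryResultCache,
as a sketch only. It assumes 4 bytes per doc id (a Java int) and 2 bytes
per query character, and it ignores object overhead; the function name
and the sample values are illustrative assumptions.

```python
INT_BYTES = 4   # assumed size of one cached doc id (Java int)
CHAR_BYTES = 2  # assumed size of one query-string character


def query_result_cache_bytes(entries: int, window_size: int,
                             avg_query_chars: int) -> int:
    """Estimate: query string plus queryResultWindowSize ints per entry."""
    per_entry = avg_query_chars * CHAR_BYTES + window_size * INT_BYTES
    return per_entry * entries


# Hypothetical config: 512 entries, queryResultWindowSize=50, 100-char queries.
print(query_result_cache_bytes(512, 50, 100), "bytes")
```

Under these assumptions the whole cache comes to a few hundred kilobytes,
which is consistent with it being trivial next to the filterCache.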

> 2. QueryResultMaxDocsCached - (for example = 100) means that any query
> resulting in more than 100 docs will not be cached (at all) in the
> queryResultCache? Or does it have to do with the documentCache?
It's just a limit on the queryResultCache entry size as far as I can
tell. But again, this cache is relatively small; I'd be surprised if it
used significant resources.

> 3. DocumentCache - written on the wiki it should be greater than
> max_results*concurrent_queries. Max result is just the num of rows
> displayed (rows-start) param, right? Not the queryResultWindow.

Yes. This is a cache (I think) for the _contents_ of the documents you'll
be returning to be manipulated by various components during the life
of the query.
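
The wiki rule of thumb quoted above can be written out as a lower bound
on the number of documentCache entries (entries, not bytes). This is only
a sketch; the function name and the sample values are assumptions.

```python
def min_document_cache_size(rows: int, concurrent_queries: int) -> int:
    """Lower bound from the rule: size > max_results * concurrent_queries."""
    return rows * concurrent_queries + 1


# Hypothetical load: 20 rows per request, 50 queries in flight at once.
print(min_document_cache_size(20, 50))
```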

> 4. LazyFieldLoading=true - when querying for id's only (fl=id) will this
> cache be used? (at the expense of evicting docs that were already loaded
> with stored fields)

Not sure, but I don't think this will contribute much to memory pressure.
This is about how many fields are loaded to get a single value from a doc
in the results list, and since one is usually working with 20 or so docs
this is usually a small amount of memory.

> 5. How large is the heap used by mergings? Assuming we have a merge of 10
> segments of 500MB each (half inverted files - *.pos *.doc etc, half non
> inverted files - *.fdt, *.tvd), how much heap should be left unused for
> this merge?

Again, I don't think this is much of a memory consumer, although I
confess I don't know the internals. Merging is mostly about I/O.

>
> Thanks in advance,
> Manu

But take a look at the admin page: you can see how much memory the
various caches are using in the plugins/stats section.

Best
Erick
