lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject RE: Solr using a ridiculous amount of memory
Date Wed, 17 Apr 2013 13:06:38 GMT
John Nielsen [jn@mcb.dk]:
> I never seriously looked at my fieldValueCache. It never seemed to get used:

> http://screencast.com/t/YtKw7UQfU

That is strange. As you are using a multi-valued field with the new setup, the facet fields
should appear there. Can you find them in any of the other caches?

...I hope you are not calling the facets with facet.method=enum? Could you paste a typical
facet-enabled search request?

> Yep. We still do a lot of sorting on dynamic field names, so the field cache
> has a lot of entries. (9.411 entries as we speak. This is considerably lower
> than before.). You mentioned in an earlier mail that faceting on a field
> shared between all facet queries would bring down the memory needed.
> Does the same thing go for sorting?

More or less. Sorting stores the raw string representations (UTF-8) in memory, so the number
of unique values matters more than it does for faceting. Just as with faceting, a list
of pointers from documents to values (1 value/document, as we are sorting) is maintained, so
the overhead is something like

#documents*log2(#unique_terms*average_term_length) + #unique_terms*average_term_length
(where average_term_length is in bits)

Caveat: This is with the index-wide sorting structure. I am fairly confident that this is
what Solr uses, but I have not looked at it lately so it is possible that some memory-saving
segment-based trickery has been implemented.
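
As a rough sketch, the estimate above can be written as a small Python function. The function
name and parameters are made up for illustration and are not part of Solr. The result is in
bits, because the formula takes the term length in bits.

```python
import math

def estimate_sort_field_bits(num_docs, unique_terms, avg_term_length_bits):
    """Estimate from the formula above; illustrative only, not a Solr API."""
    # Raw term data: all unique values stored back to back.
    term_data_bits = unique_terms * avg_term_length_bits
    # One pointer per document, wide enough to address any bit of the term data.
    pointer_bits = math.log2(term_data_bits)
    return num_docs * pointer_bits + term_data_bits

# Toy numbers (not from the thread): 1,000 documents, 16 unique terms of 8 bytes each.
toy_bits = estimate_sort_field_bits(1_000, 16, 8 * 8)
print(toy_bits, "bits =", toy_bits / 8, "bytes")
```

With few unique terms the first term (the per-document pointers) dominates. The dictionary
only starts to matter once the number of unique terms approaches the number of documents.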

> Do those 9411 entries duplicate data between them?

Sorry, I do not know. SOLR-1111 discusses the problems with the field cache and duplication
of data, but I cannot tell whether it has been solved or not. I am not familiar with the stat
breakdown of the fieldCache, but it _seems_ to me that there are 2 or 3 entries for each segment
for each sort field. Guesstimating further, let's say you have 30 segments in your index.
Going with the guesswork, that would bring the number of sort fields to 9411/3/30 ~= 100.
Looks like you use a custom sort field for each client?
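
The guesswork above is plain division. A few lines of Python show how sensitive it is to the
guessed inputs. Both the entries-per-segment figure and the segment count are guesses from
this message, not measured values.

```python
# All inputs except the entry count are guesses taken from the text above.
field_cache_entries = 9411
segments = 30  # guessed segment count

# "2 or 3 entries for each segment for each sort field": try both.
fields_if_3 = field_cache_entries / 3 / segments
fields_if_2 = field_cache_entries / 2 / segments

print(f"3 entries/segment/field: ~{round(fields_if_3)} sort fields")
print(f"2 entries/segment/field: ~{round(fields_if_2)} sort fields")
```

Either way the result is in the low hundreds, which fits one sort field per client.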

Extrapolating from 1.4M documents and 180 clients, let's say that there are 1.4M/180/5 unique
terms for each sort-field and that their average length is 10. We thus have
1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB 
per sort field or about 4GB for all the 180 fields.

With this few unique values, the doc->value structure is by far the biggest, just as with
facets. As opposed to the faceting structure, this is fairly close to the actual memory usage.
Switching to a single sort field would reduce the memory usage from 4GB to about 55MB.
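
The comparison can be reproduced with a short Python sketch. The helper name is illustrative.
The single-field case assumes that the shared field holds the union of all per-client values
(180*1500 unique terms), which this message does not state explicitly. Note that the formula
counts bits: the script gives roughly 23.7 million bits per field, 4.3 billion bits for 180
fields and 55.7 million bits for one shared field. Those match the 23 / 4 / 55 figures above
numerically, while the same amounts expressed in bytes are one eighth of that. The ratio
between the two setups, roughly 77x, is the same in either unit.

```python
import math

def sort_bits(num_docs, unique_terms, avg_term_bits):
    # Formula from earlier in this message; the result is in bits.
    data_bits = unique_terms * avg_term_bits
    return num_docs * math.log2(data_bits) + data_bits

DOCS = 1_400_000         # documents in the index
FIELDS = 180             # one sort field per client
TERMS_PER_FIELD = 1_500  # guessed unique terms per sort field
TERM_BITS = 10 * 8       # guessed average term length of 10 bytes

per_field = sort_bits(DOCS, TERMS_PER_FIELD, TERM_BITS)
all_fields = FIELDS * per_field
# Assumption: the shared field contains every per-client value, with no overlap.
single_field = sort_bits(DOCS, FIELDS * TERMS_PER_FIELD, TERM_BITS)

for label, bits in [("per field", per_field),
                    ("180 fields", all_fields),
                    ("single shared field", single_field)]:
    print(f"{label}: {bits / 1e6:.1f} Mbit = {bits / 8e6:.1f} MB")
print(f"reduction factor: {all_fields / single_field:.0f}x")
```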

> I do commit a bit more often than I should. I get these in my log file from
> time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

So 1 active searcher and 2 warming searchers. Ignoring that one of the warming searchers is
highly likely to finish well ahead of the other one, that means that your heap must hold 3
times the structures for a single searcher. With the old heap size of 25GB that left "only"
8GB for a full dataset. Subtract the 4GB for sorting and a similar amount for faceting and
you have your OOM.

Tweaking your ingest to avoid 3 overlapping searchers will lower your memory requirements
by 1/3. Fixing the facet & sorting logic will bring it down to laptop size.
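
The heap budget above can be laid out as simple arithmetic. This sketch takes the figures
quoted in this message as given (25GB heap, about 4GB each for sorting and faceting) and
assumes, as a simplification, that every searcher needs a full copy of its structures at the
same time.

```python
HEAP_GB = 25      # old heap size
SORTING_GB = 4    # sorting estimate quoted above
FACETING_GB = 4   # "a similar amount" for faceting

def headroom_per_searcher(searchers):
    # Heap share of one searcher, minus its sorting and faceting structures.
    share = HEAP_GB / searchers
    return share, share - SORTING_GB - FACETING_GB

for searchers in (3, 2):  # 1 active + 2 warming, versus 1 active + 1 warming
    share, left = headroom_per_searcher(searchers)
    print(f"{searchers} searchers: {share:.1f}GB each, "
          f"{left:.1f}GB left for everything else")
```

With three searchers there is almost nothing left after sorting and faceting. With two, each
searcher has several GB of slack.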

> The control panel says that the warm up time of the last searcher is 5574. Is that seconds
> or milliseconds?
> http://screencast.com/t/d9oIbGLCFQwl

Milliseconds, I am fairly sure. That is much faster than I anticipated. Are you warming all
the sort and facet fields?

> Waiting for a full GC would take a long time.

Until you have fixed the core memory issue, you might consider triggering an explicit GC every
night to clean up, in the hope that a full GC then does not kick in during the day (or whenever
your clients use the system).

> Unfortunately I don't know of a way to provoke a full GC on command.

VisualVM, which ships with the Oracle JDK (look in the bin folder), is your friend. Just
start it on the server, click on the relevant process and use the "Perform GC" button on the
Monitor tab.

Regards,
Toke Eskildsen