lucene-solr-user mailing list archives

From Toke Eskildsen <...@statsbiblioteket.dk>
Subject RE: OutOfMemory on 28 docs with facet.method=fc/fcs
Date Tue, 18 Nov 2014 20:33:36 GMT
Mohsin Beg Beg [mohsin.beg@oracle.com] wrote:
> I am getting OOM when faceting on numFound=28. The receiving
> solr node throws the OutOfMemoryError even though there is 7gb
> available heap before the faceting request was submitted.

fc and fcs faceting memory overhead is (nearly) independent of the number of hits in the search
result.

> If a different solr node is selected that one fails too. Any suggestions ?

> &facet.field=field1....field15
> &f.field1...field15.facet.method=fc/fcs
> &collection=Collection1...Collection100

You seem to be issuing a facet request for 15 fields across 100 collections concurrently. The memory
overhead will be linear in the number of documents, the number of references from documents to field
values, and the number of unique values in the facet, independently for each facet.

That was confusing. Let me try an example instead:

For each field, the static memory requirement is a structure that maps from documents to
term ordinals. Depending on circumstances, this can be small (DocValues on a numeric field)
or big (a multi-value, non-DocValues String field). Each concurrent call will also temporarily
allocate a structure for counting. If the field is numeric, this will be a hashmap. If it is
String, it will be an integer array with as many entries as there are unique values: if there are
1M unique String values in the field, the overhead will be 4 bytes * 1M = 4MB.
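
As a rough illustration (plain Java, not Solr's actual classes), the two temporary counting
structures could be pictured like this:

    import java.util.HashMap;
    import java.util.Map;

    public class CountingStructures {
        public static void main(String[] args) {
            // Numeric field: counts are gathered in a hashmap keyed by value.
            Map<Long, Integer> numericCounts = new HashMap<>();
            numericCounts.merge(42L, 1, Integer::sum);

            // String field: one int counter per unique value in the field,
            // so 1M unique values -> 4 bytes * 1M = 4MB, allocated up front.
            int[] stringCounts = new int[1_000_000];
            stringCounts[0]++;

            System.out.println("String counters: "
                + (stringCounts.length * 4L / 1_000_000) + " MB");
        }
    }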

So, if each field has 250K unique String values, the temporary overhead for all 15 fields
will be 15MB. I don't know if the request for multiple collections is threaded, but if so,
the 15MB should be multiplied by 100, totalling 1.5GB of memory overhead for each call. Add
the static structures and it does not seem unreasonable that you run out of memory.
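
If it helps, here is the same arithmetic as a small stand-alone Java sketch (nothing
Solr-specific; the field, value and collection counts are just the assumptions from this thread):

    public class FacetMemoryEstimate {
        public static void main(String[] args) {
            long uniqueValuesPerField = 250_000; // unique String values per facet field
            int  fields      = 15;               // facet.field entries in the request
            int  collections = 100;              // collections queried, assumed threaded
            long bytesPerCounter = 4;            // one Java int per unique value

            double perFieldMB   = uniqueValuesPerField * bytesPerCounter / 1e6; //  1.0 MB
            double perRequestMB = perFieldMB * fields;                          // 15.0 MB
            double totalGB      = perRequestMB * collections / 1e3;             //  1.5 GB

            System.out.printf("per field:       %.1f MB%n", perFieldMB);
            System.out.printf("per request:     %.1f MB%n", perRequestMB);
            System.out.printf("all collections: %.1f GB%n", totalGB);
        }
    }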

All this is very loose, but the overall message is that documents, unique facet values, facets
and collections all multiply memory requirements.

* Do you need to query all collections at once?
* Can you collapse some of the facet fields, to reduce the total number?
* Are some of the fields very small (few unique values)? If so, use facet.method=enum for them
instead of fc/fcs (see the sketch after this list).
* Maybe you can determine your limits by issuing requests first for 1 field, then 2, etc. This
will show whether a minor tweak is enough to get it working, or whether your setup is so large
that something else entirely needs to be done (also sketched below).
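
A hedged sketch of those last two points (the field and collection names are placeholders,
mirroring the request above). A per-field enum override looks like:

    &facet=true
    &facet.field=smallfield
    &f.smallfield.facet.method=enum

And probing for limits could start with a single field against a single collection, adding
fields and collections until memory becomes a problem:

    &facet.field=field1
    &collection=Collection1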

- Toke Eskildsen
