lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject RE: Solr using a ridiculous amount of memory
Date Wed, 17 Apr 2013 09:48:24 GMT
John Nielsen [jn@mcb.dk] wrote:
> I managed to get this done. The facet queries now facet on a multivalue field as opposed
> to the dynamic field names.

> Unfortunately it doesn't seem to have done much difference, if any at all.

I am sorry to hear that.

> documents = ~1.400.000
> references 11.200.000  (we facet on two multivalue fields with each 4 values 
> on average, so 1.400.000 * 2 * 4 = 11.200.000
> unique values = 1.132.344 (total number of variant options across all clients.
> This is what we facet on)

> 1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per field (we have 4
> fields)?

> I must be calculating this wrong.

No, that sounds about right. In reality you need to multiply by 3 or 4, so let's round to
50MB/field: 1.4M documents with 2 fields at roughly 5M references each is not very much and
should not take a lot of memory. In comparison, we facet on 12M documents with 166M references
and do some other stuff (in Lucene with a different faceting implementation, but at this level
it is equivalent to Solr's in terms of memory). Our heap is 3GB.
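The arithmetic above can be sketched directly. This is a back-of-the-envelope estimate only: the formula is the docs * log2(references) + docs * log2(unique values) one quoted above, and the 3-4x factor is the real-world overhead mentioned; real structures pack their arrays differently, so expect hand calculations to differ by a small factor.

```java
// Rough memory estimate for a single faceted field, following the
// docs * log2(references) + docs * log2(unique values) formula above.
public class FacetMemEstimate {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        double docs = 1_400_000;
        double refs = 11_200_000;    // total value references for the field
        double unique = 1_132_344;   // unique facet values

        double bits = docs * log2(refs) + docs * log2(unique);
        double mb = bits / 8 / 1024 / 1024;

        System.out.printf("theoretical minimum: ~%.1f MB%n", mb);
        // Multiply by 3-4 for real-world overhead (object headers,
        // alignment, spare capacity in arrays, and so on).
        System.out.printf("realistic ballpark: ~%.0f-%.0f MB%n", mb * 3, mb * 4);
    }
}
```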

I am surprised about the lack of "UnInverted" from your logs as it is logged on INFO level.
It should also be available from the admin interface under collection/Plugin / Stats/CACHE/fieldValueCache.
But I am guessing you got your numbers from there and that the list only contains the few facets
you mentioned previously? It might be wise to sanity check by summing the memSizes, though;
they ought to total far below 1GB.

From your description, your index is small and your faceting requirements modest. An SSD-equipped
laptop should be adequate as a server. So we are back to "math does not check out".


You stated that you were unable to make a 4GB JVM OOM when you just performed faceting (based
on the numbers above, I guesstimate that it would also run fine with just ½GB, or at least with
1GB) and you have observed that the field cache eats the memory. This indicates that the old
caches are somehow not freed when the index is updated. That is strange, as Solr should take
care of that automatically.

Guessing wildly: do you issue small updates at high frequency, with frequent commits? If you
pause the indexing, does memory use fall back to the single-GB level (you probably need to
trigger a full GC to check)? If so, it might be a warmup problem, with old warmups still
running when new commits are triggered.
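If overlapping warmups do turn out to be the problem, two solrconfig.xml knobs are worth a look (a sketch with illustrative values, not a recommendation for this exact setup): maxWarmingSearchers caps how many searchers may warm concurrently (commits beyond the cap fail rather than stacking up warming searchers), and autoCommit's maxTime batches a stream of small updates into fewer commits.

```xml
<!-- solrconfig.xml (illustrative values) -->

<!-- Refuse commits that would start a second concurrent warming searcher,
     instead of letting warming searchers (and their caches) pile up. -->
<maxWarmingSearchers>1</maxWarmingSearchers>

<!-- Batch frequent small updates: hard-commit at most once per minute. -->
<autoCommit>
  <maxTime>60000</maxTime>
</autoCommit>
```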

Regards,
Toke Eskildsen, State and University Library, Denmark