lucene-solr-user mailing list archives

From John Nielsen ...@mcb.dk>
Subject Re: Solr using a ridiculous amount of memory
Date Wed, 17 Apr 2013 11:39:46 GMT
> I am surprised about the lack of "UnInverted" from your logs as it is
> logged on INFO level.

Nope, no trace of it. No mention either in Logging -> Level from the admin
interface.

> It should also be available from the admin interface under
> collection/Plugin / Stats/CACHE/fieldValueCache.

I never seriously looked at my fieldValueCache. It never seemed to get used:

http://screencast.com/t/YtKw7UQfU

> You stated that you were unable to make a 4GB JVM OOM when you just
> performed faceting (I guesstimate that it will also run fine with just ½GB
> or at least with 1GB, based on the numbers above) and you have observed
> that the field cache eats the memory.

Yep. We still do a lot of sorting on dynamic field names, so the field
cache has a lot of entries. (9.411 entries as we speak, which is
considerably lower than before.) You mentioned in an earlier mail that
faceting on a field shared between all facet queries would bring down the
memory needed. Does the same go for sorting? Do those 9.411 entries
duplicate data between them? If this is where all the memory is going, I
have a lot of coding to do.
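For reference, Toke's packed-ords estimate from the quoted mail further
down can be checked with a few lines (a sketch; the counts come from the
thread, and treating the formula's result as bits is my assumption):

```python
from math import log2

# Counts from the thread: ~1.4M documents, ~11.2M facet references,
# ~1.13M unique values. Toke's per-field estimate is
# docs * log2(refs) + docs * log2(unique), which I read as bits.
docs = 1_400_000
refs = 11_200_000
unique = 1_132_344

bits = docs * log2(refs) + docs * log2(unique)
mb_per_field = bits / 8 / 1_000_000  # bits -> bytes -> MB

# Toke suggests multiplying by 3-4 for real-world overhead.
print(f"theoretical: ~{mb_per_field:.1f} MB/field, "
      f"with 4x overhead: ~{mb_per_field * 4:.0f} MB/field")
```

Even with the 3-4x real-world factor this lands in the tens of MB per
field, nowhere near the multi-GB heap usage observed, which matches the
"math does not check out" conclusion below.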

> Guessing wildly: Do you issue a high frequency of small updates with
> frequent commits? If you pause the indexing, does memory use fall back to
> the single GB level?

I do commit a bit more often than I should. I get these in my log file from
time to time: "PERFORMANCE WARNING: Overlapping onDeckSearchers=2". The way
I understand this is that two searchers are being warmed at the same time
and that one will be discarded when it finishes its auto-warming procedure.
If the math above is correct, I would need tens of searchers auto-warming
in parallel to cause my problem. If I misunderstand how this works, do let
me know.
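As a side note, that warning ties in with the maxWarmingSearchers setting
in solrconfig.xml (a sketch; the value 2 is the stock example default, not
taken from this thread):

```xml
<!-- solrconfig.xml: cap on searchers warming concurrently. More than one
     warming searcher logs the "Overlapping onDeckSearchers" warning;
     a commit that would exceed this cap fails outright. -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```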

My indexer has a cleanup routine that deletes replay logs and other things
when it has nothing to do. This includes running a commit on the Solr
server to make sure nothing is ever in a state where it has not been
written to disk anywhere. In theory it can commit once every 60 seconds,
though I doubt that ever happens. The less work the indexer has, the more
often it commits. (Yes, I know; it's on my todo list.)

Other than that, my autocommit settings look like this:

<autoCommit>
  <maxTime>60000</maxTime>
  <maxDocs>6000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

The control panel says that the warm up time of the last searcher is 5574.
Is that seconds or milliseconds?
http://screencast.com/t/d9oIbGLCFQwl

I would prefer not to turn off the indexer unless the numbers above
suggest that I really should try this. Waiting for a full GC would take a
long time. Unfortunately I don't know of a way to provoke a full GC on
command.
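For what it's worth, one way to force a full GC on a running HotSpot JVM
is the JDK's jmap tool (a command-template sketch; <pid> stands for the
Solr JVM's process id):

```shell
# jmap -histo:live forces a full GC before printing the histogram of
# live objects, so it doubles as a "provoke a full GC" command. The
# histogram itself is also a handy view of what is filling the heap.
jmap -histo:live <pid> | head -n 20
```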


On Wed, Apr 17, 2013 at 11:48 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:

> John Nielsen [jn@mcb.dk] wrote:
> > I managed to get this done. The facet queries now facets on a multivalue
> field as opposed to the dynamic field names.
>
> > Unfortunately it doesn't seem to have done much difference, if any at
> all.
>
> I am sorry to hear that.
>
> > documents = ~1.400.000
> > references = 11.200.000 (we facet on two multivalue fields with 4
> > values on average each, so 1.400.000 * 2 * 4 = 11.200.000)
> > unique values = 1.132.344 (total number of variant options across all
> > clients. This is what we facet on)
>
> > 1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per
> field (we have 4 fields)?
>
> > I must be calculating this wrong.
>
> No, that sounds about right. In reality you need to multiply with 3 or 4,
> so let's round to 50MB/field: 1.4M documents with 2 fields with 5M
> references/field each is not very much and should not take a lot of memory.
> In comparison, we facet on 12M documents with 166M references and do some
> other stuff (in Lucene with a different faceting implementation, but at
> this level it is equivalent to Solr's in terms of memory). Our heap is 3GB.
>
> I am surprised about the lack of "UnInverted" from your logs as it is
> logged on INFO level. It should also be available from the admin interface
> under collection/Plugin / Stats/CACHE/fieldValueCache. But I am guessing
> you got your numbers from that and that the list only contains the few
> facets you mentioned previously? It might be wise to sanity check by
> summing the memSizes though; they ought to take up far below 1GB.
>
> From your description, your index is small and your faceting requirements
> modest. A SSD-equipped laptop should be adequate as server. So we are back
> to "math does not check out".
>
>
> You stated that you were unable to make a 4GB JVM OOM when you just
> performed faceting (I guesstimate that it will also run fine with just ½GB
> or at least with 1GB, based on the numbers above) and you have observed
> that the field cache eats the memory. This does indicate that the old
> caches are somehow not freed when the index is updated. That is strange as
> Solr should take care of that automatically.
>
> Guessing wildly: Do you issue a high frequency of small updates with frequent
> commits? If you pause the indexing, does memory use fall back to the single
> GB level (You probably need to trigger a full GC to check that)? If that is
> the case, it might be a warmup problem with old warmups still running when
> new commits are triggered.
>
> Regards,
> Toke Eskildsen, State and University Library, Denmark




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk
