lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Solr using a ridiculous amount of memory
Date Thu, 18 Apr 2013 08:08:32 GMT
On Thu, 2013-04-18 at 08:34 +0200, John Nielsen wrote:

> [Toke: Can you find the facet fields in any of the other caches?]

> Yes, here it is, in the field cache:

> http://screencast.com/t/mAwEnA21yL
> 
Ah yes, mystery solved, my mistake.

> http://172.22.51.111:8000/solr/default1_Danish/search

[...]

> &fq=site_guid%3a(10217)

This constrains the hits to a specific customer, right? Any search will
only be in a single customer's data?

> [Toke: Are you warming all the sort- and facet-fields?]

> I'm sorry, I don't know. I have the field value cache commented out in
> my config, so... Whatever is default?

(a bit shaky here) I would say not warming. You could check simply by
starting Solr and looking at the caches before you issue any searches.

This fits the description of your searchers gradually eating memory
until your JVM OOMs. Each time a new field is faceted or sorted upon, it
is added to the cache. As your index is relatively small and the number
of values in the single fields is small, the initialization time for a
field is so short that it is not a performance problem. Memory-wise it
is death by a thousand cuts.

If you did explicit warming of all the possible fields for sorting and
faceting, you would allocate it all up front and would be sure that
there would be enough memory available. But it would take much longer
than your current setup. You might want to try it out (no need to fiddle
with the Solr setup, just make a script and fire wgets, as this has the
same effect).
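
A minimal sketch of such a warming script, in Python rather than wget.
The field names are hypothetical placeholders, and the handler URL is the
one quoted earlier in this thread. The sort queries ask for rows=1 on the
assumption that a sort with no rows requested might be skipped.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Handler URL as quoted earlier in the thread; adjust to your own core.
SOLR_SEARCH_URL = "http://172.22.51.111:8000/solr/default1_Danish/search"
# Hypothetical field names, for illustration only.
SORT_FIELDS = ["sort_group_1", "sort_group_2"]
FACET_FIELDS = ["facet_color", "facet_size"]


def warming_urls(base_url, sort_fields, facet_fields):
    """Build one cheap query per field so each field gets touched once."""
    urls = []
    for field in sort_fields:
        params = {"q": "*:*", "rows": 1, "sort": field + " asc"}
        urls.append(base_url + "?" + urlencode(params))
    for field in facet_fields:
        params = {"q": "*:*", "rows": 0, "facet": "true", "facet.field": field}
        urls.append(base_url + "?" + urlencode(params))
    return urls


def warm(urls, timeout=60):
    """Fire each query and discard the response body."""
    for url in urls:
        with urlopen(url, timeout=timeout) as response:
            response.read()
```

Calling warm(warming_urls(SOLR_SEARCH_URL, SORT_FIELDS, FACET_FIELDS))
right after a restart, and then looking at the caches, should show the
memory that would otherwise be allocated gradually.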

> The problem is that each item can have several sort orders. The sort
> order to use is defined by a group number which is known ahead of
> time. The group number is included in the sort order field name. To
> solve it in the same way i solved the facet problem, I would need to
> be able to sort on a multi-valued field, and unless I'm wrong, I don't
> think that it's possible.

That is correct.

Three suggestions off the bat:

1) Reduce the number of sort fields by mapping names.
Count the maximum number of unique sort fields for any given customer.
That will be the total number of sort fields in the index. For each
group number for a customer, map that number to one of the index-wide
sort fields.
This only works if the maximum number of unique fields is low (let's say
a single field takes 50MB, so 20 fields should be okay).

2) Create a custom sorter for Solr.
Create a field with all the sort values, prefixed by group ID. Create a
structure (or reuse the one from Lucene) with a doc->terms map with all
the terms in-memory. When sorting, extract the relevant compare-string
for a document by iterating all the terms for the document and selecting
the one with the right prefix.
Memory-wise this scales linearly with the number of terms instead of the
number of fields, but it would require quite some coding.

3) Switch to a layout where each customer has a dedicated core.
The basic overhead is a lot larger than for a shared index, but it would
make your setup largely immune to the adverse effect of many documents
coupled with many facet- and sort-fields.
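
The name mapping in suggestion 1 could look roughly like the sketch
below. The "sort_slot_" field prefix and the function names are
assumptions made for illustration; the 50MB per field is the ballpark
figure from the suggestion, not a measurement.

```python
def shared_field_count(groups_by_customer):
    """Index-wide number of sort fields: the largest per-customer count."""
    return max((len(set(g)) for g in groups_by_customer.values()), default=0)


def build_slot_maps(groups_by_customer, prefix="sort_slot_"):
    """Map each customer's group numbers onto the shared sort fields."""
    slot_maps = {}
    for customer, groups in groups_by_customer.items():
        ordered = sorted(set(groups))
        slot_maps[customer] = {
            group: f"{prefix}{slot}" for slot, group in enumerate(ordered)
        }
    return slot_maps


def estimated_sort_memory_mb(field_count, mb_per_field=50):
    """Rough memory estimate, assuming a fixed cost per sort field."""
    return field_count * mb_per_field
```

With groups {10217: [3, 17, 42], 10300: [5, 9]} the index needs 3 shared
fields. Group 17 of customer 10217 and group 9 of customer 10300 both map
to sort_slot_1, which is safe as long as every search is filtered to a
single customer.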

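The comparison logic of suggestion 2 can be illustrated in a few lines.
This is not a Solr plugin, only a sketch of the prefix selection over an
in-memory doc->terms map; the "_" separator and the function names are
assumptions.

```python
def sort_key_for_group(doc_terms, group_id, separator="_", missing=""):
    """Return the sort value whose term carries the given group prefix."""
    prefix = f"{group_id}{separator}"
    for term in doc_terms:
        if term.startswith(prefix):
            return term[len(prefix):]
    # Documents without a value for this group get the fallback key.
    return missing


def sort_docs(doc_to_terms, group_id):
    """Order document IDs by their sort value for one group."""
    return sorted(
        doc_to_terms,
        key=lambda doc: sort_key_for_group(doc_to_terms[doc], group_id),
    )
```

Each comparison iterates the terms of a document, so the per-document
term lists need to stay short for this to be fast.
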
- Toke Eskildsen, State and University Library, Denmark


