lucene-solr-user mailing list archives

From John Nielsen
Subject Re: Solr using a ridiculous amount of memory
Date Thu, 18 Apr 2013 09:59:16 GMT
> [...]
> > &fq=site_guid%3a(10217)
> This constrains hits to a specific customer, right? Any search will
> only be in a single customer's data?

Yes, that's right. No search from any given client ever returns anything
from another client.
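For reference, the restriction is applied with a filter query; a minimal sketch of building such a request with Python's standard library (the host and core name "collection1" are placeholders, not our actual deployment):

```python
from urllib.parse import urlencode

# Build the query string for a search restricted to one customer.
# "site_guid" is the customer-id field from the discussion above;
# the host and core name ("collection1") are placeholders.
params = urlencode({
    "q": "*:*",
    "fq": "site_guid:(10217)",  # restrict all hits to customer 10217
    "wt": "json",
})
url = "http://localhost:8983/solr/collection1/select?" + params
print(url)
```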

[Toke: Are you warming all the sort- and facet-fields?]
> > I'm sorry, I don't know. I have the field value cache commented out in
> > my config, so... Whatever is default?
> (a bit shaky here) I would say not warming. You could check simply by
> starting solr and looking at the caches before you issue any searches.

The field cache shows 0 entries at startup. On the running server, forcing
a commit (and thus opening a new searcher) does not change the number of
entries.
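That check can also be done against the cache statistics exposed by the mbeans admin handler; a sketch of building the request (the core name and host are placeholders):

```python
from urllib.parse import urlencode

def cache_stats_url(core="collection1", host="http://localhost:8983/solr"):
    # Ask the mbeans admin handler for cache statistics only; the
    # core name and host are placeholders for the actual deployment.
    params = urlencode({"stats": "true", "cat": "CACHE", "wt": "json"})
    return f"{host}/{core}/admin/mbeans?{params}"

print(cache_stats_url())
```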

> > The problem is that each item can have several sort orders. The sort
> > order to use is defined by a group number which is known ahead of
> > time. The group number is included in the sort order field name. To
> > solve it in the same way i solved the facet problem, I would need to
> > be able to sort on a multi-valued field, and unless I'm wrong, I don't
> > think that it's possible.
> That is correct.
> Three suggestions off the bat:
> 1) Reduce the number of sort fields by mapping names.
> Count the maximum number of unique sort fields for any given customer.
> That will be the total number of sort fields in the index. For each
> group number for a customer, map that number to one of the index-wide
> sort fields.
> This only works if the maximum number of unique fields is low (let's say
> a single field takes 50MB, so 20 fields should be okay).

I just checked our DB. Our worst-case client has over a thousand groups
for sorting. Granted, that may well be an error in the data. It is an
interesting idea, though, and I will look into this possibility.
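The mapping in suggestion 1 could be sketched like this; everything here is hypothetical (field names, the budget of 20 fields from the 50MB estimate above, and the assumption that each customer's group numbers are known up front):

```python
# Map each customer group to one of a small set of index-wide sort
# fields, so the index needs at most MAX_SORT_FIELDS fieldCache
# entries instead of one field per customer-group. Names are
# illustrative, not taken from our schema.
MAX_SORT_FIELDS = 20

def sort_field_for(customer_groups, group_number):
    """customer_groups: ordered list of this customer's group numbers."""
    slot = customer_groups.index(group_number)
    if slot >= MAX_SORT_FIELDS:
        raise ValueError("customer exceeds the shared sort-field budget")
    return f"sort_{slot}"

# Customer A with groups 7 and 42 reuses the same two shared fields
# as customer B with groups 3 and 9.
print(sort_field_for([7, 42], 42))  # slot 1 -> "sort_1"
print(sort_field_for([3, 9], 3))    # slot 0 -> "sort_0"
```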

> 3) Switch to a layout where each customer has a dedicated core.
> The basic overhead is a lot larger than for a shared index, but it would
> make your setup largely immune to the adverse effect of many documents
> coupled with many facet- and sort-fields.

Now this is where my brain melts down.

If I understand the fieldCache mechanism correctly (which I can see that I
don't), the data used for faceting and sorting is saved in the fieldCache
under a key comprised of the fields used for said faceting/sorting, and
that data only contains the values actually used for the operation. This
is what the fq queries are for.

So if I generate a core for each client, I would have a client-specific
fieldCache containing only that client's data. Wouldn't I just be
splitting the same data across several cores?

I'm afraid I don't understand how this would help.
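In case we do try it, cores can be created through Solr's CoreAdmin API; a minimal sketch of building such a request (the host and the per-customer naming scheme are my assumptions, not anything we have in place):

```python
from urllib.parse import urlencode

def core_create_url(customer_id, host="http://localhost:8983/solr"):
    # One dedicated core per customer; the "customer_<id>" naming
    # scheme is hypothetical.
    params = urlencode({
        "action": "CREATE",
        "name": f"customer_{customer_id}",
        "instanceDir": f"customer_{customer_id}",
    })
    return f"{host}/admin/cores?{params}"

print(core_create_url(10217))
```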

Med venlig hilsen / Best regards

*John Nielsen*

Enghaven 15
DK-7500 Holstebro

Customer service (Kundeservice): +45 9610 2824
