lucene-solr-user mailing list archives

From John Nielsen ...@mcb.dk>
Subject Re: Solr using a ridiculous amount of memory
Date Thu, 18 Apr 2013 06:34:35 GMT
> That was strange. As you are using a multi-valued field with the new
> setup, they should appear there.

Yes, the new field we use for faceting is a multi-valued field.

> Can you find the facet fields in any of the other caches?

Yes, here it is, in the field cache:

http://screencast.com/t/mAwEnA21yL

> I hope you are not calling the facets with facet.method=enum? Could you
> paste a typical facet-enabled search request?

Here is a typical example (I added newlines for readability):

http://172.22.51.111:8000/solr/default1_Danish/search
?defType=edismax
&q=*%3a*
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_7+key%3ditemvariantoptions_int_mv_7%7ditemvariantoptions_int_mv
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_9+key%3ditemvariantoptions_int_mv_9%7ditemvariantoptions_int_mv
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_8+key%3ditemvariantoptions_int_mv_8%7ditemvariantoptions_int_mv
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_2+key%3ditemvariantoptions_int_mv_2%7ditemvariantoptions_int_mv
&fq=site_guid%3a(10217)
&fq=item_type%3a(PRODUCT)
&fq=language_guid%3a(1)
&fq=item_group_1522_combination%3a(*)
&fq=is_searchable%3a(True)
&sort=item_group_1522_name_int+asc, variant_of_item_guid+asc
&querytype=Technical
&fl=feed_item_serialized
&facet=true
&group=true
&group.facet=true
&group.ngroups=true
&group.field=groupby_variant_of_item_guid
&group.sort=name+asc
&rows=0
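
For readers who want to reproduce a request like this, here is a minimal sketch of how the encoded facet.field values can be generated. It only assumes Python's standard urllib; the helper names facet_param and build_params are made up for illustration, and only a subset of the parameters above is included.

```python
from urllib.parse import urlencode

BASE_URL = "http://172.22.51.111:8000/solr/default1_Danish/search"
FACET_FIELD = "itemvariantoptions_int_mv"  # the shared multi-valued facet field


def facet_param(option_id):
    """Build one facet.field value with local params (hypothetical helper).

    ex=...  names the filter tag to exclude when counting this facet
    key=... names the label under which the facet is returned
    """
    key = f"{FACET_FIELD}_{option_id}"
    return "{!ex=tag" + key + " key=" + key + "}" + FACET_FIELD


def build_params(option_ids):
    """Return the request parameters as (name, value) pairs (hypothetical helper)."""
    params = [("defType", "edismax"), ("q", "*:*")]
    params += [("facet.field", facet_param(i)) for i in option_ids]
    params += [
        ("fq", "site_guid:(10217)"),
        ("fq", "item_type:(PRODUCT)"),
        ("facet", "true"),
        ("group", "true"),
        ("group.facet", "true"),
        ("group.field", "groupby_variant_of_item_guid"),
        ("rows", "0"),
    ]
    return params


# urlencode percent-encodes the braces, '=' and ':' and turns spaces into '+'.
url = BASE_URL + "?" + urlencode(build_params([7, 9, 8, 2]))
print(url)
```

The same facet field is requested four times; only the ex tag and the key differ between the entries.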

> Are you warming all the sort- and facet-fields?

I'm sorry, I don't know. I have the field value cache commented out in my
config, so... Whatever is default?

Removing the custom sort fields is unfortunately quite a bit more difficult
than my other facet modification.

The problem is that each item can have several sort orders. The sort order
to use is defined by a group number which is known ahead of time. The group
number is included in the sort order field name. To solve it in the same
way I solved the facet problem, I would need to be able to sort on a
multi-valued field, and unless I'm wrong, I don't think that it's possible.

I am quite stumped on how to fix this.




On Wed, Apr 17, 2013 at 3:06 PM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:

> John Nielsen [jn@mcb.dk]:
> > I never seriously looked at my fieldValueCache. It never seemed to get
> > used:
>
> > http://screencast.com/t/YtKw7UQfU
>
> That was strange. As you are using a multi-valued field with the new
> setup, they should appear there. Can you find the facet fields in any of
> the other caches?
>
> ...I hope you are not calling the facets with facet.method=enum? Could you
> paste a typical facet-enabled search request?
>
> > Yep. We still do a lot of sorting on dynamic field names, so the field
> > cache has a lot of entries. (9,411 entries as we speak. This is
> > considerably lower than before.) You mentioned in an earlier mail that
> > faceting on a field shared between all facet queries would bring down
> > the memory needed. Does the same thing go for sorting?
>
> More or less. Sorting stores the raw string representations (utf-8) in
> memory so the number of unique values has more to say than it does for
> faceting. Just as with faceting, a list of pointers from documents to
> values (1 value/document as we are sorting) is maintained, so the overhead
> is something like
>
> #documents*log2(#unique_terms*average_term_length) +
> #unique_terms*average_term_length
> (where average_term_length is in bits)
>
> Caveat: This is with the index-wide sorting structure. I am fairly
> confident that this is what Solr uses, but I have not looked at it lately
> so it is possible that some memory-saving segment-based trickery has been
> implemented.
>
> > Do those 9411 entries duplicate data between them?
>
> Sorry, I do not know. SOLR-1111 discusses the problems with the field
> cache and duplication of data, but I cannot infer whether it has been solved
> or not. I am not familiar with the stat breakdown of the fieldCache, but it
> _seems_ to me that there are 2 or 3 entries for each segment for each sort
> field. Guesstimating further, let's say you have 30 segments in your index.
> Going with the guesswork, that would bring the number of sort fields to
> 9411/3/30 ~= 100. Looks like you use a custom sort field for each client?
>
> Extrapolating from 1.4M documents and 180 clients, let's say that there
> are 1.4M/180/5 unique terms for each sort-field and that their average
> length is 10. We thus have
> 1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB
> per sort field or about 4GB for all the 180 fields.
>
> With this few unique values, the doc->value structure is by far the
> biggest, just as with facets. As opposed to the faceting structure, this is
> fairly close to the actual memory usage. Switching to a single sort field
> would reduce the memory usage from 4GB to about 55MB.
>
> > I do commit a bit more often than I should. I get these in my log file
> > from time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
> So 1 active searcher and 2 warming searchers. Ignoring that one of the
> warming searchers is highly likely to finish well ahead of the other one,
> that means that your heap must hold 3 times the structures for a single
> searcher. With the old heap size of 25GB that left "only" 8GB for a full
> dataset. Subtract the 4GB for sorting and a similar amount for faceting and
> you have your OOM.
>
> Tweaking your ingest to avoid 3 overlapping searchers will lower your
> memory requirements by 1/3. Fixing the facet & sorting logic will bring it
> down to laptop size.
>
> > The control panel says that the warm up time of the last searcher is
> > 5574. Is that seconds or milliseconds?
> > http://screencast.com/t/d9oIbGLCFQwl
>
> Milliseconds, I am fairly sure. It is much faster than I anticipated. Are
> you warming all the sort- and facet-fields?
>
> > Waiting for a full GC would take a long time.
>
> Until you have fixed the core memory issue, you might consider doing an
> explicit GC every night to clean up and hope that it does not occur
> automatically at daytime (or whenever your clients use it).
>
> > Unfortunately I don't know of a way to provoke a full GC on command.
>
> VisualVM, which is delivered with the Oracle JDK (look somewhere in the
> bin folder), is your friend. Just start it on the server and click on the
> relevant process.
>
> Regards,
> Toke Eskildsen
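
The sort-field estimate in the quoted message can be checked numerically. The sketch below is only a literal evaluation of the quoted formula, #documents*log2(#unique_terms*average_term_length) + #unique_terms*average_term_length with the length in bits; the function names are made up for illustration, and the 280,000 unique terms used for the single-field case is an assumption (1.4M/5), not a number from the thread. Evaluated this way, the per-field result is about 23.7 million bits, which is roughly 3 MB once divided by eight, so the figures in the quoted message appear to line up with the bit counts rather than with bytes.

```python
import math


def sort_field_bits(num_docs, unique_terms, avg_term_chars, bits_per_char=8):
    """Literal evaluation of the quoted estimate; the result is in bits."""
    term_bits = unique_terms * avg_term_chars * bits_per_char  # raw term storage
    pointer_bits = math.log2(term_bits)  # bits per doc->value pointer
    return num_docs * pointer_bits + term_bits


def bits_to_megabytes(bits):
    """Convert bits to megabytes (10^6 bytes)."""
    return bits / 8 / 1e6


# Numbers from the quoted message: 1.4M documents, ~1500 unique terms per
# sort field, average term length 10 characters, 180 sort fields.
per_field = sort_field_bits(1_400_000, 1_500, 10)
all_fields = 180 * per_field

# Assumption: one shared sort field holding 1.4M/5 = 280,000 unique terms.
single_field = sort_field_bits(1_400_000, 280_000, 10)

for label, bits in [("per sort field", per_field),
                    ("180 sort fields", all_fields),
                    ("single sort field", single_field)]:
    print(f"{label}: {bits / 1e6:.1f} Mbit = {bits_to_megabytes(bits):.1f} MB")
```

Whichever unit is intended, the ratio between 180 separate sort fields and one shared field is the same, which is the point the estimate is making.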




-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk
