Subject: Re: Solr using a ridiculous amount of memory
From: John Nielsen <jn@mcb.dk>
To: Toke Eskildsen
Cc: solr-user@lucene.apache.org
Date: Thu, 18 Apr 2013 08:34:35 +0200

> That was strange. As you are using a multi-valued field with the new
> setup, they should appear there.

Yes, the new field we use for faceting is a multi-valued field.

> Can you find the facet fields in any of the other caches?

Yes, here it is, in the field cache:
http://screencast.com/t/mAwEnA21yL

> I hope you are not calling the facets with facet.method=enum? Could you
> paste a typical facet-enabled search request?
Here is a typical example (I added newlines for readability):

http://172.22.51.111:8000/solr/default1_Danish/search
  ?defType=edismax
  &q=*%3a*
  &facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_7+key%3ditemvariantoptions_int_mv_7%7ditemvariantoptions_int_mv
  &facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_9+key%3ditemvariantoptions_int_mv_9%7ditemvariantoptions_int_mv
  &facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_8+key%3ditemvariantoptions_int_mv_8%7ditemvariantoptions_int_mv
  &facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_2+key%3ditemvariantoptions_int_mv_2%7ditemvariantoptions_int_mv
  &fq=site_guid%3a(10217)
  &fq=item_type%3a(PRODUCT)
  &fq=language_guid%3a(1)
  &fq=item_group_1522_combination%3a(*)
  &fq=is_searchable%3a(True)
  &sort=item_group_1522_name_int+asc,variant_of_item_guid+asc
  &querytype=Technical
  &fl=feed_item_serialized
  &facet=true
  &group=true
  &group.facet=true
  &group.ngroups=true
  &group.field=groupby_variant_of_item_guid
  &group.sort=name+asc
  &rows=0

> Are you warming all the sort- and facet-fields?

I'm sorry, I don't know. I have the field value cache commented out in my
config, so... whatever is default?

Removing the custom sort fields is unfortunately quite a bit more difficult
than my other facet modification. The problem is that each item can have
several sort orders. The sort order to use is defined by a group number
which is known ahead of time; the group number is included in the sort
order field name. To solve it the same way I solved the facet problem, I
would need to be able to sort on a multi-valued field, and unless I'm
wrong, I don't think that is possible. I am quite stumped on how to fix
this.

On Wed, Apr 17, 2013 at 3:06 PM, Toke Eskildsen wrote:

> John Nielsen [jn@mcb.dk]:
> > I never seriously looked at my fieldValueCache. It never seemed to get
> > used:
> > http://screencast.com/t/YtKw7UQfU
>
> That was strange. As you are using a multi-valued field with the new
> setup, they should appear there.
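[As an aside for readers of the archive: the percent-encoded facet.field
parameters in the request above are easier to read once decoded. A small
sketch using Python's standard library, with the parameter string copied
verbatim from the request:]

```python
from urllib.parse import unquote_plus

# One of the four facet.field values from the request, still percent-encoded:
raw = ("%7b!ex%3dtagitemvariantoptions_int_mv_7"
       "+key%3ditemvariantoptions_int_mv_7%7ditemvariantoptions_int_mv")

# unquote_plus decodes %xx escapes and turns '+' into a space,
# recovering Solr's local-params syntax
print(unquote_plus(raw))
# -> {!ex=tagitemvariantoptions_int_mv_7 key=itemvariantoptions_int_mv_7}itemvariantoptions_int_mv
```

[All four facet.field entries decode to the same underlying multi-valued
field, itemvariantoptions_int_mv, faceted four times under different keys
and exclusion tags — which is the shared-field setup discussed in this
thread.]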
> Can you find the facet fields in any of the other caches?
>
> ...I hope you are not calling the facets with facet.method=enum? Could
> you paste a typical facet-enabled search request?
>
> > Yep. We still do a lot of sorting on dynamic field names, so the field
> > cache has a lot of entries. (9,411 entries as we speak. This is
> > considerably lower than before.) You mentioned in an earlier mail that
> > faceting on a field shared between all facet queries would bring down
> > the memory needed. Does the same thing go for sorting?
>
> More or less. Sorting stores the raw string representations (UTF-8) in
> memory, so the number of unique values has more to say than it does for
> faceting. Just as with faceting, a list of pointers from documents to
> values (1 value/document as we are sorting) is maintained, so the
> overhead is something like
>
>   #documents * log2(#unique_terms * average_term_length) +
>   #unique_terms * average_term_length
>   (where average_term_length is in bits)
>
> Caveat: This is with the index-wide sorting structure. I am fairly
> confident that this is what Solr uses, but I have not looked at it
> lately, so it is possible that some memory-saving segment-based trickery
> has been implemented.
>
> > Do those 9,411 entries duplicate data between them?
>
> Sorry, I do not know. SOLR-1111 discusses the problems with the field
> cache and duplication of data, but I cannot infer whether it has been
> solved or not. I am not familiar with the stat breakdown of the
> fieldCache, but it _seems_ to me that there are 2 or 3 entries for each
> segment for each sort field. Guesstimating further, let's say you have
> 30 segments in your index. Going with the guesswork, that would bring
> the number of sort fields to 9411/3/30 ~= 100. Looks like you use a
> custom sort field for each client?
>
> Extrapolating from 1.4M documents and 180 clients, let's say that there
> are 1.4M/180/5 unique terms for each sort field and that their average
> length is 10.
> We thus have
>
>   1.4M * log2(1500*10*8) + 1500*10*8 bit ~= 23MB
>
> per sort field, or about 4GB for all 180 fields.
>
> With so few unique values, the doc->value structure is by far the
> biggest, just as with facets. As opposed to the faceting structure, this
> is fairly close to the actual memory usage. Switching to a single sort
> field would reduce the memory usage from 4GB to about 55MB.
>
> > I do commit a bit more often than I should. I get these in my log file
> > from time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
> So 1 active searcher and 2 warming searchers. Ignoring that one of the
> warming searchers is highly likely to finish well ahead of the other
> one, that means that your heap must hold 3 times the structures for a
> single searcher. With the old heap size of 25GB that left "only" 8GB for
> a full dataset. Subtract the 4GB for sorting and a similar amount for
> faceting, and you have your OOM.
>
> Tweaking your ingest to avoid 3 overlapping searchers will lower your
> memory requirements by 1/3. Fixing the facet & sorting logic will bring
> it down to laptop size.
>
> > The control panel says that the warm-up time of the last searcher is
> > 5574. Is that seconds or milliseconds?
> > http://screencast.com/t/d9oIbGLCFQwl
>
> Milliseconds, I am fairly sure. It is much faster than I anticipated.
> Are you warming all the sort- and facet-fields?
>
> > Waiting for a full GC would take a long time.
>
> Until you have fixed the core memory issue, you might consider doing an
> explicit GC every night to clean up, and hope that it does not occur
> automatically at daytime (or whenever your clients use it).
>
> > Unfortunately I don't know of a way to provoke a full GC on command.
>
> VisualVM, which is delivered with the Oracle JDK (look somewhere in the
> bin folder), is your friend. Just start it on the server and click on
> the relevant process.
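[The arithmetic in Toke's estimate above is easy to check numerically.
This sketch only reproduces his back-of-the-envelope numbers; the inputs
(1.4M documents, ~1,500 unique terms per field, 10-character terms = 80
bits, 180 fields) are the guesses from the mail, and the per-field result
is taken in the same units as the quoted ~23MB/~4GB totals:]

```python
from math import log2

# Guesses from the thread, not measured values:
num_docs = 1_400_000     # documents in the index
unique_terms = 1500      # ~1.4M / 180 clients / 5, per sort field
avg_term_bits = 10 * 8   # average term length of 10 chars, in bits
num_fields = 180         # one custom sort field per client

# doc->value pointers plus the term data itself, per the formula above
per_field = num_docs * log2(unique_terms * avg_term_bits) \
    + unique_terms * avg_term_bits

print(f"per sort field: {per_field / 1e6:.1f}M (~23MB in the mail)")
print(f"all {num_fields} fields: {per_field * num_fields / 1e9:.1f}G (~4GB in the mail)")
```

[The document-to-value pointer term dominates, which is why a single
shared sort field shrinks the total so dramatically: the pointer array is
paid once instead of 180 times.]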
>
> Regards,
> Toke Eskildsen

-- 
Med venlig hilsen / Best regards

John Nielsen
Programmer

MCB A/S
Enghaven 15
DK-7500 Holstebro
Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk