From: Sandro Zbinden
To: solr-user@lucene.apache.org
Subject: Avoid Solr Pivot Faceting Out of Memory / Shorter result for pivot faceting requests with facet.pivot.ngroup=true and facet.pivot.showLastList=false
Date: Wed, 17 Jul 2013 11:44:32 +0000

Dear Usergroup

I am getting an out-of-memory exception in the following scenario.

I have 4 SQL tables (patient, visit, study and image) that are denormalized for the Solr index.

The Solr index looks like the following:

--------------------------------------------
|p_id |p_lastname|v_id |v_name   |...
--------------------------------------------
| 1   | Miller   | 10  | Study 1 |...
| 2   | Miller   | 11  | Study 2 |...
| 2   | Miller   | 12  | Study 3 |...   <-- Duplication because of denormalization
| 3   | Smith    | 13  | Study 4 |...
--------------------------------------------

Now I am executing a facet query:

    q=*:*&facet=true
    &facet.pivot=p_lastname,p_id
    &facet.limit=-1

And I get the following result:

    p_lastname  Miller  3
        p_id  1  1
        p_id  2  2
    p_lastname  Smith  1
        p_id  3  1

The goal is to show our clients a list of the group values and, in parentheses, how many patients each group contains.

- Miller (2)
- Smith (1)

This is why we need to use facet.pivot together with facet.limit=-1. It is, as far as I know, the only way to get a grouping on 2 criteria. And we need the pivot list to count how many patients are in a group.

Currently this works well on smaller indexes, but if we have around 1,000,000 patients and execute a query like the one above, we run into an out-of-memory error.

I figured out that the problem is not the calculation of the pivot but the presentation of the result. Because we load all values (we cannot use facet.offset because we need to order the results ascending and descending), the result can get really big.

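
To make the difference in result size concrete, here is a minimal sketch in plain Java. It does not use any Solr API; the class PivotSketch, the Row type and the method names are hypothetical and only model the two pivot fields of the four example rows above. fullPivot mirrors the shape of the facet.pivot response (one inner entry per patient), while groupSizes keeps only the number of distinct p_id values per p_lastname, which is all the client list needs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Solr.
final class PivotSketch {

    /** One denormalized index row; only the two pivot fields are modeled. */
    static final class Row {
        final int pId;
        final String pLastname;

        Row(int pId, String pLastname) {
            this.pId = pId;
            this.pLastname = pLastname;
        }
    }

    /** The four example rows from the table above. */
    static List<Row> exampleRows() {
        return Arrays.asList(
            new Row(1, "Miller"), new Row(2, "Miller"),
            new Row(2, "Miller"), new Row(3, "Smith"));
    }

    /** Full pivot: p_lastname -> (p_id -> row count). One inner entry per patient. */
    static Map<String, Map<Integer, Integer>> fullPivot(List<Row> rows) {
        Map<String, Map<Integer, Integer>> pivot = new TreeMap<>();
        for (Row r : rows) {
            pivot.computeIfAbsent(r.pLastname, k -> new TreeMap<>())
                 .merge(r.pId, 1, Integer::sum);
        }
        return pivot;
    }

    /** Compact form: p_lastname -> number of distinct p_id values. */
    static Map<String, Integer> groupSizes(List<Row> rows) {
        Map<String, Set<Integer>> ids = new TreeMap<>();
        for (Row r : rows) {
            ids.computeIfAbsent(r.pLastname, k -> new HashSet<>()).add(r.pId);
        }
        Map<String, Integer> sizes = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : ids.entrySet()) {
            sizes.put(e.getKey(), e.getValue().size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Print the list in the form the clients should see.
        for (Map.Entry<String, Integer> e : groupSizes(exampleRows()).entrySet()) {
            System.out.println("- " + e.getKey() + " (" + e.getValue() + ")");
        }
    }
}
```

The full pivot grows with the number of patients, whereas the compact form grows only with the number of distinct last names. The sketch says nothing about memory use inside Solr itself; it only shows how much smaller the response could be.
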
To avoid this overload I made a change in the solr-core PivotFacetHandler.java class.

In the method doPivots I added the following code:

    NamedList nl = this.getTermCounts(subField);
    pivot.add( "ngroups", nl.size());

This gives me the group size of the list. Then I removed the recursive call:

    pivot.add( "pivot", doPivots( nl, subField, nextField, fnames, subset) );

With this change my result looks like the following (the last number in each row is the ngroups value):

    p_lastname  Miller  3  2
    p_lastname  Smith   1  1

My question now is whether something like facet.pivot.ngroup=true and facet.pivot.showLastList=false is already planned to improve the performance of pivot faceting.

Is there a chance we could get this into the Solr code? I think it is a really small code change, but it could improve the product enormously.

Best Regards

Sandro Zbinden