From: Simon Willnauer
Reply-To: simon.willnauer@gmail.com
To: dev@lucene.apache.org
Date: Sun, 17 Oct 2010 09:43:39 +0200
Subject: Re: [jira] Commented: (SOLR-1311) pseudo-field-collapsing
References: <29605587.160301287146552597.JavaMail.jira@thor>

On Sat, Oct 16, 2010 at 11:31 PM, Lance Norskog wrote:
> The "Field Collapsing" patch is dead. "Search Grouping" is a different
> suite of techniques that the committers are willing to commit. Note
> that the Field Collapsing issue has been open for 3+ years and nothing
> was ever committed: the Solr committers who care all hate it.

Lance, what you are saying might be true or it may not, but in either
case it's no way to judge a design and/or work done over 3+ years. Folks
have made their proposals with certain requirements in mind and in good
faith. If the issue has major problems or downsides compared with the
other approach, point them out and give folks an idea of the differences
instead of judging work people have done without giving any good
reasons. I am sure that you didn't mean to insult anybody, but IMO
your phrasing was very unfortunate in that case.

Let's keep things constructive here!

simon

>
> 8G is not a big index. 450G is a big index. 1.5 billion docs is a big
> index.
> The greybeards won't touch a structural change that doesn't
> work for the wide range of use cases. The Field Collapsing patches
> never scaled.
>
> On Fri, Oct 15, 2010 at 5:42 AM, Marc Sturlese (JIRA) wrote:
>>
>>    [ https://issues.apache.org/jira/browse/SOLR-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921328#action_12921328 ]
>>
>> Marc Sturlese commented on SOLR-1311:
>> -------------------------------------
>>
>> Well, I said it cannot be integrated as a plugin because it hacks DocListAndSetNC and DocListNC. These two functions can only be altered by changing the SolrIndexSearcher.java class.
>> The pseudo-field-collapse sort is not included in the current field collapsing, but the current field collapsing seems to perform much better than it used to (I don't think as well as this patch, but the current feature is much more complete than my patch).
>> I suppose I can close it.
>>
>>> pseudo-field-collapsing
>>> -----------------------
>>>
>>>                 Key: SOLR-1311
>>>                 URL: https://issues.apache.org/jira/browse/SOLR-1311
>>>             Project: Solr
>>>          Issue Type: New Feature
>>>          Components: search
>>>    Affects Versions: 1.4
>>>            Reporter: Marc Sturlese
>>>             Fix For: Next
>>>
>>>         Attachments: SOLR-1311-pseudo-field-collapsing.patch
>>>
>>>
>>> I am trying to develop a new way of doing field collapsing based on the adjacent field collapsing algorithm. I started developing it because I am experiencing performance problems with the field collapsing patch on a big index (8G).
>>> The algorithm does adjacent-pseudo-field collapsing.
>>> It does collapsing on the first X documents. Instead of making the collapsed docs disappear, the algorithm sends them to a given position in the relevance results list.
>>> The reason I only collapse the first X documents is that if I have, for example, 600000 results and I am showing 10 results per page, I really don't need to do collapsing on page 30000, or even on page 3000. Doing this, I am noticing dramatically better performance. The problem is that I couldn't find a way to plug the algorithm in as a component and keep good performance. I had to hack a few classes in SolrIndexSearcher.java.
>>> This patch is just experimental and for testing purposes. In case someone finds it interesting, it would be good to find a way to integrate it better than it is at the moment.
>>> Advice is more than welcome.
>>>
>>> Functionality:
>>> In solrconfig.xml we specify the pseudo-collapsing parameters:
>>>      true
>>>      3000
>>>      name
>>> (at the moment there's no threshold or the other parameters that exist in the current collapse-field patch)
>>> plus.considerMoreDocs enables pseudo-collapsing
>>> plus.considerHowMany sets the number of resulting documents to which we want to apply the algorithm
>>> plus.considerField is the field to do pseudo-collapsing on
>>> If the number of results is lower than plus.considerHowMany, the algorithm will be applied to all the results.
>>> Let's say there is a query with 600000 results and we've set considerHowMany to 3000 (and we already have the docs sorted by relevance).
>>> What adjacent-pseudo-collapse does is this: if the 2nd doc has to be collapsed, it is sent to position 2999 of the relevance results array. If the 3rd has to be collapsed too, it goes to position 2998, and so on.
>>> The algorithm is not applied when a sortspec is set or plus.considerMoreDocs is set to false.
>>> Nor is it applied when using MoreLikeThisRequestHandler.
>>> Example with a query of 9 results:
>>> Results sorted by relevance without pseudo-collapse-algorithm:
>>> doc1 - collapse_field_value 3
>>> doc2 - collapse_field_value 3
>>> doc3 - collapse_field_value 4
>>> doc4 - collapse_field_value 7
>>> doc5 - collapse_field_value 6
>>> doc6 - collapse_field_value 6
>>> doc7 - collapse_field_value 5
>>> doc8 - collapse_field_value 1
>>> doc9 - collapse_field_value 2
>>> Results pseudo-collapsed with plus.considerHowMany = 5
>>> doc1 - collapse_field_value 3
>>> doc3 - collapse_field_value 4
>>> doc4 - collapse_field_value 7
>>> doc5 - collapse_field_value 6
>>> doc2 - collapse_field_value 3*
>>> doc6 - collapse_field_value 6
>>> doc7 - collapse_field_value 5
>>> doc8 - collapse_field_value 1
>>> doc9 - collapse_field_value 2
>>> Results pseudo-collapsed with plus.considerHowMany = 9
>>> doc1 - collapse_field_value 3
>>> doc3 - collapse_field_value 4
>>> doc4 - collapse_field_value 7
>>> doc5 - collapse_field_value 6
>>> doc7 - collapse_field_value 5
>>> doc8 - collapse_field_value 1
>>> doc9 - collapse_field_value 2
>>> doc6 - collapse_field_value 6*
>>> doc2 - collapse_field_value 3*
>>> *pseudo-collapsed documents
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>
> --
> Lance Norskog
> goksron@gmail.com
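
For readers following the thread, here is a minimal sketch of the reordering that the quoted SOLR-1311 description walks through. It is not the code from the patch, which modifies SolrIndexSearcher.java. The names pseudo_collapse, docs and consider_how_many are hypothetical. The sketch assumes that a document "has to be collapsed" when its field value equals that of the document immediately before it in the relevance order, which is one reading of "adjacent" that matches the 9-result example.

```python
# Sentinel that never compares equal to a real field value.
_NO_VALUE = object()


def pseudo_collapse(docs, consider_how_many):
    """Reorder (doc_id, field_value) pairs that are already sorted by relevance.

    Only the first `consider_how_many` documents are examined. A document whose
    field value equals that of the document just before it (assumption) is
    moved to the end of that window instead of being dropped.
    """
    window = docs[:consider_how_many]   # shorter than requested if few results
    tail = docs[consider_how_many:]     # left untouched

    kept, collapsed = [], []
    prev_value = _NO_VALUE
    for doc in window:
        _, value = doc
        if value == prev_value:
            collapsed.append(doc)
        else:
            kept.append(doc)
        prev_value = value

    # The first collapsed doc takes the last slot of the window, the next one
    # the slot before it, and so on, hence the reversal.
    return kept + collapsed[::-1] + tail


if __name__ == "__main__":
    docs = [("doc1", 3), ("doc2", 3), ("doc3", 4), ("doc4", 7), ("doc5", 6),
            ("doc6", 6), ("doc7", 5), ("doc8", 1), ("doc9", 2)]
    for how_many in (5, 9):
        print(how_many, [doc_id for doc_id, _ in pseudo_collapse(docs, how_many)])
```

With the nine documents from the example, a window of 5 moves only doc2 (to the fifth slot), and a window of 9 places doc6 and then doc2 at the end, as in the listings quoted above. The work is bounded by the window size rather than by the total number of hits, which is the reason the description gives for limiting collapsing to the first X documents.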