Subject: Re: SolrCloud Result Grouping vs CollapsingQParserPlugin
From: Joel Bernstein
To: solr-user@lucene.apache.org
Date: Wed, 15 Jan 2014 07:25:37 -0500

"During query time, depending on the query, results can be returned from both shards. For example, a query q=solr&group=true&group.field=adskdedup&group.ngroups=true would ideally return data from both shards and apply the grouping on shard1 based on the adskdedup field. This will also ensure that group.ngroups=true returns the right count."

This is correct and will work with both standard grouping and the CollapsingQParserPlugin.

"The other clarification I wanted was based on this statement: 'When a tenant is too large to fit on a single shard it can be spread across multiple shards by specifying the number of bits to use from the shard key.' If we split shards, will Result Grouping / CollapsingQParserPlugin and the number of results still work?"

With field collapsing you'll need to keep all documents of a group on the same shard, so you won't be able to specify the number of bits.

"Last but not least, when are you planning to release 4.6.1?"

There is a thread going on the dev list about the 4.6.1 release. You can follow progress at:
http://markmail.org/search/?q=%22Lucene+%2F+Solr+4.6.1%22

Joel Bernstein
Search Engineer at Heliosearch

On Wed, Jan 15, 2014 at 2:35 AM, shamik wrote:
> Joel,
>
> Thanks for the pointer.
> I went through your blog on document routing; very informative. I do
> need some clarifications on the implementation. I'll try to run it based
> on my use case.
>
> I'm indexing documents from multiple source systems, some of which
> contain duplicate content. I'm trying to remove the duplicates by applying
> result grouping / CollapsingQParserPlugin. For example, let's say I have
> sources ABC, MNO and XYZ. Sources ABC and MNO contain the duplicate
> documents, which are identified by a field, say adskdedup. I have a couple
> of shards, and the id is the URL of the document. To make field collapsing
> work, I need to update the id field to include "adskdedup!url". Documents
> having identical adskdedup values should route to a dedicated shard, e.g.
> shard1. The ones that are not identical will be routed to either shard1 or
> shard2. After indexing is done, shard1 should have all the documents to
> which grouping needs to be applied.
>
> At query time, depending on the query, results can be returned from both
> shards. For example, a query
> q=solr&group=true&group.field=adskdedup&group.ngroups=true would ideally
> return data from both shards and apply the grouping on shard1 based on the
> adskdedup field. This will also ensure that group.ngroups=true returns
> the right count.
>
> The other clarification I wanted was based on this statement: "When a
> tenant is too large to fit on a single shard it can be spread across
> multiple shards by specifying the number of bits to use from the shard
> key."
> If we split shards, will Result Grouping / CollapsingQParserPlugin and
> the number of results still work?
>
> Last but not least, when are you planning to release 4.6.1?
>
> Again, I appreciate your help on this.
>
> - Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Result-Grouping-vs-CollapsingQParserPlugin-tp4111331p4111375.html
> Sent from the Solr - User mailing list archive at Nabble.com.
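The compositeId routing described above (an id of the form adskdedup!url) and the "number of bits" option can be illustrated with a small model. This is a simplified sketch under stated assumptions, not Solr's code: it uses CRC32 as a stand-in for the MurmurHash3 hash that Solr's compositeId router actually uses, it splits the 32-bit hash space into equal contiguous ranges (one per shard), and it follows the key/bits!id form (for example dup42/1!url) described in the Solr reference guide. The ids, URLs and helper names are hypothetical.

```python
import zlib

HASH_BITS = 32
MASK32 = 0xFFFFFFFF


def h32(text: str) -> int:
    # Stand-in 32-bit hash. Solr's compositeId router uses MurmurHash3;
    # CRC32 is used here only so the sketch needs nothing beyond the stdlib.
    return zlib.crc32(text.encode("utf-8")) & MASK32


def composite_hash(doc_id: str) -> int:
    """Hash an id of the form 'shardKey!docId' or 'shardKey/bits!docId'.

    The top `bits` bits (default 16) come from the shard key's hash and the
    remaining low bits come from the hash of the rest of the id.
    """
    shard_key, sep, rest = doc_id.partition("!")
    if not sep:
        return h32(doc_id)  # plain id: no shard key, route on the whole id
    bits = 16
    key, slash, bits_text = shard_key.rpartition("/")
    if slash and bits_text.isdigit():
        shard_key, bits = key, min(int(bits_text), HASH_BITS)
    key_mask = (MASK32 << (HASH_BITS - bits)) & MASK32 if bits else 0
    return (h32(shard_key) & key_mask) | (h32(rest) & ~key_mask & MASK32)


def shard_for(doc_id: str, num_shards: int) -> int:
    # Divide the unsigned 32-bit hash space into equal contiguous ranges,
    # one per shard (a simplification of SolrCloud's per-shard hash ranges).
    range_size = (1 << HASH_BITS) // num_shards
    return min(composite_hash(doc_id) // range_size, num_shards - 1)


if __name__ == "__main__":
    urls = [f"http://example.com/doc{i}" for i in range(50)]
    co_located = {shard_for(f"dup42!{u}", 4) for u in urls}
    spread = {shard_for(f"dup42/1!{u}", 4) for u in urls}
    print("default 16 key bits ->", co_located)
    print("1 key bit           ->", spread)
```

With the default 16 key bits, every dup42 document lands on a single shard, so per-shard grouping sees the whole group. With /1, only the top bit comes from the key, so on four shards the same group ends up split across two shards. That split is why the bits option cannot be combined with field collapsing.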
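The requirement that every document of a group sit on one shard can be illustrated with a toy model of the distributed merge. The model makes these assumptions: each shard collapses or counts groups using only its own documents, and the coordinator then concatenates the collapsed results and adds up the per-shard group counts. This is a simplification, not Solr's implementation. It is consistent with the Solr documentation's statement that group.ngroups needs co-located groups for accurate counts. The documents, scores and function names are hypothetical.

```python
def collapse_shard(docs, field):
    """Keep the highest-scoring document per group value within one shard."""
    best = {}
    for doc in docs:
        key = doc[field]
        if key not in best or doc["score"] > best[key]["score"]:
            best[key] = doc
    return list(best.values())


def distributed_collapse(shards, field):
    # Each shard collapses independently and the merge only combines results,
    # so a group whose documents live on two shards survives once per shard.
    merged = [doc for shard in shards for doc in collapse_shard(shard, field)]
    return sorted(merged, key=lambda d: d["score"], reverse=True)


def summed_ngroups(shards, field):
    # Simplified model of a distributed group count: per-shard counts are added.
    return sum(len({doc[field] for doc in shard}) for shard in shards)


def true_ngroups(shards, field):
    # Ground truth: distinct group values across the whole collection.
    return len({doc[field] for shard in shards for doc in shard})


# Group d1 co-located on one shard (e.g. routed with "d1!url" ids).
co_located = [
    [{"id": "d1!http://abc/1", "adskdedup": "d1", "score": 2.0},
     {"id": "d1!http://mno/1", "adskdedup": "d1", "score": 1.5}],
    [{"id": "d2!http://xyz/9", "adskdedup": "d2", "score": 1.0}],
]

# Group d1 split across both shards (e.g. plain URL ids).
split = [
    [{"id": "http://abc/1", "adskdedup": "d1", "score": 2.0}],
    [{"id": "http://mno/1", "adskdedup": "d1", "score": 1.5},
     {"id": "http://xyz/9", "adskdedup": "d2", "score": 1.0}],
]

if __name__ == "__main__":
    for name, shards in (("co-located", co_located), ("split", split)):
        print(name,
              "summed ngroups:", summed_ngroups(shards, "adskdedup"),
              "true ngroups:", true_ngroups(shards, "adskdedup"),
              "collapsed docs:", len(distributed_collapse(shards, "adskdedup")))
```

In the co-located layout, the summed count matches the true count and the duplicate d1 document is removed. In the split layout, d1 is counted twice and appears twice in the collapsed results.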