Date: Thu, 11 Feb 2016 17:22:13 -0600
Subject: Re: Select distinct records
From: Brian Narsi
To: solr-user@lucene.apache.org
In order to use the Collapsing feature, I will need to use Document Routing to co-locate related documents in the same shard in SolrCloud. What are the advantages and disadvantages of Document Routing?

Thanks,

On Thu, Feb 11, 2016 at 12:54 PM, Joel Bernstein wrote:

> Yeah, that would be the reason. If you want distributed unique capabilities,
> then you might want to start testing out 6.0. Aside from SELECT DISTINCT
> queries, you also have a much more mature Streaming Expression library,
> which supports the unique operation.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Feb 11, 2016 at 12:28 PM, Brian Narsi wrote:
>
> > OK, I see that the Collapsing feature requires documents to be co-located
> > in the same shard in SolrCloud.
> >
> > Could that be the reason for the duplication?
> >
> > On Thu, Feb 11, 2016 at 11:09 AM, Joel Bernstein wrote:
> >
> > > The CollapsingQParserPlugin shouldn't have duplicates in the result set.
> > > Can you provide the details?
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Thu, Feb 11, 2016 at 12:02 PM, Brian Narsi wrote:
> > >
> > > > I have tried to use the Collapsing feature, but it appears to leave
> > > > duplicate records in the result set.
> > > >
> > > > Is that expected? Or do you have any suggestions for working around it?
> > > >
> > > > Thanks
> > > >
> > > > On Thu, Feb 11, 2016 at 9:30 AM, Brian Narsi wrote:
> > > >
> > > > > I am using Solr 5.1.0
> > > > >
> > > > > On Thu, Feb 11, 2016 at 9:19 AM, Binoy Dalal <binoydalal93@gmail.com> wrote:
> > > > >
> > > > >> What version of Solr are you using?
> > > > >> Have you taken a look at the Collapsing Query Parser? It basically
> > > > >> performs the same functions as grouping but is much more efficient
> > > > >> at doing it.
> > > > >> Take a look here:
> > > > >>
> > > > >> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
> > > > >>
> > > > >> On Thu, Feb 11, 2016 at 8:44 PM Brian Narsi wrote:
> > > > >>
> > > > >> > I am trying to select distinct records from a collection. (I need
> > > > >> > the distinct name and the corresponding id.)
> > > > >> >
> > > > >> > I have tried using grouping with a group format of simple, but that
> > > > >> > takes a long time to execute and sometimes runs into an out-of-memory
> > > > >> > exception. Another limitation seems to be that the total number of
> > > > >> > groups is not returned.
> > > > >> >
> > > > >> > Is there another, faster and more efficient way to do this?
> > > > >> >
> > > > >> > Thank you
> > > > >> >
> > > > >> --
> > > > >> Regards,
> > > > >> Binoy Dalal
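Joel's point about co-location can be illustrated with a small simulation. In the Python sketch below, plain dictionaries stand in for shards, each shard collapses on `name` independently (keeping the highest-scoring document, which is what `{!collapse}` does by default), and the per-shard results are then merged. This is not Solr code: the documents, the shard count, and the byte-sum `shard_for` function (a deterministic toy stand-in for the MurmurHash3 hash that Solr's compositeId router actually uses) are all hypothetical.

```python
from collections import defaultdict

NUM_SHARDS = 2  # hypothetical shard count

# Hypothetical documents: several share the same "name" value.
DOCS = [
    {"id": "1", "name": "acme", "score": 0.9},
    {"id": "2", "name": "acme", "score": 0.7},
    {"id": "3", "name": "globex", "score": 0.8},
    {"id": "4", "name": "globex", "score": 0.6},
    {"id": "5", "name": "initech", "score": 0.5},
]


def shard_for(key: str) -> int:
    """Toy deterministic router hash (sum of UTF-8 bytes); Solr uses MurmurHash3 instead."""
    return sum(key.encode("utf-8")) % NUM_SHARDS


def distribute(docs, route_key):
    """Assign each document to a shard based on the routing key extracted from it."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(route_key(doc))].append(doc)
    return shards


def collapse_on_shard(docs, field):
    """Keep the highest-scoring document per field value, within a single shard only."""
    best = {}
    for doc in docs:
        current = best.get(doc[field])
        if current is None or doc["score"] > current["score"]:
            best[doc[field]] = doc
    return list(best.values())


def merge(shards, field):
    """Collapse every shard independently, then merge the results by descending score."""
    merged = []
    for shard_docs in shards.values():
        merged.extend(collapse_on_shard(shard_docs, field))
    return sorted(merged, key=lambda d: d["score"], reverse=True)


# Routing by the unique id scatters documents with the same name across shards...
by_id = merge(distribute(DOCS, lambda d: d["id"]), "name")
# ...while routing by name (like a "name!id" compositeId prefix) co-locates them.
by_name = merge(distribute(DOCS, lambda d: d["name"]), "name")

print([d["name"] for d in by_id])    # ['acme', 'globex', 'acme', 'globex', 'initech']
print([d["name"] for d in by_name])  # ['acme', 'globex', 'initech']
```

Routing on `name` (in SolrCloud, a compositeId such as `acme!1`, where the part before the `!` is the shard key) is what makes the per-shard collapse correct here. The flip side, and one answer to the question about disadvantages, is that all documents sharing a shard key land on the same shard, so a skewed distribution of `name` values can leave shards unevenly loaded.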
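For reference, the two approaches discussed in the thread map to `/select` request parameters like the ones below. This Python sketch only builds the URLs with the standard library and does not contact Solr. The host, the collection name `collection1`, and the assumption that `name` is a single-valued string field (which both `{!collapse}` and grouping need) are hypothetical. `group.ngroups=true` asks Solr to report the total number of groups, which addresses the missing count mentioned in the original question. In SolrCloud, however, that count is only accurate when each group's documents are co-located on one shard.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint and collection name.
BASE = "http://localhost:8983/solr/collection1/select"

# Collapsing: one document (the top-scoring one by default) per distinct "name".
collapse_params = {
    "q": "*:*",
    "fq": "{!collapse field=name}",
    "fl": "id,name",
    "rows": 100,
}

# Grouping with the simple format, plus the total group count.
grouping_params = {
    "q": "*:*",
    "fl": "id,name",
    "group": "true",
    "group.field": "name",
    "group.format": "simple",
    "group.ngroups": "true",  # report the number of distinct groups
    "rows": 100,
}

collapse_url = BASE + "?" + urlencode(collapse_params)
grouping_url = BASE + "?" + urlencode(grouping_params)

# The local-params syntax is percent-encoded in the URL but decodes back intact.
print(parse_qs(urlparse(collapse_url).query)["fq"])  # ['{!collapse field=name}']
print(grouping_url)
```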
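Joel's suggestion for 6.0 refers to the Streaming Expression `unique` function. The Python sketch below only builds the expression string. The collection name is hypothetical, and the `unique_expression` helper is illustrative, not part of any Solr client. `unique` emits the first tuple for each value of its `over` field and expects its input to be sorted on that field, which is why the inner `search` sorts on `name`. `qt="/export"` streams the full result set and requires docValues on the `fl` and `sort` fields.

```python
from urllib.parse import urlencode, parse_qs

COLLECTION = "collection1"  # hypothetical collection name


def unique_expression(collection: str, field: str, fl: str) -> str:
    """Hypothetical helper: wrap a sorted search() in unique() over `field`.

    unique() keeps the first tuple per value of `field`, so the inner
    search must be sorted on that same field.
    """
    search = (
        f'search({collection}, q="*:*", fl="{fl}", '
        f'sort="{field} asc", qt="/export")'
    )
    return f'unique({search}, over="{field}")'


expr = unique_expression(COLLECTION, "name", "id,name")

# The expression is sent as the "expr" parameter to the collection's /stream
# handler, e.g. POST http://localhost:8983/solr/collection1/stream (host hypothetical).
body = urlencode({"expr": expr})

# Round-trip check: the form-encoded body decodes back to the same expression.
assert parse_qs(body)["expr"] == [expr]
print(expr)
```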