Subject: Re: field collapsing performance in sharded environment
From: Otis Gospodnetic <otis.gospodnetic@gmail.com>
To: solr-user@lucene.apache.org
Date: Tue, 19 Nov 2013 23:28:55 -0500

Have a look at https://issues.apache.org/jira/browse/SOLR-5027 +
https://wiki.apache.org/solr/CollapsingQParserPlugin
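
The collapsing parser does the dedup as a PostFilter, so in rough terms
(untested sketch here; "signature" just stands in for whatever field holds
your near-duplicate hash) you would drop the three group.* parameters and
add a single filter query instead:

  fq={!collapse field=signature}

Since your duplicates never cross shards, collapsing within each shard
should give you the same flat result list that group.main=true does.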
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Nov 13, 2013 at 2:46 PM, David Anthony Troiano <
dtroiano@basistech.com> wrote:

> Hello,
>
> I'm hitting a performance issue when using field collapsing in a
> distributed Solr setup, and I'm wondering if others have seen it and
> whether anyone has an idea for a workaround.
>
> I'm using field collapsing to deduplicate documents that have the same
> near-duplicate hash value, and deduplicating at query time (as opposed
> to filtering at index time) is a requirement. I have a sharded setup
> with 10 cores (not SolrCloud), each holding ~1,000 documents. Of the
> 10k docs, most have a unique near-duplicate hash value, so there are
> about 10k unique values for the field I'm grouping on. The grouping
> parameters I'm using are:
>
> group=true
> group.field=<near-duplicate hash field>
> group.main=true
>
> I'm attempting distributed queries (&shards=s1,s2,...,s10) where the
> only difference is the presence or absence of these three grouping
> parameters, and I'm consistently seeing a marked difference in
> performance (as a representative data point, 200ms latency without
> grouping and 1600ms with grouping). Interestingly, if I put all 10k
> docs on the same core and query that core independently with and
> without grouping, I don't see much of a latency difference, so the
> performance degradation seems to exist only in the sharded setup.
>
> Is there a known performance issue with field collapsing in a sharded
> setup (perhaps one that only manifests when the grouping field has many
> unique values), or have other people observed this? Any ideas for a
> workaround? Note that docs in my sharded setup can only have the same
> signature if they're in the same shard, so perhaps that could be used
> to boost performance, though I don't see an exposed way to do so.
>
> A follow-on question is whether we're likely to see the same issue if /
> when we move to SolrCloud.
>
> Thanks,
> Dave
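
To make the comparison above concrete, the two requests being timed differ
only in the group.* parameters, roughly along these lines ("signature" and
the shard addresses are placeholders):

  /select?q=...&shards=s1,s2,...,s10

  /select?q=...&shards=s1,s2,...,s10&group=true&group.field=signature&group.main=true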