From: Mikhail Khludnev <mkhludnev@griddynamics.com>
Date: Wed, 1 Oct 2014 23:27:41 +0400
Subject: Re: Filter cache pollution during sharded edismax queries
To: solr-user@lucene.apache.org

Hoss,

Nice to hear from you! I wonder if there is a sequence chart, or maybe a deck, which explains the whole picture of distributed search, especially these refinement requests? If it hasn't been presented to the community so far, I'm aware of one conference that could accept such a talk. WDYT?
On Wed, Oct 1, 2014 at 9:17 PM, Chris Hostetter wrote:
>
> : +1 for using a different cache, but that's being quite unfamiliar with
> : the code.
>
> In a common case, people tend to "drill down" and filter on facet
> constraints -- so using a special-purpose cache for the refinements would
> result in redundant caching of the same info in multiple places.
>
> : > > What's the point of refining these counts? I've thought that it makes
> : > > sense only for facet.limit-ed requests. Is that a correct statement?
>
> Refinement only happens if facet.limit is used and there are eligible
> "top" constraints that were not returned by some shards.
>
> : > > Can those who suffer from the low performance just unlimit
> : > > facet.limit to avoid that distributed hop?
>
> As noted, setting facet.limit=-1 might help for low-cardinality fields to
> ensure that every shard returns a count for every value and no refinement
> is needed, but that doesn't really help you for fields with
> unknown/unbounded cardinality.
>
> As part of the distributed pivot faceting work, the amount of
> "overrequest" done in phase 1 (for both facet.pivot & facet.field) was
> made configurable via two new parameters...
>
> https://lucene.apache.org/solr/4_10_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_RATIO
> https://lucene.apache.org/solr/4_10_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_COUNT
>
> ...so depending on the distribution of your data, you might find that by
> adjusting those values to increase the amount of overrequesting done, you
> can decrease the amount of refinement needed -- but there are obviously
> tradeoffs.
>
>
> -Hoss
> http://www.lucidworks.com/

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
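
P.S. For anyone who wants to experiment with those knobs, a minimal SolrJ sketch, assuming Solr 4.10: the core URL and the "category" facet field are placeholders (not from this thread), and the defaults mentioned in the comments are my reading of the FacetParams javadocs linked above.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class OverrequestTuningSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder core URL and facet field -- adjust to your own setup.
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("category");
    q.setFacetLimit(100);
    // For a low-cardinality field, q.setFacetLimit(-1) would make every
    // shard return every count, so no refinement pass is needed at all.

    // Raise phase-1 overrequesting so each shard returns more candidate
    // constraints up front; fewer "top" terms are then missing from some
    // shard, so fewer refinement requests (and filter-cache entries) get
    // issued. If I read the 4.10 javadocs right, the defaults are
    // ratio=1.5 and count=10.
    q.set("facet.overrequest.ratio", "3.0");
    q.set("facet.overrequest.count", "50");

    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getFacetField("category").getValues());
  }
}

Whether larger overrequest values pay off depends on how skewed the term distribution is across shards: the extra terms cost a little bandwidth on every request, while refinement only costs on the requests that actually need it -- the tradeoff Hoss mentions above.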