From: "ASF subversion and git services (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Date: Thu, 19 Jul 2018 17:31:00 +0000 (UTC)
Subject: [jira] [Commented] (SOLR-12343) JSON Field Facet refinement can return incorrect counts/stats for sorted buckets

    [ https://issues.apache.org/jira/browse/SOLR-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549602#comment-16549602 ]

ASF subversion and git services commented on SOLR-12343:
--------------------------------------------------------

Commit a7fe950074a834edc070c265df1394181b268683 in lucene-solr's branch refs/heads/branch_7x from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a7fe950 ]

SOLR-12343: Fixed a bug in JSON Faceting that could cause incorrect counts/stats when using non default sort options

This also adds a new configurable "overrefine" option

(cherry picked from commit 3a5d4a25df310d2021fa947ea593cc9b3c93a386)


> JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
> --------------------------------------------------------------------------------
>
> Key: SOLR-12343
> URL: https://issues.apache.org/jira/browse/SOLR-12343
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level.
> Issues are Public)
> Reporter: Hoss Man
> Assignee: Yonik Seeley
> Priority: Major
> Attachments: SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, __incomplete_processEmpty_microfix.patch
>
>
> The way JSON Facet's simple refinement "re-sorts" buckets after refinement can cause _refined_ buckets to be "bumped out" of the topN based on the refined counts/stats depending on the sort - causing _unrefined_ buckets originally discounted in phase#2 to bubble up into the topN and be returned to clients *with inaccurate counts/stats*
> The simplest way to demonstrate this bug (in some data sets) is with a {{sort: 'count asc'}} facet:
> * assume shard1 returns termX & termY in phase#1 because they have very low shard1 counts
> ** but they are *not* returned at all by shard2, because these terms both have very high shard2 counts.
> * Assume termX has a slightly lower shard1 count than termY, such that:
> ** termX "makes the cut" for the limit=N topN buckets
> ** termY does not make the cut, and is the "N+1" known bucket at the end of phase#1
> * termX then gets included in the phase#2 refinement request against shard2
> ** termX now has a much higher _known_ total count than termY
> ** the coordinator now sorts termX "worse" in the sorted list of buckets than termY
> ** which causes termY to bubble up into the topN
> * termY is ultimately included in the final result _with incomplete count/stat/sub-facet data_ instead of termX
> ** this is all independent of the possibility that termY may actually have a significantly higher total count than termX across the entire collection
> ** the key problem is that all/most of the other terms returned to the client have counts/stats that are the cumulation of all shards, but termY only has the contributions from shard1
> Important Notes:
> * This scenario can happen regardless of the amount of overrequest used.
>   Additional overrequest just increases the number of "extra" terms needed in the index with "better" sort values than termX & termY in shard2
> * {{sort: 'count asc'}} is not just an exceptional/pathological case:
> ** any function sort where additional data provided by shards during refinement can cause a bucket to "sort worse" can also cause this problem.
> ** Examples: {{sum(price_i) asc}} , {{min(price_i) desc}} , {{avg(price_i) asc|desc}} , etc...
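
The termX/termY scenario in the quoted issue description can be reproduced with a small simulation. The sketch below is not Solr code: the shard contents, the function names (top_buckets, facet) and the only_refined switch are all hypothetical, made up for illustration. It models a two-shard {{sort: 'count asc'}} facet with limit=1, and contrasts re-sorting every known bucket after refinement (the behavior described in the issue) with returning only buckets that were actually refined (one plausible mitigation, not necessarily what the commit does).

```python
from collections import defaultdict

# Hypothetical per-shard term counts, chosen to trigger the problem:
# termX and termY are rare on shard1 but very common on shard2.
SHARDS = {
    "shard1": {"termX": 1, "termY": 2, "termA": 50, "termB": 60},
    "shard2": {"termX": 1000, "termY": 1100, "termA": 5, "termB": 6},
}


def top_buckets(counts, size):
    """Return the `size` lowest-count terms of one shard (sort: 'count asc')."""
    return dict(sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:size])


def facet(limit, overrequest, only_refined):
    # Phase 1: every shard reports its own top (limit + overrequest) buckets.
    phase1 = {name: top_buckets(counts, limit + overrequest)
              for name, counts in SHARDS.items()}

    # The coordinator sums whatever counts it has seen so far.
    known = defaultdict(int)
    for buckets in phase1.values():
        for term, count in buckets.items():
            known[term] += count

    def ranked(terms):
        # Sort by the currently known count, ascending; ties broken by term.
        return sorted(terms, key=lambda t: (known[t], t))

    # The topN known buckets become the refinement candidates.
    candidates = ranked(known)[:limit]

    # Phase 2: ask each shard that did not report a candidate for its count.
    for term in candidates:
        for name, buckets in phase1.items():
            if term not in buckets:
                known[term] += SHARDS[name].get(term, 0)

    # Buggy variant: re-sort all known buckets, refined or not.
    # Mitigation (assumption): only refined buckets may be returned.
    pool = candidates if only_refined else list(known)
    return [(term, known[term]) for term in ranked(pool)[:limit]]


print(facet(limit=1, overrequest=1, only_refined=False))  # [('termY', 2)]
print(facet(limit=1, overrequest=1, only_refined=True))   # [('termX', 1001)]
```

In the first call termX is refined to 1001, sorts "worse", and the unrefined termY is returned with a count of 2 even though its total across both shards is 1102. In the second call the returned count is at least complete, although termX is still not the globally lowest term in this made-up data set; as I understand it, that remaining gap is the sort of thing extra overrequest/overrefine is meant to narrow.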