From: John Bickerstaff
Date: Wed, 19 Oct 2016 17:11:11 -0600
Subject: Re: Result Grouping vs. Collapsing Query Parser -- Can one be deprecated?
To: solr-user@lucene.apache.org
archived-at: Wed, 19 Oct 2016 23:16:35 -0000

Thank you for posting that. I'll be saving it in my "important painful
lessons learned by others" mail folder.

On Oct 19, 2016 4:51 PM, "Mike Lissner" wrote:

> Hi all,
>
> I've had a rotten day today because of Solr. I want to share my experience
> and perhaps see if we can do something to fix this particular situation in
> the future.
>
> Solr currently has two ways to get grouped results (so far!). You can
> either use Result Grouping or you can use the Collapsing Query Parser.
> Result grouping seems like the obvious way to go. It's well documented, the
> parameters are clear, it doesn't use a bunch of weird syntax (ie,
> {!collapse blah=foo}), and it uses the feature name from SQL (so it comes
> up in Google).
>
> OTOH, if you use faceting with result grouping, which I imagine many people
> do, you get terrible performance. In our case it went from subsecond to
> 10-120 seconds for big queries. Insanely bad.
>
> Collapsing Query Parser looks like a good way forward for us, and we'll be
> investigating that, but it uses the Expand component that our library
> doesn't support, to say nothing of the truly bizarre syntax. So this will
> be a fair amount of effort to switch.
>
> I'm curious if there is anything we can do to clean up this situation. What
> I'd really like to do is:
>
> 1. Put a HUGE warning on the Result Grouping docs directing people away
> from the feature if they plan to use faceting (or perhaps directing them
> away no matter what?)
>
> 2. Work towards eliminating one or the other of these features. They're
> nearly completely compatible, except for their syntax and performance. The
> collapsing query parser apparently was only written because the result
> grouping had such bad performance -- In other words, it doesn't exist to
> provide unique features, it exists to be faster than the old way. Maybe we
> can get rid of one or the other of these, taking the best parts from each
> (syntax from Result Grouping, and performance from Collapse Query Parser)?
>
> Thanks,
>
> Mike
>
> PS -- For some extra context, I want to share some other reasons this is
> frustrating:
>
> 1. I just spent a week upgrading a third-party library so it would support
> grouped results, and another week implementing the feature in our code with
> tests and everything. That was a waste.
> 2. It's hard to notice performance issues until after you deploy to a big
> data environment. This creates a bad situation for users until you detect
> it and revert the new features.
> 3. The documentation *could* say something about the fact that a new
> feature was developed to provide better performance for grouping. It could
> say that using facets with groups is an anti-feature. It says neither.
>
> I only mention these because, like others, I've had a real rough time with
> solr (again), and these are the kinds of seemingly small things that could
> have made all the difference.
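
For anyone comparing the two approaches described in the quoted message, here is a minimal sketch of how the request parameters differ. It is an illustration and not taken from the thread: the helper names and the field name `docket_id` are hypothetical, and it only builds query strings using the standard Solr parameters (`group`, `group.field`, `group.limit` for Result Grouping; an `fq` of `{!collapse field=...}` plus `expand` and `expand.rows` for the Collapsing Query Parser with the Expand component). The two response formats differ, so a client library likely needs separate parsing for each.

```python
from urllib.parse import urlencode


def grouping_params(query, field, per_group=1):
    """Build Result Grouping parameters (hypothetical helper)."""
    return {
        "q": query,
        "group": "true",
        "group.field": field,
        "group.limit": per_group,  # documents returned per group
    }


def collapse_params(query, field, expand_rows=None):
    """Build Collapsing Query Parser parameters (hypothetical helper).

    The collapse filter keeps one document per distinct value of `field`.
    The Expand component is only switched on when expand_rows is given.
    """
    params = {
        "q": query,
        "fq": "{!collapse field=%s}" % field,
    }
    if expand_rows is not None:
        params["expand"] = "true"
        params["expand.rows"] = expand_rows
    return params


if __name__ == "__main__":
    # 'docket_id' is an assumed field name, used purely for illustration.
    print(urlencode(grouping_params("title:patent", "docket_id", per_group=5)))
    print(urlencode(collapse_params("title:patent", "docket_id", expand_rows=5)))
```

The sketch keeps the two parameter sets side by side so the syntax difference is easy to see. It says nothing about performance, which would have to be measured against a real index with faceting enabled.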