Date: Mon, 15 Dec 2014 00:12:13 +0000 (UTC)
From: "Joel Bernstein (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

     [ https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-6581:
---------------------------------
    Description:
*Background*

The 4x implementations of the CollapsingQParserPlugin and the ExpandComponent are optimized to work with a top-level FieldCache. Top-level FieldCaches have a very fast docID to top-level ordinal lookup. Fast access to the top-level ordinals allows for very high performance field collapsing on high-cardinality fields.

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top-level FieldCache is no longer in regular use. Instead, all top-level caches are accessed through MultiDocValues.

There are some major advantages to using MultiDocValues rather than a top-level FieldCache. But the lookup from docID to top-level ordinal is slower using MultiDocValues.

My testing has shown that *after optimizing* the CollapsingQParserPlugin code to use MultiDocValues, the performance drop is around 100%. For some use cases this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top-level ordinals. Is faceting performance also affected? My testing has shown that faceting performance is affected much less than collapsing.

One possible reason for this is that field collapsing is memory bound and faceting is not. So the additional memory accesses needed for MultiDocValues affect field collapsing much more than faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm use MultiDocValues, but to provide an option to use a top-level FieldCache if the performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new "hint" parameter. If the hint parameter is set to "FAST_QUERY", then the top-level FieldCache would be used for both Collapse and Expand.

Example syntax:

fq={!collapse field=x hint=FAST_QUERY}

  was:
There were changes made to the CollapsingQParserPlugin and ExpandComponent in the 5x branch that were driven by changes to the Lucene Collectors API and DocValues API. This ticket is to review the 5x implementation and make any changes necessary in preparation for a 5.0 release.

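As a rough illustration of the lookup cost discussed under *Background* and *What About Faceting?* in the updated description, the sketch below contrasts a single top-level array read with a lookup that goes through per-segment structures. It is a simplified model in plain Java, not Lucene or Solr code: the class name, method names, and array layouts (docBases, segmentOrds, segToGlobal) are all assumptions made up for this example, and the real MultiDocValues internals may differ.

```java
import java.util.Arrays;

/** Simplified, hypothetical model of two docID -> top-level ordinal lookup paths. */
class OrdinalLookupSketch {

    /** Top-level cache model: one array indexed directly by the global docID. */
    static int topLevelOrd(int[] topLevelOrds, int docId) {
        return topLevelOrds[docId]; // a single array read
    }

    /**
     * Multi-segment model: locate the segment, read the segment-local ordinal,
     * then translate it to a top-level ordinal. Each step touches a different array.
     */
    static int multiSegmentOrd(int[] docBases, int[][] segmentOrds,
                               int[][] segToGlobal, int docId) {
        // 1. Find the segment whose starting docID is the largest one <= docId.
        int idx = Arrays.binarySearch(docBases, docId);
        int seg = idx >= 0 ? idx : -idx - 2;
        // 2. Read the ordinal stored for the segment-local docID.
        int segOrd = segmentOrds[seg][docId - docBases[seg]];
        // 3. Map the segment ordinal to the top-level ordinal.
        return segToGlobal[seg][segOrd];
    }

    public static void main(String[] args) {
        // Made-up data: 8 documents in 3 segments, 4 distinct terms overall.
        int[] docBases = {0, 3, 5};
        int[][] segmentOrds = {{0, 1, 0}, {0, 1}, {1, 0, 2}};
        int[][] segToGlobal = {{1, 3}, {0, 2}, {0, 2, 3}};
        int[] topLevelOrds = {1, 3, 1, 0, 2, 2, 0, 3};

        for (int doc = 0; doc < topLevelOrds.length; doc++) {
            int direct = topLevelOrd(topLevelOrds, doc);
            int viaSegments = multiSegmentOrd(docBases, segmentOrds, segToGlobal, doc);
            System.out.println("doc " + doc + ": direct=" + direct
                    + " viaSegments=" + viaSegments);
        }
    }
}
```

Both paths return the same ordinal for every document; the second simply reads from more places to get there. In this model, the proposed hint=FAST_QUERY would correspond to choosing the first path at the cost of keeping the top-level array in memory.
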
> Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
> -----------------------------------------------------------
>
>                 Key: SOLR-6581
>                 URL: https://issues.apache.org/jira/browse/SOLR-6581
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-6581.patch, SOLR-6581.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)