lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
Date Sat, 03 Jan 2015 14:56:34 GMT

     [ https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-6581:
---------------------------------
    Attachment: SOLR-6581.patch

Getting much closer. The numeric collapse field tests are now passing and variables have been
renamed for clarity.

> Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
> -----------------------------------------------------------
>
>                 Key: SOLR-6581
>                 URL: https://issues.apache.org/jira/browse/SOLR-6581
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch,
SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff
>
>
> *Background*
> The 4.x implementations of the CollapsingQParserPlugin and the ExpandComponent are optimized
to work with a top-level FieldCache. Top-level FieldCaches have a very fast docID to top-level
ordinal lookup. Fast access to the top-level ordinals allows for very high performance field
collapsing on high-cardinality fields.
> LUCENE-5666 unified the DocValues and FieldCache APIs so that the top-level FieldCache
is no longer in regular use. Instead, all top-level caches are accessed through MultiDocValues.

> There are some major advantages to using MultiDocValues rather than a top-level FieldCache.
But there is one disadvantage: the lookup from docID to top-level ordinal is slower using
MultiDocValues.
> My testing has shown that *after optimizing* the CollapsingQParserPlugin code to use
MultiDocValues, the performance drop is around 100%.  For some use cases this performance
drop is a blocker.
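
To make the lookup difference concrete, here is a toy model in plain Java. It is not Lucene code: the class, method, and array names are all hypothetical, and it assumes the per-segment path works roughly as "find the segment, read the segment-local ordinal, map it to a global ordinal". The top-level path is a single array read; the per-segment path needs a search plus two dependent reads.

```java
import java.util.Arrays;

public class OrdinalLookupSketch {

    // Top-level cache model: one array indexed directly by the global docID.
    static int topLevelLookup(int[] topLevelOrds, int docId) {
        return topLevelOrds[docId];
    }

    // Per-segment model (hypothetical layout):
    //   docStarts[s]          = first global docID of segment s (strictly increasing)
    //   segmentOrds[s][d]     = segment-local ordinal of local doc d
    //   segmentToGlobal[s][o] = global ordinal for segment-local ordinal o
    static int multiSegmentLookup(int[] docStarts, int[][] segmentOrds,
                                  int[][] segmentToGlobal, int docId) {
        // Step 1: binary search for the segment that contains docId.
        int idx = Arrays.binarySearch(docStarts, docId);
        int segment = idx >= 0 ? idx : -idx - 2;
        // Step 2: read the segment-local ordinal.
        int segmentOrd = segmentOrds[segment][docId - docStarts[segment]];
        // Step 3: translate it to the global ordinal.
        return segmentToGlobal[segment][segmentOrd];
    }

    public static void main(String[] args) {
        // Two segments: docs 0-2 and docs 3-4.
        int[] docStarts = {0, 3};
        int[][] segmentOrds = {{0, 1, 0}, {1, 0}};
        int[][] segmentToGlobal = {{0, 2}, {1, 2}};
        // The same data flattened into a single top-level array.
        int[] topLevelOrds = {0, 2, 0, 2, 1};

        for (int doc = 0; doc < topLevelOrds.length; doc++) {
            int top = topLevelLookup(topLevelOrds, doc);
            int multi = multiSegmentLookup(docStarts, segmentOrds, segmentToGlobal, doc);
            System.out.println("doc " + doc + ": top-level=" + top + ", per-segment=" + multi);
        }
    }
}
```

Both paths return the same ordinal for every document; the sketch only illustrates where the extra memory accesses come from, not their measured cost.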
> *What About Faceting?*
> String faceting also relies on the top-level ordinals. Is faceting performance affected
as well? My testing has shown that faceting performance is affected much less than collapsing.

> One possible reason for this may be that field collapsing is memory bound and faceting
is not. So the additional memory accesses needed for MultiDocValues affect field collapsing
much more than faceting.
> *Proposed Solution*
> The proposed solution is to have the default Collapse and Expand algorithm use MultiDocValues,
but to provide an option to use a top level FieldCache if the performance of MultiDocValues
is a blocker.
> The proposed mechanism for switching to the FieldCache would be a new "hint" parameter.
If the hint parameter is set to "FAST_QUERY" then the top-level FieldCache would be used for
both Collapse and Expand.
> Example syntax:
> {code}
> fq={!collapse field=x hint=FAST_QUERY}
> {code}
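
As a small illustration of how a client might toggle the proposed hint, the following plain-Java sketch builds the filter query string with or without it. The class and method names are hypothetical, and the hint value is simply the one proposed in this message; the name that ships may differ.

```java
public class CollapseFilterSketch {

    // Builds a collapse filter query; the hint is omitted when null or empty,
    // which leaves the default (MultiDocValues-based) behavior in place.
    static String collapseFilter(String field, String hint) {
        StringBuilder sb = new StringBuilder("{!collapse field=").append(field);
        if (hint != null && !hint.isEmpty()) {
            sb.append(" hint=").append(hint);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println("fq=" + collapseFilter("x", null));
        System.out.println("fq=" + collapseFilter("x", "FAST_QUERY"));
    }
}
```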
>  
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

