lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
Date Mon, 15 Dec 2014 00:12:13 GMT

     [ https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-6581:
---------------------------------
    Description: 
*Background*

The 4x implementations of the CollapsingQParserPlugin and the ExpandComponent are optimized
to work with a top-level FieldCache. Top-level FieldCaches have a very fast docID to top-level
ordinal lookup. Fast access to the top-level ordinals allows for very high performance field
collapsing on high cardinality fields.
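
To illustrate why a fast docID to ordinal lookup matters for collapsing, here is a minimal sketch that keeps one array slot per top-level ordinal and records the best-scoring document for each. It is not the CollapsingQParserPlugin code; the class name, method name, and sample data are hypothetical, and plain arrays stand in for the cache.

```java
import java.util.Arrays;

// Illustrative only: plain arrays stand in for a top-level ordinal cache.
class OrdinalCollapseSketch {

    /**
     * Returns, for each ordinal, the docID of the highest scoring document
     * seen for that ordinal, or -1 if no document carried it.
     */
    static int[] collapse(int[] docOrds, float[] scores, int ordCount) {
        int[] bestDoc = new int[ordCount];
        float[] bestScore = new float[ordCount];
        Arrays.fill(bestDoc, -1);
        for (int doc = 0; doc < docOrds.length; doc++) {
            int ord = docOrds[doc]; // the docID to ordinal lookup on the hot path
            if (bestDoc[ord] == -1 || scores[doc] > bestScore[ord]) {
                bestDoc[ord] = doc;
                bestScore[ord] = scores[doc];
            }
        }
        return bestDoc;
    }

    public static void main(String[] args) {
        int[] docOrds = {0, 2, 0, 1, 2};              // hypothetical ordinals per docID
        float[] scores = {1.0f, 0.5f, 2.0f, 0.7f, 0.9f};
        System.out.println(Arrays.toString(collapse(docOrds, scores, 3)));
    }
}
```

Every collected document performs one ordinal lookup, so any extra cost in that lookup is paid once per matching document.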

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top-level FieldCache is
no longer in regular use. Instead all top-level caches are accessed through MultiDocValues.


There are some major advantages to using MultiDocValues rather than a top-level FieldCache.
But the lookup from docID to top-level ordinals is slower using MultiDocValues.
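
The difference between the two lookups can be sketched roughly as follows. This is a simplified model, not the Lucene API: it assumes the segment-based path has to locate the segment, read a segment-local ordinal, and then translate it to a top-level ordinal. All names and the sample data are hypothetical.

```java
import java.util.Arrays;

// Illustrative model only: plain arrays, no Lucene classes.
class OrdinalLookupSketch {

    /** Top-level cache: a single array read maps a docID to its top-level ordinal. */
    static int topLevelOrd(int[] topLevelOrds, int docId) {
        return topLevelOrds[docId];
    }

    /**
     * Segment-based lookup: find the segment that holds the document, read the
     * segment-local ordinal, then translate it to a top-level ordinal.
     */
    static int segmentBasedOrd(int[] docBases, int[][] segmentOrds,
                               int[][] segmentToTopLevel, int docId) {
        int idx = Arrays.binarySearch(docBases, docId);
        // A negative result encodes the insertion point; the owning segment precedes it.
        int segment = idx >= 0 ? idx : -idx - 2;
        int segmentOrd = segmentOrds[segment][docId - docBases[segment]];
        return segmentToTopLevel[segment][segmentOrd];
    }

    public static void main(String[] args) {
        // Two hypothetical segments: docs 0..2 hold terms a, c, a; docs 3..4 hold b, c.
        int[] topLevelOrds = {0, 2, 0, 1, 2};            // top-level terms: a=0, b=1, c=2
        int[] docBases = {0, 3};
        int[][] segmentOrds = {{0, 1, 0}, {0, 1}};
        int[][] segmentToTopLevel = {{0, 2}, {1, 2}};
        for (int doc = 0; doc < topLevelOrds.length; doc++) {
            System.out.println(doc + ": " + topLevelOrd(topLevelOrds, doc) + " / "
                + segmentBasedOrd(docBases, segmentOrds, segmentToTopLevel, doc));
        }
    }
}
```

Both paths return the same ordinal, but the segment-based path touches several arrays per document where the top-level path touches one.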

My testing has shown that *after optimizing* the CollapsingQParserPlugin code to use MultiDocValues,
the performance drop is around 100%.  For some use cases this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top-level ordinals. Is faceting performance affected as well?
My testing has shown that faceting performance is affected much less than collapsing.


One possible reason for this is that field collapsing is memory bound and faceting is not.
So the additional memory accesses needed for MultiDocValues affect field collapsing much
more than faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm use MultiDocValues,
but to provide an option to use a top-level FieldCache if the performance of MultiDocValues
is a blocker.

The proposed mechanism for switching to the FieldCache would be a new "hint" parameter. If
the hint parameter is set to "FAST_QUERY" then the top-level FieldCache would be used for
both Collapse and Expand.

Example syntax:

fq={!collapse field=x hint=FAST_QUERY}
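
For clients that assemble requests by hand, a minimal sketch of building and URL-encoding this filter might look like the following. The "hint" parameter and its "FAST_QUERY" value are the ones proposed in this description and may change before release; the host, collection name, and helper names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: builds the request string, does not contact Solr.
class CollapseHintRequestSketch {

    /** Builds the collapse filter, adding the proposed hint only when requested. */
    static String collapseFilter(String field, boolean useTopLevelFieldCache) {
        StringBuilder fq = new StringBuilder("{!collapse field=").append(field);
        if (useTopLevelFieldCache) {
            fq.append(" hint=FAST_QUERY"); // value proposed in this ticket
        }
        return fq.append('}').toString();
    }

    /** Assembles a select URL with the filter encoded and the ExpandComponent enabled. */
    static String selectUrl(String baseUrl, String query, String filter) {
        return baseUrl + "/select"
            + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
            + "&fq=" + URLEncoder.encode(filter, StandardCharsets.UTF_8)
            + "&expand=true";
    }

    public static void main(String[] args) {
        String fq = collapseFilter("x", true);
        System.out.println(fq);
        // Hypothetical host and collection name.
        System.out.println(selectUrl("http://localhost:8983/solr/collection1", "*:*", fq));
    }
}
```

Leaving the hint off yields the default filter, which under this proposal would use MultiDocValues.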

  was:
There were changes made to the CollapsingQParserPlugin and ExpandComponent in the 5x branch
that were driven by changes to the Lucene Collectors API and DocValues API. This ticket is
to review the 5x implementation and make any changes necessary in preparation for a 5.0 release.




> Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
> -----------------------------------------------------------
>
>                 Key: SOLR-6581
>                 URL: https://issues.apache.org/jira/browse/SOLR-6581
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-6581.patch, SOLR-6581.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


