lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5725) Efficient facets without counts for enum method
Date Fri, 02 Sep 2016 14:22:20 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458666#comment-15458666 ]

David Smiley commented on SOLR-5725:
------------------------------------

+1 -- Nice.  I'm relieved to see no mention of "probing" anymore :-)

One small improvement, IMO, would be to the short description of this setting.  Right
now on the constant you have:

bq. Boolean parameters to indicate that Solr should check terms docs for intersection with
result docset without calculating exact facet counts

IMO that's too much implementation detail, and it isn't immediately obvious what effect the
parameter has.  I suggest:  A boolean parameter that caps the facet counts at 1. With this set, a returned
count will only be 0 or 1. For apps that don't need the count, this should be an optimization.

> Efficient facets without counts for enum method
> -----------------------------------------------
>
>                 Key: SOLR-5725
>                 URL: https://issues.apache.org/jira/browse/SOLR-5725
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Alexey Kozhemiakin
>            Assignee: Mikhail Khludnev
>             Fix For: master (7.0), 6.3
>
>         Attachments: SOLR-5725-5x.patch, SOLR-5725-master.patch, SOLR-5725.patch, SOLR-5725.patch,
SOLR-5725.patch, SOLR-5725.patch, SOLR-5725.patch, SOLR-5725.patch, SOLR-5725.patch
>
>
> Short version:
> This improves performance for facet.method=enum when it's enough to know that facet count>0,
for example when you dynamically populate filters on a search form. The new method
checks whether two bitsets intersect instead of counting the intersection size.
> Long version:
> We have a dataset containing hundreds of millions of records. We facet by dozens of fields
with many facet excludes, and the fields have a relatively small number of unique values, on
the order of thousands.
> Before executing a search, users work with an "advanced search" form. Our goal is to populate
dozens of filters with the values that are applicable given the other selected values, so basically
this is a use case for facets with mincount=1, but without needing the actual counts.
> Our performance tests showed that facet.method=enum works much better than fc/fcs, probably
due to a specific ratio of "docset" to "unique terms count". For example, average query
execution time was 1500ms with fc, 2600ms with fcs, and 280ms with enum. Profiling indicated that
the majority of the time for enum was spent on intersecting docsets.
> Here's a patch that introduces an extension to facet calculation for method=enum. Basically
it uses docSetA.intersects(docSetB) instead of docSetA.intersectionSize(docSetB).
> As a result we were able to reduce our average query time from 280ms to 60ms.
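
To illustrate the idea in the description above, here is a minimal sketch that uses plain java.util.BitSet as a stand-in for Solr's docsets. This is an assumption made for illustration: the class IntersectSketch and its methods are hypothetical and are not taken from the patch. It contrasts computing the exact intersection size with only checking whether any document is shared, which is enough to produce a count capped at 1.

```java
import java.util.BitSet;

class IntersectSketch {
    // Exact count: copy one set, AND it with the other, then count the set bits.
    // The whole bitset is always processed.
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    // Existence check only: BitSet.intersects can return as soon as it
    // finds a word where both sets have a common bit.
    static boolean intersects(BitSet a, BitSet b) {
        return a.intersects(b);
    }

    // Hypothetical helper: a facet count capped at 1, so the result is 0 or 1.
    static int cappedCount(BitSet resultDocs, BitSet termDocs) {
        return intersects(resultDocs, termDocs) ? 1 : 0;
    }

    public static void main(String[] args) {
        BitSet resultDocs = new BitSet();
        resultDocs.set(3);
        resultDocs.set(7);
        resultDocs.set(42);

        BitSet termDocs = new BitSet();
        termDocs.set(7);
        termDocs.set(42);
        termDocs.set(100);

        System.out.println("exact count:  " + intersectionSize(resultDocs, termDocs));
        System.out.println("capped count: " + cappedCount(resultDocs, termDocs));
    }
}
```

When an application only needs to know whether a facet value applies, the capped variant avoids the copy and the full bit count, which is where the savings described above would be expected to come from.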



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

