lucene-solr-user mailing list archives

From: Yonik Seeley <yo...@lucidimagination.com>
Subject: Re: High facet.limit (with only 2-3 actual facets) -> Massive bandwidth consumption in DistributedSearch
Date: Thu, 08 Sep 2011 20:34:08 GMT
On Thu, Sep 8, 2011 at 4:18 PM, Frederik Kraus <frederik.kraus@gmail.com> wrote:
> Now that is quite interesting indeed and sounds like a bug to me. Including facets
> with a count of 0 we have a few 100k which then apparently get transferred. hmhmhm
>
> Can anyone with more knowledge of the facet component maybe chime in on why the
> mincount is removed?

It's a trade-off, for sure.  Here's what the code says:

          if (dff.sort.equals(FacetParams.FACET_SORT_COUNT)) {
            if (dff.limit > 0) {
              // set the initial limit higher to increase accuracy
              dff.initialLimit = (int)(dff.initialLimit * 1.5) + 10;
              // TODO: we could change this to 1, but would then need more
              // refinement for small facet result sets?
              dff.initialMincount = 0;
            } else {
              // if limit==-1, then no need to artificially lower mincount to 0 if it's 1
              dff.initialMincount = Math.min(dff.minCount, 1);
            }
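
To make the numbers concrete, here's a small standalone sketch (hypothetical,
not Solr code; it only mirrors the arithmetic in the snippet above and assumes,
for illustration, that initialLimit starts out equal to the client's facet.limit):

    public class OverRequestSketch {
      public static void main(String[] args) {
        int facetLimit = 100;              // client's facet.limit (assumed value)
        int initialLimit = facetLimit;     // assumption: starts at facet.limit
        if (facetLimit > 0) {
          // same over-request math as above: (100 * 1.5) + 10 = 160
          initialLimit = (int)(initialLimit * 1.5) + 10;
        }
        int initialMincount = 0;           // zero-count terms are not filtered out
        System.out.println("per-shard terms requested: " + initialLimit
            + ", initial mincount: " + initialMincount);
      }
    }

So with a very high facet.limit but only a handful of distinct values in the
field, most of what each shard ships back is zero-count terms, which is the
bandwidth blow-up described above.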

So this is bad if you have a high facet.limit but very few actual matches.
The trade-off may pay off for large base docsets that match a lot of facet
values (though in that case one would expect to see few zeros anyway).
So perhaps using 0 as the mincount isn't the right trade-off?

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference
