lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject RE: SOLR-6143 Bad facet counts from CollapsingQParserPlugin
Date Fri, 06 Jun 2014 17:07:04 GMT
Reposting this from the JIRA ticket to the users list:

I'm noticing a very weird bug using the CollapsingQParserPlugin. We tried
this plugin when we realized that faceting on the groups would take a
ridiculous amount of time. To its credit, it works very quickly; however,
the facet counts that it gives are incorrect.

We have a smallish index of about 200k documents with about 50k distinct
groups within it.

When we use the group implementation
(&group=true&group.field=PrSKU&group.facet=true), which I believe the
plugin attempts to emulate, the facet counts are totally correct.

When we use the field collapsing implementation, it will show an incorrect
count for the non-filtered query, but when we go to the filtered query, the
facet count corrects itself and matches the document count.
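
To make the comparison easier to reproduce, the two request styles can be sketched in Python as below. This is only an illustration: the `http://` scheme, the helper names `grouping_url` and `collapse_url`, and the reuse of the same `q`, `fl` and facet field for the grouping request are assumptions. The host, path and collapse parameters are taken from the requests quoted later in this message.

```python
from urllib.parse import urlencode

# Host and path as quoted in this message; the http:// scheme is an assumption.
BASE = "http://solrslave01:8983/index/select"

# Parameters shared by both request styles (assumed to be identical for both).
COMMON = [
    ("q", "classIDs:12"),
    ("fl", "PrSKU"),
    ("facet", "true"),
    ("facet.field", "at_12_wood_tone"),
]


def grouping_url():
    """Request using the grouping implementation with group.facet."""
    params = COMMON + [
        ("group", "true"),
        ("group.field", "PrSKU"),
        ("group.facet", "true"),
    ]
    return BASE + "?" + urlencode(params)


def collapse_url(extra_fq=None):
    """Request using the collapse filter, optionally with one more fq."""
    params = COMMON + [("fq", "{!collapse field=PrSKU}")]
    if extra_fq is not None:
        params.append(("fq", extra_fq))
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(grouping_url())
    print(collapse_url())
    print(collapse_url('at_12_wood_tone:"Light Wood"'))
```

Requesting these three URLs and comparing the `at_12_wood_tone` facet counts should show whether the grouping and collapse results disagree on a given index.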

Here are some Solr requests and responses:

solrslave01:8983/index/select?q=classIDs:12&fl=PrSKU&fq={!collapse%20field=PrSKU}&facet=true&facet.field=at_12_wood_tone

The facet field will return:

<int name="Dark Wood">867</int>
<int name="Medium Wood">441</int>
<int name="Light Wood">253</int>

When I actually apply a filter query like so:

solrslave01:8983/index/select?q=classIDs:12&fl=PrSKU&fq={!collapse%20field=PrSKU}&facet=true&facet.field=at_12_wood_tone&fq=at_12_wood_tone:%22Light%20Wood%22

I actually pull back 270 results, and the facet updates itself with the
correct number at the bottom:

<int name="Light Wood">270</int>
<int name="Dark Wood">68</int>
<int name="Medium Wood">66</int>
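
The discrepancy can be stated as a small check. The sketch below uses only the counts quoted in this message; the function name `facet_gap` is hypothetical, and the check assumes that the unfiltered facet count for a value should equal the number of results returned once that value is applied as a filter.

```python
# Facet counts from the unfiltered collapse query, as quoted above.
unfiltered_counts = {"Dark Wood": 867, "Medium Wood": 441, "Light Wood": 253}

# Number of results returned after adding fq=at_12_wood_tone:"Light Wood".
filtered_num_found = 270


def facet_gap(value, counts, num_found):
    """Filtered result count minus the unfiltered facet count for `value`."""
    return num_found - counts[value]


if __name__ == "__main__":
    gap = facet_gap("Light Wood", unfiltered_counts, filtered_num_found)
    print(f"Light Wood: facet count is off by {gap} documents")
```

A gap of zero would mean the two numbers agree; here the unfiltered facet count for "Light Wood" is 17 lower than the filtered result count.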

If this were the same number before and after the filter query, I would
assume that it was simply my data that was bad; however, I've pored over
this for the better part of a day and I'm pretty sure it's the plugin. For
reference, the field that I'm faceting on is a multiValued field, but I
have noticed the exact same behavior on non-multiValued fields (such as
price).

I can provide any other details you might need.
