lucene-solr-user mailing list archives

From Kenny Knecht <ke...@ontoforce.com>
Subject Re: Nested facet complete wrong counts
Date Sat, 11 Nov 2017 14:00:23 GMT
Thank you. But as I showed in my example, we did use refine, and overrequest
is not strictly needed because we need all buckets anyway. That can hardly
explain an error of 60%, right?
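
For readability, the nested request quoted further down can be rebuilt from a plain dictionary. This is only a sketch: `terms_facet` is a hypothetical helper (not part of Solr or any client library), the field names and option values are copied from the quoted request, and no HTTP call is made.

```python
import json


def terms_facet(field, limit, sub_facets=None):
    """Build one 'terms' facet with the options used in the quoted request.

    Hypothetical helper for illustration only.
    """
    facet = {
        "type": "terms",
        "field": field,
        "missing": True,
        "refine": True,
        "overrequest": limit,  # the quoted requests set overrequest equal to limit
        "limit": limit,
        "offset": 0,
    }
    if sub_facets:
        # Nested facets live under the "facet" key of the parent facet.
        facet["facet"] = sub_facets
    return facet


# Gender_sf with Status_sf nested inside it, plus the hll() estimate.
nested = {
    "Gender_sf": terms_facet(
        "Gender_sf", 10, {"Status_sf": terms_facet("Status_sf", 50)}
    ),
    "Gender_sfhll": "hll(Gender_sf)",
}

# The facet definition is sent as a JSON string in the json.facet parameter.
params = {
    "wt": "json",
    "rows": 0,
    "json.facet": json.dumps(nested),
    "q": "*:*",
    "fq": ['type:"something"'],
}

print(json.dumps(nested, indent=2))
```

Printing the indented form makes it easier to see that `refine` and `overrequest` are set on both the outer and the inner facet.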

On 10 Nov 2017 19:29, "Amrit Sarkar" <sarkaramrit2@gmail.com> wrote:

> Kenny,
>
> This is known behavior in multi-shard collections where the field values
> belonging to the same facet don't reside in the same shard. Yonik Seeley has
> improved the JSON Facet feature by introducing the "overrequest" and "refine"
> parameters.
>
> Kindly check out these Jira issues:
> https://issues.apache.org/jira/browse/SOLR-7452
> https://issues.apache.org/jira/browse/SOLR-9432
>
> Relevant blog: https://medium.com/@abb67cbb46b/1acfa77cd90c
>
> On 10 Nov 2017 10:02 p.m., "kenny" <kenny@ontoforce.com> wrote:
>
> > Hi all,
> >
> > We are running some tests in Solr 6.6 with the JSON Facet API and we get
> > completely wrong counts for some combinations of facets.
> >
> > Setting: our query matches 376k documents (out of 120M in total), each
> > with a set of fields. We work with 2 shards. We first facet over the
> > first field and keep these counts; we then do a nested facet over both
> > fields.
> >
> > Then we add up the sub-facet counts and expect to get (approximately)
> > the same numbers back. Sometimes we get rounding errors of about 1%,
> > but on other occasions the counts seem to be way off.
> >
> > for example
> >
> > Gender (3 values) Country (211 values)
> > 16226 - 18424 = -2198 (-13.5461604832%)
> > 282854 - 464387 = -181533 (-64.1790464338%)
> > 40489 - 47902 = -7413 (-18.3086764306%)
> > 36672 - 49749 = -13077 (-35.6593586387%)
> >
> > Gender (3 values)  Status (17 Values)
> > 16226 - 16273 = -47 (-0.289658572661%)
> > 282854 - 435974 = -153120 (-54.1339348215%)
> > 40489 - 49925 = -9436 (-23.305095211%)
> > 36672 - 54019 = -17347 (-47.3031195462%)
> >
> > ...
> >
> > These are the typical requests we submit. Note that we set refine and
> > overrequest, but in the case of Gender vs Request we should be querying
> > all the buckets anyway.
> >
> > {"wt":"json","rows":0,
> > "json.facet":"{\"Status_sfhll\":\"hll(Status_sf)\",\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}",
> > "q":"*:*","fq":["type:\"something\""]}
> >
> > {"wt":"json","rows":0,
> > "json.facet":"{\"Gender_sf\":{\"type\":\"terms\",\"field\":\"Gender_sf\",\"missing\":true,\"refine\":true,\"overrequest\":10,\"limit\":10,\"offset\":0,\"facet\":{\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}},\"Gender_sfhll\":\"hll(Gender_sf)\"}",
> > "q":"*:*","fq":["type:\"something\""]}
> >
> > Is this a known bug? Would switching to the old facet API resolve this?
> > Are there other parameters we are missing?
> >
> >
> > Thanks
> >
> >
> > kenny
> >
> >
> >
>
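
The comparison described in the original message (parent bucket count versus the sum of its nested bucket counts) can be sketched as follows. This is an illustration under stated assumptions, not the poster's actual script: the function names are hypothetical, the response layout (`buckets`, `val`, `count`, and a `missing` entry carrying its own `count`) is assumed to follow the usual JSON Facet response shape, and the sub-facet field is assumed to be single-valued. With a multi-valued field a document lands in several child buckets, so the sum can legitimately exceed the parent count.

```python
def sum_child_counts(parent_bucket, child_name):
    """Sum the counts of all nested buckets, including the 'missing' bucket."""
    child = parent_bucket.get(child_name, {})
    total = sum(b["count"] for b in child.get("buckets", []))
    # With "missing": true, documents without a value are reported separately.
    total += child.get("missing", {}).get("count", 0)
    return total


def relative_difference(parent_count, child_sum):
    """Return (difference, percentage) in the form used in the thread:
    parent - child_sum, expressed relative to the parent count."""
    diff = parent_count - child_sum
    pct = 100.0 * diff / parent_count
    return diff, pct


def check_nested_counts(facets, parent_name, child_name):
    """Compare every parent bucket with the sum of its nested buckets."""
    rows = []
    for bucket in facets[parent_name]["buckets"]:
        child_sum = sum_child_counts(bucket, child_name)
        diff, pct = relative_difference(bucket["count"], child_sum)
        rows.append((bucket["val"], bucket["count"], child_sum, diff, pct))
    return rows


# First Gender/Country row from the thread: 16226 - 18424 = -2198
diff, pct = relative_difference(16226, 18424)
print(f"{16226} - {18424} = {diff} ({pct}%)")
```

Running `check_nested_counts` separately against each shard (and against the merged response) is one way to see whether the discrepancy already exists per shard or only appears after merging.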
