lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11733) json.facet refinement fails to bubble up some long tail (overrequested) terms?
Date Tue, 12 Dec 2017 19:04:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288083#comment-16288083 ]

ASF subversion and git services commented on SOLR-11733:
--------------------------------------------------------

Commit 53f2d4aa3aa171d5f37284eba9ca56d987729796 in lucene-solr's branch refs/heads/branch_7x
from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=53f2d4a ]

Beef up testing of json.facet 'refine:simple' when dealing with 'Long Tail' terms

In an attempt to get more familiar with json.facet refinement, I set out to refactor/generalize/clone
some of the existing facet.pivot refinement tests to assert that json.facet could produce the same results.
This test is a baby step towards doing that: cloning DistributedFacetPivotLongTailTest into
DistributedFacetSimpleRefinementLongTailTest (with shared index building code).
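
For orientation, here is a rough sketch of the kind of json.facet request under discussion, built as a Python dict. The field name `foo_s`, the facet label `top_terms`, and the numeric values are hypothetical; `type`, `field`, `limit`, `overrequest`, and `refine` are the terms-facet parameters as I understand them, so check the reference guide before relying on this.

```python
import json


def terms_facet_request(field, limit, overrequest):
    """Build a JSON request body with one refined terms facet.

    The facet label "top_terms" is an arbitrary, hypothetical name.
    """
    return {
        "query": "*:*",
        "limit": 0,  # no documents wanted, only facet buckets
        "facet": {
            "top_terms": {
                "type": "terms",
                "field": field,
                "limit": limit,              # the "top n" to return
                "overrequest": overrequest,  # extra buckets asked of each shard
                "refine": "simple",          # the refinement mode this test covers
            }
        },
    }


if __name__ == "__main__":
    # "foo_s" is a made-up field name used only for illustration.
    print(json.dumps(terms_facet_request("foo_s", 10, 5), indent=2))
```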

Along the way, I learned that the core logic of 'refine:simple' is actually quite different from how
facet.field & facet.pivot work (see discussion in SOLR-11733), so they do *NOT* produce the same results
in many "Long Tail" situations.  As a result, many of the logic/assertions in
DistributedFacetSimpleRefinementLongTailTest are very different from their counterparts in
DistributedFacetPivotLongTailTest, with detailed explanations in comments.

Hopefully this test will prove useful down the road to anyone who might want to compare/contrast
facet.pivot with json.facet, and to prevent regressions in 'refine:simple' if/when we add more
complex refinement approaches in the future.

There are also a few TODOs in the test related to some other small discrepancies between json.facet
and stats.field, for which I opened issues along the way. The TODOs indicate where the tests should
be modified once those issues are addressed in json.facet:

 - SOLR-11706: support for multivalued numeric fields in stats
 - SOLR-11695: support for 'missing()' & 'num_vals()' (aka: 'count' from stats.field) numeric stats
 - SOLR-11725: switch from 'uncorrected stddev' to 'corrected stddev'
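
As a side note on the last item, the difference between the two standard deviations can be sketched with Python's standard library. The sample values are made up, and this only illustrates the two formulas (dividing by N versus N - 1); it says nothing about how either Solr component computes them.

```python
import statistics

# Made-up sample values, chosen so the arithmetic is easy to follow:
# the mean is 5 and the squared deviations sum to 32.
VALUES = [2, 4, 4, 4, 5, 5, 7, 9]


def uncorrected_stddev(values):
    """Square root of the squared deviations divided by N."""
    return statistics.pstdev(values)


def corrected_stddev(values):
    """Square root of the squared deviations divided by N - 1 (Bessel's correction)."""
    return statistics.stdev(values)


if __name__ == "__main__":
    print("uncorrected:", uncorrected_stddev(VALUES))
    print("corrected:  ", corrected_stddev(VALUES))
```

The corrected value is always the larger of the two for any sample with more than one distinct value, so assertions comparing the two outputs need a different expected number, not just a tolerance.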

(cherry picked from commit 2990c88a927213177483b61fe8e6971df04fc3ed)


> json.facet refinement fails to bubble up some long tail (overrequested) terms?
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-11733
>                 URL: https://issues.apache.org/jira/browse/SOLR-11733
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Facet Module
>            Reporter: Hoss Man
>
> Something wonky is happening with {{json.facet}} refinement.
> "Long Tail" terms that may not be in the "top n" on every shard, but are in the "top n + overrequest"
> for at least 1 shard, aren't getting refined and included in the aggregated response in some cases.
> I don't understand the code enough to explain this, but I have some steps to reproduce that I'll
> post in a comment shortly.
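
To make the "long tail" scenario concrete, here is a toy Python model with three in-memory "shards". It is not Solr code. The two strategies below are only my assumption about the kind of difference discussed in SOLR-11733: one refines only the terms that already rank in the merged top n, the other refines every term that any shard returned. All names and counts are hypothetical.

```python
from collections import Counter

# Hypothetical per-shard term counts. "tail" is never in the top 2 on
# shards 1 and 2, but it is the top term on shard 3.
SHARDS = [
    {"a": 20, "b": 15, "c": 14, "tail": 13},
    {"a": 20, "b": 15, "d": 14, "tail": 13},
    {"tail": 12, "e": 2, "a": 1},
]


def top_terms(counts, k):
    """Return the k highest (term, count) pairs; ties are broken by term."""
    return sorted(counts.items(), key=lambda tc: (-tc[1], tc[0]))[:k]


def merge_first_phase(shards, limit, overrequest):
    """Sum the counts each shard reports for its own top (limit + overrequest)."""
    merged = Counter()
    for counts in shards:
        for term, count in top_terms(counts, limit + overrequest):
            merged[term] += count
    return merged


def exact_counts(shards, terms):
    """Second phase: ask every shard for its count of each candidate term."""
    return {t: sum(s.get(t, 0) for s in shards) for t in terms}


def refine_merged_top_only(shards, limit, overrequest):
    """Refine only the terms already in the merged top `limit`."""
    merged = merge_first_phase(shards, limit, overrequest)
    candidates = [t for t, _ in top_terms(merged, limit)]
    return top_terms(exact_counts(shards, candidates), limit)


def refine_all_seen(shards, limit, overrequest):
    """Refine every term that any shard returned in the first phase."""
    merged = merge_first_phase(shards, limit, overrequest)
    return top_terms(exact_counts(shards, merged.keys()), limit)


if __name__ == "__main__":
    print(refine_merged_top_only(SHARDS, limit=2, overrequest=1))
    print(refine_all_seen(SHARDS, limit=2, overrequest=1))
```

With limit=2 and overrequest=1, "tail" is reported only by the third shard in the first phase (count 12), so it never reaches the merged top 2 and the first strategy never asks the other shards about it, even though its true total of 38 beats "b" at 30. The second strategy picks it up because it refines every term seen in the first phase.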



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

