lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-2894) Implement distributed pivot faceting
Date Tue, 20 May 2014 17:26:43 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2894:
---------------------------

    Attachment: SOLR-2894.patch

I haven't had a lot of time to review the updated patch in depth, but I did spend some time
trying to improve TestCloudPivotFacet to resolve some of the nocommits -- but I'm still seeing
failures...

1) I realized the "depth" check I was trying to do was bogus and commented it out (still need
to purge the code -- didn't want to muck with that until the rest of the test was passing more
reliably)


2) the NPE I mentioned in QueryResponse.readPivots is still happening, but I realized that
it has nothing to do with the datatype of the fields being pivoted on -- it only seemed that
way because of the poor randomization of values getting put in the single-valued string fields
vs the multivalued fields in the old version of the test.

The bug seems to pop up in _some_ cases where a pivot constraint has no sub-pivots.  Normally
this results in a NamedList with 3 keys (field,value,count) -- the 4th "pivot" key is only
included if there is a list of at least 1 sub-pivot.  But in some cases (I can't explain why
from looking at the code) the server is responding with a 4th entry using the key "pivot"
whose value is "null".

We need to get to the bottom of this -- it's not clear if there is a bug preventing real sub-pivot
constraints from being returned correctly, or if this is just a mistake in the code where
it's putting "null" in the NamedList instead of not adding anything at all.  In the latter case
it might be tempting to make QueryResponse.readPivots smart enough to deal with it, but if
we did that it would still be broken for older clients -- best to stick with the current API
semantics.
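
To make the expected entry shape concrete, here is a minimal sketch of the "don't add anything at all" behavior. It is not code from the patch: PivotEntrySketch and buildEntry are hypothetical names, and a plain LinkedHashMap stands in for Solr's NamedList so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the server-side code that builds one pivot entry.
// A LinkedHashMap replaces Solr's NamedList purely for illustration.
class PivotEntrySketch {

    static Map<String, Object> buildEntry(String field, Object value, int count,
                                          List<Map<String, Object>> subPivots) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("field", field);
        entry.put("value", value);
        entry.put("count", count);
        // Add the 4th key only when there is at least one sub-pivot;
        // never store null under "pivot".
        if (subPivots != null && !subPivots.isEmpty()) {
            entry.put("pivot", subPivots);
        }
        return entry;
    }

    public static void main(String[] args) {
        // A constraint with no sub-pivots: only the 3 standard keys are present.
        Map<String, Object> leaf = buildEntry("pivot_x_s1", "a", 3, null);
        System.out.println(leaf.keySet()); // [field, value, count]

        // A constraint with one sub-pivot: the 4th "pivot" key is present.
        List<Map<String, Object>> subs = new ArrayList<>();
        subs.add(leaf);
        Map<String, Object> parent = buildEntry("pivot_y_s", "b", 3, subs);
        System.out.println(parent.keySet()); // [field, value, count, pivot]
    }
}
```

With this shape, a client that treats the presence of "pivot" as "there is a non-empty list here" keeps working unchanged, which is the API semantic argued for above.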


In the attached patch update, this seed will fail showing the null sub-pivots problem...

{noformat}

   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=680E68425E7CA1BA -Dtests.slow=true -Dtests.locale=es_US -Dtests.timezone=Canada/Eastern -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 41.7s | TestCloudPivotFacet.testDistribSearch <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: Server sent back 'null' for sub pivots?
   [junit4]    > 	at __randomizedtesting.SeedInfo.seed([680E68425E7CA1BA:E9E8E65A2923C186]:0)
   [junit4]    > 	at org.apache.solr.client.solrj.response.QueryResponse.readPivots(QueryResponse.java:383)
   [junit4]    > 	at org.apache.solr.client.solrj.response.QueryResponse.extractFacetInfo(QueryResponse.java:363)
   [junit4]    > 	at org.apache.solr.client.solrj.response.QueryResponse.setResponse(QueryResponse.java:148)
   [junit4]    > 	at org.apache.solr.client.solrj.response.QueryResponse.<init>(QueryResponse.java:91)
   [junit4]    > 	at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
   [junit4]    > 	at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
   [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:161)
   [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:145)

{noformat}



3) Independent (I think) from the NPE issue, there is still something wonky with the refined
counts when mincount is specified...

Here, for example, is a seed that gets past QueryResponse.readPivots, but then fails the
numFound validation queries used to check the pivot counts...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=F08A107C384690FC -Dtests.slow=true -Dtests.locale=ar_LY -Dtests.timezone=Jamaica -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 27.0s | TestCloudPivotFacet.testDistribSearch <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: {main({main(facet.pivot.mincount=9),extra({main(facet.limit=12),extra({main(facet.pivot=pivot_y_s%2Cpivot_x_s1),extra(facet=true&facet.pivot=pivot_x_s1%2Cpivot_x_s)})})}),extra(rows=0&q=id%3A%5B*+TO+503%5D)}
==> pivot_y_s,pivot_x_s1: {params(rows=0),defaults({main(rows=0&q=id%3A%5B*+TO+503%5D),extra(fq=%7B%21term+f%3Dpivot_y_s%7D)})}
expected:<9> but was:<14>
   [junit4]    > 	at __randomizedtesting.SeedInfo.seed([F08A107C384690FC:716C9E644F19F0C0]:0)
   [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:190)
   [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:145)
   [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:863)
   [junit4]    > 	at java.lang.Thread.run(Thread.java:744)
   [junit4]    > Caused by: java.lang.AssertionError: pivot_y_s,pivot_x_s1: {params(rows=0),defaults({main(rows=0&q=id%3A%5B*+TO+503%5D),extra(fq=%7B%21term+f%3Dpivot_y_s%7D)})}
expected:<9> but was:<14>
   [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertNumFound(TestCloudPivotFacet.java:403)
   [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:208)
   [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:176)
   [junit4]    > 	... 42 more
{noformat}


This is saying that while doing a request with a pivot on the "pivot_y_s,pivot_x_s1" fields
it looped over the (top level) pivot constraints in "pivot_y_s" -- and for one of those term
values (it just happens to be the empty string "") it got a pivot count of 9, but when it
executed a query filtering the main results on that term ("fq={!term f=pivot_y_s}") the
total number of results found was 14.
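
The kind of check described above can be sketched roughly as follows. This is not the actual TestCloudPivotFacet code: the class and method names are hypothetical, an in-memory list of documents stands in for the index, and a simple equality filter stands in for the {!term} filter query. The 14 toy documents are made up only to mirror the numbers in the failure message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: compare each reported pivot count against the number
// of documents that match a filter on the same field and term.
class PivotCountCheckSketch {

    // Stand-in for numFound of a query filtered with fq={!term f=field}term.
    static long numFound(List<Map<String, String>> docs, String field, String term) {
        return docs.stream().filter(d -> term.equals(d.get(field))).count();
    }

    // Returns one message per term whose reported count disagrees with numFound.
    static List<String> mismatches(List<Map<String, String>> docs, String field,
                                   Map<String, Long> reportedCounts) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Long> e : reportedCounts.entrySet()) {
            long expected = e.getValue();
            long actual = numFound(docs, field, e.getKey());
            if (actual != expected) {
                problems.add(field + ":\"" + e.getKey() + "\" expected:<" + expected
                        + "> but was:<" + actual + ">");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Toy data (invented): 14 docs whose pivot_y_s value is the empty string.
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 14; i++) {
            docs.add(Collections.singletonMap("pivot_y_s", ""));
        }
        // A pivot response that claims only 9 docs for that term.
        Map<String, Long> reported = new LinkedHashMap<>();
        reported.put("", 9L);
        for (String problem : mismatches(docs, "pivot_y_s", reported)) {
            System.out.println(problem);
        }
    }
}
```

Run against the toy data, the check reports a single mismatch (9 reported versus 14 found), which is the same shape as the assertion failure in the trace.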

If you comment out the line of the test that sets the FACET_PIVOT_MINCOUNT param, this seed
starts to pass, suggesting that it's almost certainly the mincount logic that's putting a kink
in the correctness of the final refined counts.
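
For intuition only, here is one way a mincount could distort merged counts in a distributed setting. This is purely a hypothetical illustration: nothing above establishes that the patch behaves this way, the actual cause is still unknown, and the class name, method name, and per-shard numbers are all invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: applying mincount to each shard's local counts
// before merging can undercount a term, compared to applying it after merging.
class ShardMincountSketch {

    static Map<String, Integer> merge(List<Map<String, Integer>> shards,
                                      int mincount, boolean filterPerShard) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                // Assumed faulty mode: a shard drops terms below mincount locally.
                if (filterPerShard && e.getValue() < mincount) {
                    continue;
                }
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        // In both modes, mincount is applied to the merged totals.
        merged.values().removeIf(c -> c < mincount);
        return merged;
    }

    public static void main(String[] args) {
        // Invented split: term "a" matches 9 docs on one shard and 5 on another.
        List<Map<String, Integer>> shards = new ArrayList<>();
        shards.add(Collections.singletonMap("a", 9));
        shards.add(Collections.singletonMap("a", 5));

        System.out.println(merge(shards, 9, true));  // {a=9}
        System.out.println(merge(shards, 9, false)); // {a=14}
    }
}
```

If something like the per-shard mode were happening, it would explain a refined count lower than the true numFound, but confirming or ruling it out requires looking at the refinement code itself.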


> Implement distributed pivot faceting
> ------------------------------------
>
>                 Key: SOLR-2894
>                 URL: https://issues.apache.org/jira/browse/SOLR-2894
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erik Hatcher
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2894-mincount-minification.patch, SOLR-2894-reworked.patch,
SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894_cloud_test.patch,
dateToObject.patch, pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports undistributed mode.
 Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


