lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6386) make secondary ordering of facet.field values (and facet.pivot?) consistently deterministic
Date Mon, 08 Sep 2014 16:00:31 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125694#comment-14125694 ]

Erick Erickson commented on SOLR-6386:
--------------------------------------

[~hossman_lucene@fucit.org] Some things I found out this weekend:
[~markrmiller@gmail.com] Pinging you on this because I half suspect that there's something
weird with the test infrastructure.

Frankly I'm at a loss, but here are the outstanding things I saw. I'm pretty sure the answer
to my question of whether this would "just get taken care of" by the stuff I'm doing for
SOLR-6187 is "no", so I'm assigning it back to nobody. Adding facet.limit=1 in the test makes
the problem disappear only because all the bogus 0 counts that get returned are removed.

> If I optimize the clients and control server in BaseDistributedSearchTestCase.commit,
then this test case does NOT fail. But I must optimize both. If I optimize only the control,
it fails. If I optimize only the clients, it fails. This really weirds me out. Frankly, I
suspected pilot error here, so I tried it again and I'm pretty sure I'm not hallucinating.
I'd have expected optimizing the distributed case to fix this up, but nooooo. So I wonder if
there's something weird here with RAMDirectory, which underpins the servers... Although just
for yucks I tried using a disk-based directory and it still seemed to fail, although I won't
swear that I got it right.

> I set up IntelliJ with the seeds etc. you provided and it's not until the third pass
that it fails. But it fails every time on the third pass. Ditto with running the test from
the command shell.

> In DocValuesFacets.getCounts, around line 200 or so, I'm printing out the values added.
This is near the bottom of the clause:
if (sort.equals(FacetParams.FACET_SORT_COUNT) || sort.equals(FacetParams.FACET_SORT_COUNT_LEGACY)) {
  ... near the end
} else...

On the pass that fails, I get these values:
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-04-20T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T10:59:56.032Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T10:57:12.192Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-02T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T07:10:00.704Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-05T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-04-27T16:01:01.44Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2009-03-13T13:23:01.248Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0

Notice the Jan 1, 1970 dates. It sure seems like a zero snuck in there somewhere. If you sum
up the non-zero counts, you wind up with the right facet counts.
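
To make the two observations above concrete, here is a small standalone sketch (not Solr code). It shows that epoch millisecond 0 renders as 1970-01-01T00:00:00Z, which is why a stray zero is the suspect, and it sums the non-zero counts from the failing dump per value. The class and method names (FacetDumpCheck, sumNonZero) are made up for illustration, and whether a zero really leaked in is still only a suspicion.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetDumpCheck {
    // (value, count) pairs copied from the failing-pass dump, in the order printed.
    static final String[][] DUMP = {
        {"2010-05-03T11:00:00Z", "1"},
        {"2010-05-03T11:00:00Z", "0"},
        {"2010-04-20T11:00:00Z", "1"},
        {"2010-05-03T10:59:56.032Z", "0"},
        {"2010-05-03T11:00:00Z", "1"},
        {"2010-05-03T10:57:12.192Z", "0"},
        {"2010-05-02T11:00:00Z", "1"},
        {"2010-05-03T07:10:00.704Z", "0"},
        {"2010-05-05T11:00:00Z", "1"},
        {"2010-04-27T16:01:01.44Z", "0"},
        {"2009-03-13T13:23:01.248Z", "0"},
        {"1970-01-01T00:00:00Z", "0"},
        {"1970-01-01T00:00:00Z", "0"},
        {"1970-01-01T00:00:00Z", "0"},
        {"1970-01-01T00:00:00Z", "0"},
    };

    // Sum counts per value, ignoring the entries whose count is 0.
    static Map<String, Integer> sumNonZero(String[][] dump) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String[] entry : dump) {
            int count = Integer.parseInt(entry[1]);
            if (count > 0) {
                totals.merge(entry[0], count, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // A date field holding the raw value 0 formats as the Unix epoch.
        System.out.println(Instant.ofEpochMilli(0L)); // 1970-01-01T00:00:00Z
        System.out.println(sumNonZero(DUMP));
    }
}
```

The summed result has four values, with 2010-05-03T11:00:00Z counted twice, which matches the five count-1 entries in the optimized run.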

On the optimized run, I get this on the third pass, which is consistent with what the
control server gives back, so the test passes:
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-04-20T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-02T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-05T11:00:00Z 1

Anyway, this is beyond what I want to deal with just now. Let me know if there's anything
else I can provide. 


> make secondary ordering of facet.field values (and facet.pivot?) consistently deterministic
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6386
>                 URL: https://issues.apache.org/jira/browse/SOLR-6386
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Hoss Man
>            Assignee: Erick Erickson
>
> as a fluke of how the SOLR-2894 patch evolved, it wound up adding a bit of testing of
distributed facet.field on date fields (see [r1617789 changes to TestDistributedSearch|https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test/org/apache/solr/TestDistributedSearch.java?r1=1617789&r2=1617788&pathrev=1617789])
... but this started triggering some random failures due to facet constraints with identical
values being sorted differently between the distributed query and the single node control
query.
> We should make the facet.field (and facet.pivot) code order constraints with tied counts
consistently regardless of whether it's a distrib search or not.
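
As a rough illustration of what "consistently deterministic" could mean for tied counts, the sketch below sorts by count descending and breaks ties on the constraint value. It is not Solr's implementation: the Constraint class, the comparator name, and the choice of lexicographic value order as the tie-break are all assumptions made for the example, and the issue itself does not say which secondary order should be used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {
    // Hypothetical holder for one facet constraint; not a Solr class.
    static final class Constraint {
        final String value;
        final int count;

        Constraint(String value, int count) {
            this.value = value;
            this.count = count;
        }
    }

    // Higher counts first; equal counts fall back to ascending value order
    // (an assumed tie-break), so the result never depends on arrival order.
    static final Comparator<Constraint> COUNT_THEN_VALUE = (a, b) -> {
        int byCount = Integer.compare(b.count, a.count);
        return byCount != 0 ? byCount : a.value.compareTo(b.value);
    };

    // Return the constraint values in deterministic order.
    static List<String> order(List<Constraint> constraints) {
        List<Constraint> sorted = new ArrayList<>(constraints);
        sorted.sort(COUNT_THEN_VALUE);
        List<String> values = new ArrayList<>();
        for (Constraint c : sorted) {
            values.add(c.value);
        }
        return values;
    }

    public static void main(String[] args) {
        List<Constraint> shardOrder = new ArrayList<>();
        shardOrder.add(new Constraint("2010-05-05T11:00:00Z", 1));
        shardOrder.add(new Constraint("2010-05-03T11:00:00Z", 2));
        shardOrder.add(new Constraint("2010-04-20T11:00:00Z", 1));
        System.out.println(order(shardOrder));
    }
}
```

With a total order like this, a distributed merge and a single-node query that see the same (value, count) pairs produce the same list, regardless of the order in which the pairs were collected.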



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


