lucene-solr-user mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Issue paging when sorting on a Date field
Date Tue, 20 May 2014 15:23:35 GMT
This is using solr.TrieDateField; it is the field type "date" from the
example schema in Solr 4.6.1:
<fieldType name="date" class="solr.TrieDateField" precisionStep="0"
positionIncrementGap="0" />

After further testing I was only able to reproduce this in a sharded &
replicated environment (numShards=3, replicationFactor=2) and I think I
have narrowed down the issue, and at this point it may be expected
behavior...

I took a query like q=create_date:[2014-05-19T00:00:00Z TO
2014-05-19T23:59:59Z]&sort=create_date DESC&start=0&rows=10000 which should
get all the documents for yesterday sorted by create date, and then added
distrib=false and ran it against shard1_replica1 and shard1_replica2. Then
I diff'd the files and it showed 5 occurrences where two consecutive rows
in one replica were reversed in the other replica, and in all 5 cases the
swapped rows had the exact same create_date value, which happened
to only go down to the minute.
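
For anyone wanting to repeat the comparison without eyeballing a diff, here is a rough sketch in Python. It assumes the results of each distrib=false query have already been parsed into ordered lists of (id, create_date) tuples; the function name and the sample rows are made up for illustration and are not from my actual data.

```python
def find_adjacent_swaps(rows_a, rows_b):
    """Find positions where two consecutive rows in rows_a appear
    in the opposite order in rows_b.

    Each row is a (doc_id, create_date) tuple. Returns a list of
    (index, first_id, second_id, same_date) tuples, where same_date
    says whether the two swapped rows share a create_date value.
    """
    swaps = []
    i = 0
    limit = min(len(rows_a), len(rows_b)) - 1
    while i < limit:
        a0, a1 = rows_a[i], rows_a[i + 1]
        b0, b1 = rows_b[i], rows_b[i + 1]
        if a0 == b1 and a1 == b0 and a0 != a1:
            swaps.append((i, a0[0], a1[0], a0[1] == a1[1]))
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return swaps


# Hypothetical sample data shaped like the example in this message.
replica1_rows = [
    ("docW", "2014-05-19T20:16:00Z"),
    ("docX", "2014-05-19T20:15:00Z"),
    ("docY", "2014-05-19T20:15:00Z"),
    ("docZ", "2014-05-19T20:14:00Z"),
]
replica2_rows = [
    ("docW", "2014-05-19T20:16:00Z"),
    ("docY", "2014-05-19T20:15:00Z"),
    ("docX", "2014-05-19T20:15:00Z"),
    ("docZ", "2014-05-19T20:14:00Z"),
]

swaps = find_adjacent_swaps(replica1_rows, replica2_rows)
```

If every reported swap has same_date set to True, the replicas disagree only on how they order ties.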

As an example:

shard1_replica1:
...
docX, 2014-05-19T20:15:00Z
docY, 2014-05-19T20:15:00Z
...

shard1_replica2:
...
docY, 2014-05-19T20:15:00Z
docX, 2014-05-19T20:15:00Z
...

So I think when I was paging through the results, if the query for page N
was handled by replica1 and page N+1 handled by replica2, and the page
boundary happened to be where the reversed rows were, this would produce
the behavior I was seeing where the last row from the previous page was
also the first row from the next page.
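
To convince myself of this, the effect can be simulated in a few lines of Python. This is only a toy model under assumed conditions: two in-memory lists stand in for the replicas, and pages simply alternate between them, which is not necessarily how Solr picks a replica for a request.

```python
def fetch_page(rows, start, size):
    """Return one page of results from a single replica's ordering."""
    return rows[start:start + size]


def page_through(replicas, page_size, total):
    """Page through results, sending page N to replica N mod len(replicas).

    The alternation is an assumption made for the simulation only.
    """
    seen = []
    for page_no, start in enumerate(range(0, total, page_size)):
        replica = replicas[page_no % len(replicas)]
        seen.extend(fetch_page(replica, start, page_size))
    return seen


# docX and docY tie on create_date and are ordered differently per replica.
replica1 = ["docW", "docX", "docY", "docZ"]
replica2 = ["docW", "docY", "docX", "docZ"]

seen = page_through([replica1, replica2], page_size=2, total=4)
unique_ids = set(seen)
```

Here page 0 comes from replica1 and ends with docX, while page 1 comes from replica2 and starts with docX, so docX is seen twice and docY is never returned. The unique id count (3) falls short of numFound (4), which is the mismatch my utility reports.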

I guess the obvious solution is to ensure the date field is always more
granular than minutes, or add another field to the sort order to
consistently break ties.
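
A sketch of the second option, building the request parameters in Python. It assumes the schema's uniqueKey field is named "id" (as in the example schema); substitute whatever unique field your schema defines. The helper name is hypothetical.

```python
from urllib.parse import urlencode


def build_paged_query(start, rows, tie_breaker="id"):
    """Build query parameters that sort on create_date and break ties
    on a unique field so every replica returns the same order.

    tie_breaker="id" is an assumption about the schema's uniqueKey.
    """
    return {
        "q": "create_date:[2014-05-19T00:00:00Z TO 2014-05-19T23:59:59Z]",
        "sort": "create_date desc, {} asc".format(tie_breaker),
        "start": start,
        "rows": rows,
    }


params = build_paged_query(start=0, rows=100)
query_string = urlencode(params)  # append to the collection's /select URL
```

Because the secondary sort field is unique per document, two documents with the same create_date always come back in the same relative order, regardless of which replica handles the page.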


On Mon, May 19, 2014 at 4:19 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:

>
> : Using Solr 4.6.1 and in my schema I have a date field storing the time a
> : document was added to Solr.
>
> what *exactly* does your schema look like?  are you using "solr.DateField"
> or "solr.TrieDateField" ? what field options do you have specified?
>
> : I have a utility program which:
> : - queries for all of the documents in the previous day sorted by create date
> : - pages through the results keeping track of the unique document ids
> : - compare the total number of unique doc ids to the numFound to see if
> : they match
>
> what *exactly* do your queries look like?  show us some examples please
> (URL & results).  Are you using distributed searching across multiple
> nodes, or a single node?  do you have concurrent updates going on during
> your test?
>
> : It is not consistent between tests, the number of occurrences changes and
> : the locations of the occurrences can change as well. The larger the result
> : set, and smaller the page size, the more frequent the occurrences are.
>
> if you bring up a test instance of Solr using your current configs, can
> you reproduce (even occasionally) with some synthetic data you can share
> with us?  If so please provide your full configs & sample data (ie: create
> a Jira & attach all the necessary files in a ZIP)
>
>
> -Hoss
> http://www.lucidworks.com/
>
