lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Pagination bug? when sorting by a field (not unique field)
Date Wed, 29 Mar 2017 14:19:42 GMT
I can answer at least one bit...

If all the sort fields are equal, the _internal_ Lucene document ID
(not <uniqueKey>) is used to break the tie. The kicker is that the
internal Lucene ID can change when merging segments. Further, the
internal ID for two given docs can change relative to each other. I.e.

starting state:

unique key      internal Lucene doc ID
1               1
2               2

Sometime after merging:

unique key      internal Lucene doc ID
1               2
2               1

So if this problem only occurs when you're _also_ indexing, this could
be what is happening.
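
To make the effect concrete, here is a minimal Python sketch of the two states above. It is a toy model, not Solr or Lucene code: the dict layout, the page() helper and the one-row page size are all made up for illustration. Page 1 is fetched before the merge and page 2 after it.

```python
# Toy model only: "id" stands in for the uniqueKey, "value" for the
# non-unique sort field, "internal" for the internal Lucene doc ID.
def page(docs, start, rows, tie_break):
    """Return the unique keys on one page, sorted by value, then tie_break."""
    ordered = sorted(docs, key=lambda d: (d["value"], d[tie_break]))
    return [d["id"] for d in ordered[start:start + rows]]

# Starting state: both docs share the same sort value.
before = [
    {"id": 1, "value": "a", "internal": 1},
    {"id": 2, "value": "a", "internal": 2},
]
# Sometime after merging: the internal IDs have swapped.
after = [
    {"id": 1, "value": "a", "internal": 2},
    {"id": 2, "value": "a", "internal": 1},
]

# Tie broken by internal ID: doc 1 shows up on both pages, doc 2 on neither.
unstable = page(before, 0, 1, "internal") + page(after, 1, 1, "internal")

# Tie broken by the unique key: each doc appears exactly once.
stable = page(before, 0, 1, "id") + page(after, 1, 1, "id")

print(unstable)  # [1, 1]
print(stable)    # [1, 2]
```

With the unique key as the tiebreaker, the order of equal-valued docs no longer depends on anything a merge can change.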

Best,
Erick

On Wed, Mar 29, 2017 at 6:40 AM, Pablo Anzorena <anzorena.fing@gmail.com> wrote:
> Mikhail,
>
> Indeed, maxDocs and deletedDocs are different, but numDocs is the same.
>
> I don't really get it, but can that be the problem?
>
> 2017-03-29 10:35 GMT-03:00 Mikhail Khludnev <mkhl@apache.org>:
>
>> Can it happen that replicas are different by deleted docs? I mean numDocs
>> is the same, but maxDocs is different by number of deleted docs, you can
>> see it in solr admin at the core page.
>>
>> On Wed, Mar 29, 2017 at 4:16 PM, Pablo Anzorena <anzorena.fing@gmail.com>
>> wrote:
>>
>> > Shawn,
>> >
>> > Yes, the field has duplicate values and yes, if I add the secondary sort
>> > by the uniqueKey it solves the issue.
>> >
>> > Neither of the 2 situations you mentioned is occurring. The index
>> > is replicated, but not sharded.
>> >
>> > Does solr sort by an internal id if no uniqueKey is present in the sort?
>> >
>> > 2017-03-29 9:58 GMT-03:00 Shawn Heisey <apache@elyograg.org>:
>> >
>> > > On 3/29/2017 6:35 AM, Pablo Anzorena wrote:
>> > > > I was paginating the results of a query and noticed that some
>> > > > documents were repeated across pagination buckets of 100 rows. When I
>> > > > sort by the unique field there is no repeated document, but when I
>> > > > sort by another field then repeated documents appear. I assume it is
>> > > > a bug and not the intended behaviour, right?
>> > >
>> > > There is a potential situation that can cause this problem that is NOT
>> > > a bug.
>> > >
>> > > If the field you are sorting on contains duplicate values (same value
>> > > in multiple documents), then I am pretty sure that the sort order of
>> > > documents with the same value in the sort field is non-deterministic in
>> > > these situations:
>> > >
>> > > 1) A distributed (sharded) index.
>> > > 2) When the index contents can change between a request for one page
>> > > and a request for the next page -- documents being added, deleted, or
>> > > changed.
>> > >
>> > > Because the sort order of documents with the same value can change, one
>> > > document that may have ended up on the first page on the first query
>> > > may end up on the second page on the second query.
>> > >
>> > > Sorting by a field with no duplicate values (the unique field you
>> > > mentioned) will always result in the exact same sort order ... but if
>> > > you add documents that sort to near the start of the sort order between
>> > > queries, the behavior you have noticed can still happen.
>> > >
>> > > If this is what you are encountering, adding secondary sort on the
>> > > uniqueKey field would probably clear up the problem.  If your uniqueKey
>> > > field is "id", something like this:
>> > >
>> > > sort=someField desc,id desc
>> > >
>> > > Thanks,
>> > > Shawn
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
