lucene-solr-user mailing list archives

From Pablo Anzorena <anzorena.f...@gmail.com>
Subject Re: Pagination bug? when sorting by a field (not unique field)
Date Wed, 29 Mar 2017 13:40:38 GMT
Mikhail,

Indeed, maxDocs and deletedDocs differ between the replicas, but numDocs is the same.

I don't really get it, but can that be the problem?
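
A minimal sketch of why this could matter, under stated assumptions. As far as I understand, when two documents have the same sort value, ties are broken by internal index order, and replicas whose deleted-document counts differ may well hold the same live documents in a different internal order. The document ids, field values, replica layouts and the helper names `search` and `paginate` below are all hypothetical; this is a plain Python simulation, not Solr code.

```python
from typing import List, Tuple

# Hypothetical documents: (uniqueKey, value of the non-unique sort field).
DOCS: List[Tuple[str, int]] = [
    ("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 2), ("f", 2),
]

# Assumption: both replicas hold the same live documents (same numDocs),
# but store them in a different internal order, as might happen when
# their deleted-document counts and segment merges differ.
REPLICA_1 = list(DOCS)
REPLICA_2 = list(reversed(DOCS))


def search(replica: List[Tuple[str, int]], start: int, rows: int,
           tie_break_on_id: bool) -> List[str]:
    """Return one page of ids, sorted ascending by the non-unique field."""
    indexed = list(enumerate(replica))  # (internal position, (id, value))
    if tie_break_on_id:
        # Similar in spirit to sort=someField asc,id asc
        key = lambda item: (item[1][1], item[1][0])
    else:
        # Ties fall back to internal position, which differs per replica.
        key = lambda item: (item[1][1], item[0])
    ordered = [doc_id for _pos, (doc_id, _value) in sorted(indexed, key=key)]
    return ordered[start:start + rows]


def paginate(tie_break_on_id: bool, rows: int = 2) -> List[str]:
    """Fetch consecutive pages, alternating replicas like a load balancer."""
    replicas = [REPLICA_1, REPLICA_2]
    seen: List[str] = []
    for page in range(len(DOCS) // rows):
        replica = replicas[page % 2]
        seen.extend(search(replica, page * rows, rows, tie_break_on_id))
    return seen


if __name__ == "__main__":
    print("field only:    ", paginate(tie_break_on_id=False))
    print("field then id: ", paginate(tie_break_on_id=True))
```

In this toy setup, paging on the field alone repeats some documents and skips others whenever consecutive pages are served by different replicas, while adding the uniqueKey as a secondary sort gives every replica the same total order.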

2017-03-29 10:35 GMT-03:00 Mikhail Khludnev <mkhl@apache.org>:

> Can it happen that replicas are different by deleted docs? I mean numDocs
> is the same, but maxDocs is different by number of deleted docs, you can
> see it in solr admin at the core page.
>
> On Wed, Mar 29, 2017 at 4:16 PM, Pablo Anzorena <anzorena.fing@gmail.com>
> wrote:
>
> > Shawn,
> >
> > Yes, the field has duplicate values and yes, if I add the secondary sort by
> > the uniqueKey it solves the issue.
> >
> > Neither of the two situations you mentioned is occurring. The index
> > is replicated, but not sharded.
> >
> > Does Solr sort by an internal id if no uniqueKey is present in the sort?
> >
> > 2017-03-29 9:58 GMT-03:00 Shawn Heisey <apache@elyograg.org>:
> >
> > > On 3/29/2017 6:35 AM, Pablo Anzorena wrote:
> > > > I was paginating the results of a query and noticed that some
> > > > documents were repeated across pagination buckets of 100 rows. When I
> > > > sort by the unique field there is no repeated document but when I sort
> > > > by another field then repeated documents appear. I assume it is a bug
> > > > and not the intended behaviour, right?
> > >
> > > There is a potential situation that can cause this problem that is NOT a
> > > bug.
> > >
> > > If the field you are sorting on contains duplicate values (same value in
> > > multiple documents), then I am pretty sure that the sort order of
> > > documents with the same value in the sort field is non-deterministic in
> > > these situations:
> > >
> > > 1) A distributed (sharded) index.
> > > 2) When the index contents can change between a request for one page and
> > > a request for the next page -- documents being added, deleted, or changed.
> > >
> > > Because the sort order of documents with the same value can change, one
> > > document that may have ended up on the first page on the first query may
> > > end up on the second page on the second query.
> > >
> > > Sorting by a field with no duplicate values (the unique field you
> > > mentioned) will always result in the exact same sort order ... but if
> > > you add documents that sort to near the start of the sort order between
> > > queries, the behavior you have noticed can still happen.
> > >
> > > If this is what you are encountering, adding secondary sort on the
> > > uniqueKey field would probably clear up the problem.  If your uniqueKey
> > > field is "id", something like this:
> > >
> > > sort=someField desc,id desc
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
