lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Cursor mark page duplicates
Date Thu, 07 Nov 2019 22:58:02 GMT

: I'm using Solr's cursor mark feature and noticing duplicates when paging 
: through results.  The duplicate records happen intermittently and appear 
: at the end of one page, and the beginning of the next (but not on all 
: pages through the results). So if rows=20 the duplicate records would be 
: document 20 on page1, and document 21 on page 2.  The document's id come 

Can you try to reproduce and show us the specifics of this including:

1) The sort param you're using
2) An 'fl' list that includes every field in the sort param
3) The returned values of every 'fl' field for the "duplicate" document 
you are seeing as it appears in *BOTH* pages of results -- along with the 
cursorMark value in use on both of those pages.
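
One way to collect that information is sketched below in Python. It is only an illustration, not a tested client: fetch_page is a hypothetical callable you would supply (for example a thin wrapper around an HTTP GET to /select with wt=json), and the q=*:* query and the "id" uniqueKey name are assumptions. The cursorMark=* start value and the nextCursorMark response key are the standard cursor parameters. The sort you pass in must end with the uniqueKey field, and fl should list every field named in the sort.

```python
def find_cursor_duplicates(fetch_page, sort, fl, rows=20, id_field="id"):
    """Walk a cursor to the end and report every id returned more than once.

    fetch_page(params) is a hypothetical helper: it takes a dict of request
    parameters and returns the parsed JSON response as a dict.
    """
    seen = {}        # id -> (cursorMark sent with the request, returned doc)
    duplicates = []
    cursor = "*"     # "*" asks for the first page of a cursor
    while True:
        params = {"q": "*:*", "sort": sort, "fl": fl,
                  "rows": rows, "cursorMark": cursor}
        rsp = fetch_page(params)
        for doc in rsp["response"]["docs"]:
            key = doc[id_field]
            if key in seen:
                # Keep both appearances so their sort values can be compared.
                duplicates.append({"id": key,
                                   "first": seen[key],
                                   "second": (cursor, doc)})
            else:
                seen[key] = (cursor, doc)
        next_cursor = rsp["nextCursorMark"]
        if next_cursor == cursor:
            break    # same mark returned: the cursor is exhausted
        cursor = next_cursor
    return duplicates
```

Each entry in the returned list holds the doc as it appeared on both pages, plus the cursorMark that was in use for each request.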


: (YYYY-MM-DD HH:MM.SSSSSS)), score. In this Solr community post 
: (https://lucene.472066.n3.nabble.com/Solr-document-duplicated-during-pagination-td4269176.html)

: Shawn Heisey suggests:

...that post was *NOT* about using cursorMark -- it was plain old regular 
pagination, where even on a single core/replica you can see a document 
X get "pushed" from page#1 to page#2 by updates/additions of some other
document Z that causes Z to sort "before" X.

With cursors this kind of "pushing other docs back" or "pushing other docs 
forward" doesn't exist because of the cursorMark.  The only way a doc 
*should* move is if its OWN sort values are updated, causing it to 
reposition itself.
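
To make the difference concrete, here is a toy model in Python. It is not how Solr implements either feature, just a sketch under simple assumptions: the "index" is a list of (sort_value, id) tuples, offset paging slices the sorted list by position, and cursor paging returns the entries that sort strictly after the last entry of the previous page. All names and values are made up for illustration.

```python
def page_by_offset(index, start, rows):
    # Plain start/rows pagination: position in the sorted list decides the page.
    return sorted(index)[start:start + rows]

def page_by_cursor(index, after, rows):
    # Cursor-style pagination: only entries sorting strictly after the
    # last entry already returned are eligible.
    ordered = sorted(index)
    if after is not None:
        ordered = [entry for entry in ordered if entry > after]
    return ordered[:rows]

index = [(10, "A"), (20, "X"), (30, "B"), (40, "C")]
rows = 2

offset_page1 = page_by_offset(index, 0, rows)
cursor_page1 = page_by_cursor(index, None, rows)

# Document Z is added between the two requests and sorts before X.
index.append((5, "Z"))

offset_page2 = page_by_offset(index, rows, rows)
cursor_page2 = page_by_cursor(index, cursor_page1[-1], rows)

print("offset:", offset_page1, offset_page2)
print("cursor:", cursor_page1, cursor_page2)
```

In this model X is the last entry of page 1 and shows up again as the first entry of the offset-based page 2, because Z shifted every position by one. The cursor-based page 2 starts after X no matter what was added ahead of it.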

But, if you have a static index, then it's *possible* that the last time 
your document X was updated, there was a "glitch" somewhere in the 
distributed update process, and the update didn't succeed in some 
replicas -- so the same document may have different sort values 
on diff replicas.

: In the Solr query below for one of the example duplicates in question I 
: can see a search by the id returns only a single document. The 
: replication factor for the collection is 2 so the id will also appear in 
: this shards replica.  Taking into consideration Shawn's advice above, my 

If you've already identified a particular document where this has 
happened, then you can also verify/disprove my hypothesis by hitting each 
of the replicas that hosts this document with a request that looks like...

/solr/MyCollection_shard4_replica_n12/select?q=id:FOO&distrib=false
/solr/MyCollection_shard4_replica_n35/select?q=id:FOO&distrib=false

...and compare the results to see if all field values match
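
If eyeballing the two responses is tedious, the comparison can be scripted. The Python below is a rough sketch, not a finished tool: the base URL you pass in is whatever host and port your nodes use, wt=json is added so the response is easy to parse, and both function names are made up. Fetching the URLs and pulling the single doc out of each response is left to you.

```python
from urllib.parse import urlencode

def replica_url(base_url, core, doc_id):
    # distrib=false keeps the request on this one core instead of
    # fanning out to the rest of the collection.
    query = urlencode({"q": f"id:{doc_id}", "distrib": "false", "wt": "json"})
    return f"{base_url}/solr/{core}/select?{query}"

def diff_replica_docs(docs_by_core):
    """Return {field: {core: value}} for every field whose value differs.

    docs_by_core maps a core name to the single doc (a dict) that core
    returned for the id in question.
    """
    fields = set()
    for doc in docs_by_core.values():
        fields.update(doc)
    mismatches = {}
    for field in sorted(fields):
        values = {core: doc.get(field) for core, doc in docs_by_core.items()}
        # repr() lets list-valued (multiValued) fields be compared too.
        if len({repr(value) for value in values.values()}) > 1:
            mismatches[field] = values
    return mismatches
```

An empty result means every field matched across the replicas; anything else lists the fields (sort fields included) that disagree, keyed by core name.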


-Hoss
http://www.lucidworks.com/
