lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr Cloud: Duplicate documents in multiple shards
Date Mon, 27 Jul 2015 15:19:02 GMT
Hmmm, with that setup you should _not_ be getting
duplicate documents.

So, when you see duplicate documents, you're seeing
the exact same UUID on two shards, correct? My best
guess is that you've done something innocent-seeming
(that perhaps you forgot!) that resulted in this. Otherwise
there would be a lot more complaints of duplicate
documents.
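
One way to confirm that the same ID really lives on two shards is to query each core directly with distrib=false, so the request is not fanned out across the collection, and then compare the ID lists. The following is only a rough sketch: the core URLs are hypothetical, it assumes the unique key field is named "id", and it fetches a single page of results, so a large index would need paging.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict


def fetch_ids(core_url, rows=1000):
    """Return up to `rows` ids held by a single core.

    distrib=false keeps Solr from forwarding the query to other shards.
    Assumes the unique key field is called "id" (an assumption).
    """
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fl": "id",
        "rows": rows,
        "wt": "json",
        "distrib": "false",
    })
    with urllib.request.urlopen(f"{core_url}/select?{params}") as resp:
        body = json.load(resp)
    return [doc["id"] for doc in body["response"]["docs"]]


def find_cross_shard_duplicates(ids_by_shard):
    """Map each id found on more than one shard to the shards holding it."""
    shards_for_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_for_id[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_for_id.items()
        if len(shards) > 1
    }
```

In use, you would call fetch_ids once per shard leader (for example a hypothetical http://host1:8983/solr/collection1_shard1_replica1), collect the results in a dict keyed by shard name, and pass that dict to find_cross_shard_duplicates.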

In fact, what I'd do is create a new collection where you're
absolutely sure that nothing "interesting" has been done.
You can use "collection aliasing" to switch to that one after
you've re-indexed all your docs and are satisfied with
it. And I'm assuming that your UUID field is
1> labeled as the <uniqueKey>
and
2> a string type (NOT text).
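
The alias switch mentioned above goes through the Collections API's CREATEALIAS action. As a minimal sketch, this helper only builds the request URL; the base URL, alias name, and collection name in the usage note are placeholders, not values from this thread.

```python
import urllib.parse


def create_alias_url(solr_base_url, alias, collection):
    """Build a Collections API URL that points `alias` at `collection`.

    All three arguments are placeholders supplied by the caller.
    """
    params = urllib.parse.urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": collection,
    })
    return f"{solr_base_url}/admin/collections?{params}"
```

For example, create_alias_url("http://localhost:8983/solr", "docs", "docs_v2") yields a URL that, once requested, would make queries against "docs" go to the freshly indexed "docs_v2" collection.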

Best,
Erick

On Mon, Jul 27, 2015 at 3:21 AM, mesenthil1
<senthilkumar.arumugam@viacomcontractor.com> wrote:
> Thanks Erick. Now that I understand that the entire cluster goes down if any
> one shard is down, my first point of confusion is cleared up.
>
> Following are the other details
>
> We really need to see details since I'm guessing we're talking
> past each other. So:
> *1> exactly how are you indexing documents?*
>      /using HttpSolrServer and sending all update requests to leader1/shard1.
> autoCommit is enabled at 60 seconds, and the client application does not
> issue any commits./
> *2> exactly how are you assigning a UUID to a doc?*
>      /defined a unique field in schema.xml; its value is generated by the
> client application. The ID format is {mongoDBHostName}-{mongoDBName}-{UUID}. /
> *3> do you ever re-index documents? If so, how are you
>    assuring that the UUID generated for any re-indexing operations
>    are the same ones used the first time? *
> /Yes, we are re-indexing documents. We get the UUID from MongoDB, and the
> generated ID is the same when we do updates as well, since we use the same
> code. /
>
>
> We are unable to work out the root cause of the duplicate documents in
> multiple shards.  Also, it looks like reindexing is the only solution for
> removing the duplicates.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Duplicate-documents-in-multiple-shards-tp4218162p4219251.html
> Sent from the Solr - User mailing list archive at Nabble.com.
