lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr Cloud: Duplicate documents in multiple shards
Date Tue, 21 Jul 2015 04:41:05 GMT
bq: We have 130 million documents in our set up and the routing key is set as
"compositeId".

The most likely explanation is that somehow you've sent the same document out
with different routing keys. So what is the ID field (or, more generally, your
<uniqueKey> field) for a pair of duplicated documents? My bet is that whatever
is in front of the ! symbol is different.
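To make that concrete, here is a minimal sketch of compositeId routing, assuming the two-part "prefix!docid" scheme of Solr 4.x: the id is split at the first "!", the prefix's MurmurHash3 (x86, 32-bit, seed 0, over the UTF-8 bytes) supplies the top 16 bits, the remainder's hash supplies the bottom 16 bits, and each shard owns a contiguous slice of the signed 32-bit hash space. The even split of that space and the helper names (composite_hash, shard_for, find_logical_duplicates) are illustrative assumptions, not Solr's own code. The point it shows: "a!1" and "b!1" are different <uniqueKey> values, and because their top 16 hash bits come from different prefixes they can land on different shards.

```python
from collections import defaultdict


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32, returned as a signed 32-bit int (like a Java int)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & 0xFFFFFFFF
    n = len(data)
    rounded = n & ~3
    for i in range(0, rounded, 4):  # process 4-byte little-endian blocks
        k1 = int.from_bytes(data[i:i + 4], "little")
        k1 = (k1 * c1) & 0xFFFFFFFF
        k1 = ((k1 << 15) | (k1 >> 17)) & 0xFFFFFFFF
        k1 = (k1 * c2) & 0xFFFFFFFF
        h1 ^= k1
        h1 = ((h1 << 13) | (h1 >> 19)) & 0xFFFFFFFF
        h1 = (h1 * 5 + 0xE6546B64) & 0xFFFFFFFF
    k1, tail = 0, n & 3  # mix in the 1-3 trailing bytes, if any
    if tail == 3:
        k1 ^= data[rounded + 2] << 16
    if tail >= 2:
        k1 ^= data[rounded + 1] << 8
    if tail >= 1:
        k1 ^= data[rounded]
        k1 = (k1 * c1) & 0xFFFFFFFF
        k1 = ((k1 << 15) | (k1 >> 17)) & 0xFFFFFFFF
        k1 = (k1 * c2) & 0xFFFFFFFF
        h1 ^= k1
    h1 ^= n  # finalization mix
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & 0xFFFFFFFF
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & 0xFFFFFFFF
    h1 ^= h1 >> 16
    return h1 - (1 << 32) if h1 & 0x80000000 else h1


def composite_hash(doc_id: str) -> int:
    """Assumed two-part compositeId hash: prefix -> top 16 bits, rest -> low 16 bits."""
    idx = doc_id.find("!")
    if idx <= 0:  # no usable prefix: hash the whole id
        return murmur3_x86_32(doc_id.encode("utf-8"))
    top = murmur3_x86_32(doc_id[:idx].encode("utf-8")) & 0xFFFF0000
    low = murmur3_x86_32(doc_id[idx + 1:].encode("utf-8")) & 0x0000FFFF
    h = top | low
    return h - (1 << 32) if h & 0x80000000 else h


def shard_ranges(num_shards: int):
    """Hypothetical even split of [-2^31, 2^31 - 1] into inclusive ranges."""
    lo, hi = -(1 << 31), (1 << 31) - 1
    step = (hi - lo + 1) // num_shards
    ranges, start = [], lo
    for i in range(num_shards):
        end = hi if i == num_shards - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def shard_for(doc_id: str, ranges) -> int:
    """Index of the shard whose hash range contains this id's hash."""
    h = composite_hash(doc_id)
    return next(i for i, (lo, hi) in enumerate(ranges) if lo <= h <= hi)


def find_logical_duplicates(doc_ids):
    """Group ids by the part after '!' (assumed to identify the logical doc)."""
    groups = defaultdict(set)
    for doc_id in doc_ids:
        idx = doc_id.find("!")
        groups[doc_id[idx + 1:] if idx > 0 else doc_id].add(doc_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}
```

For example, with ranges = shard_ranges(5), comparing shard_for("a!1", ranges) against shard_for("b!1", ranges) shows whether that pair would be split across shards, and running find_logical_duplicates over ids exported from each shard lists the pairs whose prefixes disagree.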

As far as indexing when all replicas for a shard are down: it should fail
completely, and the document should be nowhere in the collection.
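A toy model of that behavior, under the assumption that an update's target shard is fixed purely by which hash range contains the document's hash: if the owning shard has no live replica, there is no other shard to fall back to, so the update errors out instead of being forwarded elsewhere. The class and function names here are hypothetical and this is not how Solr is implemented internally.

```python
class ShardDownError(Exception):
    """Hypothetical error: every replica of the target shard is down."""


def route_update(doc_hash, ranges, live_replicas):
    """Return a live replica of the shard owning doc_hash, or raise.

    ranges: inclusive (lo, hi) hash ranges, one per shard.
    live_replicas: per-shard lists of live replica names, same order as ranges.
    """
    for shard, (lo, hi) in enumerate(ranges):
        if lo <= doc_hash <= hi:
            if not live_replicas[shard]:
                # Only this shard owns the range, so there is no fallback target.
                raise ShardDownError(f"shard{shard + 1} has no live replicas")
            return live_replicas[shard][0]
    raise ValueError("hash is outside every shard range")
```

Because the range-to-shard mapping does not change when a shard goes down, a document indexed during the outage is rejected rather than written to a different shard.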

Best,
Erick

On Mon, Jul 20, 2015 at 4:41 AM, mesenthil1
<senthilkumar.arumugam@viacomcontractor.com> wrote:
> Hi All,
>
> We are using Solr 4.2.1 (SolrCloud) with a 5-shard setup (1 leader and
> 1 replica for each shard), and we are seeing the following issue.
> A few of the documents are returned from more than one shard for
> queries. When we try to update such a document, the update is applied on
> only one shard rather than both, and we are unable to delete the document
> either. Can you please clarify the following?
>
> 1. What happens if a shard (both leader and replica) goes down? If a
> document on the "dead" shard is updated, will it be forwarded to a new
> shard? If so, when the dead shard comes up again, won't it still be
> considered the owner of the same hash key range?
> 2. Is there a way to fix this (i.e., remove the duplicates across shards)?
>
> We have 130 million documents in our set up and the routing key is set as
> "compositeId".
>
> Senthil
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Duplicate-documents-in-multiple-shards-tp4218162.html
> Sent from the Solr - User mailing list archive at Nabble.com.
