lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: [SolrCloud] shard hash ranges changed after restoring backup
Date Wed, 15 Jun 2016 17:23:32 GMT
The simplest option, though a bit risky, is to manually edit the znode and
correct the entry. There are various tools out there for this, including
one that ships with Zookeeper (see the ZK documentation).

Or you can use the zkcli scripts (the Zookeeper ones) to get the znode
down to your local machine, edit it there and then push it back up to ZK.

I'd do all this with my Solr nodes shut down, then ensure that my ZK
ensemble is consistent after the update, etc.
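
For the "edit it there" step, here is a rough Python sketch of patching the
shard ranges in a local copy of the znode contents. It assumes the
collection state is JSON shaped like {collection: {"shards": {name:
{"range": "lo-hi"}}}}; that layout, the function name patch_shard_ranges,
and the collection name in the usage note are assumptions for illustration,
so check them against what you actually download. Getting the file from ZK
and pushing it back is left to whichever tool you use.

```python
import json


def patch_shard_ranges(state_text, collection, corrected):
    """Return the znode JSON with the given shards' "range" values replaced.

    state_text: JSON text downloaded from ZK (assumed layout, see above).
    corrected:  dict mapping shard name -> "lo-hi" hex range string.
    """
    state = json.loads(state_text)
    shards = state[collection]["shards"]
    for shard, new_range in corrected.items():
        # Fail loudly instead of silently creating a new shard entry.
        if shard not in shards:
            raise KeyError("no such shard in state: " + shard)
        shards[shard]["range"] = new_range
    return json.dumps(state, indent=2)
```

Usage would look something like patch_shard_ranges(text, "mycollection",
{"shard3": "d5550000-fffeffff", "shard4": "ffff0000-2aa9ffff"}), with the
values taken from the old cluster's CLUSTERSTATUS output. Keep a copy of
the original file before pushing anything back.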

Best,
Erick

On Wed, Jun 15, 2016 at 8:36 AM, Gary Yao <gary.yao@zalando.de> wrote:
> Hi all,
>
> My team at work maintains a SolrCloud 5.3.2 cluster with multiple
> collections configured with sharding and replication.
>
> We recently backed up our Solr indexes using the built-in backup
> functionality. After the cluster was restored from the backup, we
> noticed that atomic updates of documents are failing occasionally with
> the error message 'missing required field [...]'. The exceptions are
> thrown on a host on which the document to be updated is not stored. From
> this we are deducing that there is a problem with finding the right host
> by the hash of the uniqueKey. Indeed, our investigations so far showed
> that for at least one collection in the new cluster, the shards have
> different hash ranges assigned now. We checked the hash ranges by
> querying /admin/collections?action=CLUSTERSTATUS. Find below the shard
> hash ranges of one collection that we debugged.
>
>   Old cluster:
>     shard1_0 80000000 - aaa9ffff
>     shard1_1 aaaa0000 - d554ffff
>     shard2_0 d5550000 - fffeffff
>     shard2_1 ffff0000 - 2aa9ffff
>     shard3_0 2aaa0000 - 5554ffff
>     shard3_1 55550000 - 7fffffff
>
>   New cluster:
>     shard1 80000000 - aaa9ffff
>     shard2 aaaa0000 - d554ffff
>     shard3 d5550000 - ffffffff
>     shard4 0 - 2aa9ffff
>     shard5 2aaa0000 - 5554ffff
>     shard6 55550000 - 7fffffff
>
>   Note that the shard names differ because the old cluster's shards were
>   split.
>
> As you can see, the ranges of shard3 and shard4 differ from the old
> cluster. This change of hash ranges matches the symptoms we are
> currently experiencing.
>
> We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750
> in which David Smiley comments:
>
>   shard hash ranges aren't restored; this error could be disasterous
>
> It seems that this is what happened to us. We would like to hear some
> suggestions on how we could recover from this problem.
>
> Best,
> Gary
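
The range difference described in the quoted message can be checked
mechanically. The sketch below is only an illustration: it assumes the hash
ranges are 32-bit signed integers printed as hex (consistent with the lists
starting at 80000000 and ending at 7fffffff), and the helper names are made
up for this example. The range values are copied from the lists above.

```python
def to_signed32(hex_str):
    """Interpret a hex string as a 32-bit two's-complement integer."""
    value = int(hex_str, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


def parse_range(text):
    """Turn "lo-hi" (hex) into a (lo, hi) tuple of signed ints."""
    lo, hi = text.split("-")
    return to_signed32(lo), to_signed32(hi)


def shard_for(hash_value, ranges):
    """Return the name of the first shard whose range contains hash_value."""
    for name, text in ranges:
        lo, hi = parse_range(text)
        if lo <= hash_value <= hi:
            return name
    return None


def differing_positions(old, new):
    """Pair up shards by position and list the pairs whose ranges differ."""
    return [(o[0], n[0]) for o, n in zip(old, new)
            if parse_range(o[1]) != parse_range(n[1])]


OLD = [("shard1_0", "80000000-aaa9ffff"), ("shard1_1", "aaaa0000-d554ffff"),
       ("shard2_0", "d5550000-fffeffff"), ("shard2_1", "ffff0000-2aa9ffff"),
       ("shard3_0", "2aaa0000-5554ffff"), ("shard3_1", "55550000-7fffffff")]
NEW = [("shard1", "80000000-aaa9ffff"), ("shard2", "aaaa0000-d554ffff"),
       ("shard3", "d5550000-ffffffff"), ("shard4", "0-2aa9ffff"),
       ("shard5", "2aaa0000-5554ffff"), ("shard6", "55550000-7fffffff")]
```

Under that assumption, only hashes from ffff0000 through ffffffff are routed
differently: the old layout sends them to the fourth shard (shard2_1), the
new one to the third (shard3).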
