lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject Re: Solr cluster doesn't recover from a ZK disconnect if collection.reload() was issued
Date Thu, 14 Jan 2016 15:26:42 GMT
Which version of Solr is this on?

On Thu, Jan 14, 2016 at 4:10 PM, Gili Nachum <gilinachum@gmail.com> wrote:
> Clarification: If we restart the nodes after reloading the collection and
> before pausing, then recovery works fine.
>
> On Thu, Jan 14, 2016 at 12:08 PM, Gili Nachum <gilinachum@gmail.com> wrote:
>
>> Hi,
>>
>> Our Solr cluster is running on VMs that could freeze for more than the ZK
>> tick time (it's a non-critical CI/CD pipeline running on an overloaded
>> ESX). When this happens the node's shards will be registered as down. Then
>> when the node is back, recovery takes place and all shard replicas end up
>> in the active state. Everyone is happy.
>>
>> However, we noticed that recovery doesn't take place if the collection was
>> reloaded and the server hasn't been restarted since. Shards end up in the
>> down state. Before providing log messages, I wonder if this is a known issue?
>>
>> Reproducing recipe (assume two nodes):
>> 1. Before starting: restart both solr1 and solr2: all shards are active.
>> 2. Reload the collection
>> 3. Cause disconnect by freezing the Java process:
>> On Solr2: kill -SIGSTOP <solr server pid> and then in 2 min kill -SIGCONT
>> <solr server pid>
>> 4. solr2 shard replicas are *Down* forever. No recovery.
>>
>> If we omit step #2, the cluster recovers as expected.
>>
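
For anyone trying to reproduce this, the recipe above can be sketched as a small shell script. This is only a sketch under assumptions: the base URL, the collection name and the SOLR_PID variable are hypothetical placeholders, not values from the report, and the CLUSTERSTATUS call assumes a Solr version that exposes that Collections API action.

```shell
#!/usr/bin/env bash
# Sketch of the repro recipe. SOLR_URL, COLLECTION and SOLR_PID are
# placeholders for illustration; they do not come from the original report.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"   # assumed base URL
COLLECTION="${COLLECTION:-mycollection}"             # hypothetical collection name

# Step 2: reload the collection through the Collections API.
reload_collection() {
  curl -sf "${SOLR_URL}/admin/collections?action=RELOAD&name=${COLLECTION}"
}

# Step 3: freeze a process for N seconds with SIGSTOP, then resume it with SIGCONT.
freeze_process() {
  local pid="$1" seconds="$2"
  kill -STOP "$pid" || return 1
  sleep "$seconds"
  kill -CONT "$pid"
}

# Step 4: print the cluster state so the replica states can be inspected.
# Assumes the Solr version in use supports action=CLUSTERSTATUS.
show_cluster_status() {
  curl -sf "${SOLR_URL}/admin/collections?action=CLUSTERSTATUS&wt=json"
}

# Runs the recipe only when SOLR_PID is set (run this on the solr2 host).
if [ -n "${SOLR_PID:-}" ]; then
  reload_collection && freeze_process "$SOLR_PID" 120 && show_cluster_status
fi
```

Skipping the reload_collection call corresponds to omitting step #2, which per the report lets the cluster recover as expected.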



-- 
Regards,
Shalin Shekhar Mangar.
