lucene-solr-user mailing list archives

From Hendrik Haddorp <>
Subject Re: create collection gets stuck on node restart
Date Wed, 04 Jan 2017 13:23:40 GMT
The problem is that we would like to run without downtime. Rolling updates
have worked fine so far, except when a collection is created at the wrong
time. I just did another test with stateFormat=2, and this seems to
improve the situation greatly. One collection creation got stuck, but
other creations still worked, and after a restart of some nodes the stuck
collection also looked OK. For some reason, though, two replicas of the
same shard were assigned to the same node, even though I specified the
rule "shard:*,replica:<2,node:*".
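For reference, a minimal sketch of the setup described above, assuming the Solr 6.x Collections API CREATE action with its `stateFormat` and `rule` parameters. The base URL, collection name, and shard count are hypothetical, and `find_rule_violations` is a hypothetical helper that takes a simplified shard-to-nodes map (not the real CLUSTERSTATUS response format) and flags the kind of placement I saw, where one node holds two replicas of the same shard:

```python
from collections import Counter
from urllib.parse import urlencode


def build_create_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL (host and names are hypothetical)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Keep this collection's state in its own state.json znode
        # instead of the shared clusterstate.json.
        "stateFormat": 2,
        # Placement rule: fewer than 2 replicas of any shard on any node.
        "rule": "shard:*,replica:<2,node:*",
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


def find_rule_violations(shards):
    """Return (shard, node) pairs where one node hosts 2+ replicas of a shard.

    `shards` is a simplified, hypothetical structure mapping each shard
    name to the list of node names hosting its replicas.
    """
    violations = []
    for shard, nodes in sorted(shards.items()):
        for node, count in sorted(Counter(nodes).items()):
            if count >= 2:
                violations.append((shard, node))
    return violations


if __name__ == "__main__":
    print(build_create_url("http://localhost:8983/solr", "test", 2, 3))
    # shard1 has two replicas on node1, which the rule should prevent.
    print(find_rule_violations({"shard1": ["node1", "node2", "node1"],
                                "shard2": ["node1", "node2", "node3"]}))
```

The URL builder only constructs the request; sending it (for example with curl) is left out so the sketch has no side effects.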

On 03.01.2017 15:34, Shawn Heisey wrote:
> On 1/3/2017 2:59 AM, Hendrik Haddorp wrote:
>> I have a SolrCloud setup with 5 nodes and am creating collections with
>> a replication factor of 3. If I kill and restart nodes at the "right"
>> time during the creation process the creation seems to get stuck.
>> Collection data is left in the clusterstate.json file in ZooKeeper and
>> no collections can be created anymore until this entry gets removed. I
>> can reproduce this on Solr 6.2.1 and 6.3, while 6.3 seems to be
>> somewhat less likely to get stuck. Is Solr supposed to recover from
>> data being stuck in the clusterstate.json at some point? I had one
>> instance where it looked like data was removed again but normally the
>> data does not seem to get cleaned up automatically and just blocks any
>> further collection creations.
>> I did not find anything like this in Jira. Only SOLR-7198 sounds somewhat
>> similar, even though it is about deleting collections.
> Don't restart your nodes at the same time you're trying to do
> maintenance of any kind on your collections.  Try to only do maintenance
> when they are all working, or you'll get unexpected results.
> The most recent development goal is to make it so that collection deletion
> can be done even if the creation was partial.  The idea is that if
> something goes wrong, you can delete the bad collection and then be free
> to try to create it again.  I see that you've started another thread
> about deletion not fully eliminating everything in HDFS.  That does
> sound like a bug.  I have no experience with HDFS at all, so I can't be
> helpful with that.
> Thanks,
> Shawn
