zookeeper-user mailing list archives

From zk questions <zkquesti...@gmail.com>
Subject Problem recovering from a bad reconfig (3.5)
Date Sat, 09 Nov 2013 18:59:02 GMT
Hi,

I've been testing out the dynamic reconfig feature of 3.5 along with using
this patch (https://issues.apache.org/jira/browse/ZOOKEEPER-1691) and I'm
having an issue where my zk cluster won't allow me to perform further
reconfigs.
So here's what I'm doing:
1) Start nodes 1 and 2
2) Invoke reconfig on 1 to add 2; this succeeds
3) Start node 3 with its initial dynamic config set to just 2 and 3,
where 2 isn't the leader (manually verified)
4) Invoke reconfig on 2 to add 3; this fails, with an error indicating that
another reconfig is in progress
5) Then I restart 3 with the configuration containing just 1 and 3
6) Then I try again to add 3 to the cluster by invoking reconfig on 1 to
add 3; and again I see an error indicating that another reconfig is in
progress
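
To make steps 3 and 5 concrete, here is a minimal sketch of the two dynamic
config files that node 3 is started with. The host names, ports, and helper
names are placeholders assumed for illustration, not values from the actual
test setup; only the server.N=host:port:port:role;clientPort line format comes
from the 3.5 dynamic reconfig documentation.

```python
# Sketch only: HOSTS, the port numbers, and both helper functions are
# hypothetical and exist purely to illustrate the config contents.

def server_line(sid, host, quorum_port=2888, election_port=3888,
                client_port=2181, role="participant"):
    """Render one server entry in the 3.5 dynamic config format."""
    return f"server.{sid}={host}:{quorum_port}:{election_port}:{role};{client_port}"


def dynamic_config(members):
    """Render a dynamic config from a {server id: host} mapping, ordered by id."""
    return "\n".join(server_line(sid, host) for sid, host in sorted(members.items()))


# Placeholder host names for the three nodes.
HOSTS = {1: "zk1.example.com", 2: "zk2.example.com", 3: "zk3.example.com"}

# Step 3: node 3 starts with a config naming only 2 (a follower) and itself.
step3_config = dynamic_config({sid: HOSTS[sid] for sid in (2, 3)})

# Step 5: node 3 restarts with a config naming only 1 and itself.
step5_config = dynamic_config({sid: HOSTS[sid] for sid in (1, 3)})

if __name__ == "__main__":
    print("# step 3\n" + step3_config)
    print("# step 5\n" + step5_config)
```

The only difference between the two files is which existing member (2 or 1)
is listed alongside node 3.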

FWIW: I'm testing this scenario to simulate the situation where I'm
automating the reconfig process and the dynamic configuration for 3 ends up
containing a node that isn't the leader.

I was wondering what I should do in this situation to recover from the
failure at step 3 so that we can fix the dynamic config and then attempt a
proper reconfig (steps 4 - 6)?

I've also attached a tar containing a script to automatically reproduce the
steps and problem I'm seeing above.

Thanks.
