zookeeper-dev mailing list archives

From "Michael K. Edwards" <m.k.edwa...@gmail.com>
Subject ReconfigInProgress error
Date Sat, 24 Nov 2018 19:32:08 GMT
I've been experimenting with propagating bind() failures on server
ports in tests up to a level where we can do something about them.
There's at least one category of test cases (callers of
ReconfigTest.testPortChangeToBlockedPort) where the server is supposed
to ride through a bind() failure, recovering on a subsequent
reconfiguration.  In my current code state, I'm encountering errors
like this:

2018-11-24 11:04:46,252 [myid:] - INFO  [ProcessThread(sid:3
cport:-1)::PrepRequestProcessor@878] - Got user-level KeeperException
when processing sessionid:0x1002b98aa830000 type:reconfig cxid:0x1e
zxid:0x10000002b txntype:-1 reqpath:n/a Error Path:null
Error:KeeperErrorCode = ReconfigInProgress

I can hack things until this particular test passes, but it raises
questions about reconfiguration in general.  How exactly is the
cluster supposed to get out of this state?  If a cluster member drops
out of contact with the quorum while there is a reconfiguration in
flight, is there any recovery path that restores the ability to
process a reconfig operation?  Is there a design doc for
reconfiguration that demonstrates the kind of robustness against
Byzantine faults that one is led to expect from ZooKeeper?
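
For anyone who wants to reproduce the underlying bind() failure outside
the test suite, here is a minimal sketch using plain
java.net.ServerSocket.  It is not ZooKeeper's code, and the class and
method names (BindFailureDemo, bindOrThrow, secondBindFails) are made up
for illustration; the only point is that the BindException reaches the
caller instead of being swallowed where the socket is created.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical demo class; not part of ZooKeeper.
public class BindFailureDemo {

    /** Binds a listening socket to the given loopback port, letting any bind failure propagate. */
    static ServerSocket bindOrThrow(int port) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            // Disable address reuse so an occupied port is reported as a failure.
            socket.setReuseAddress(false);
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the unbound socket
            throw e;        // let the caller decide what to do about it
        }
    }

    /** Returns true if a second bind to an already occupied port fails with BindException. */
    static boolean secondBindFails() throws IOException {
        try (ServerSocket first = bindOrThrow(0)) { // port 0: the OS picks a free port
            int port = first.getLocalPort();
            try (ServerSocket second = bindOrThrow(port)) {
                return false; // unexpected: both sockets are bound to the same port
            } catch (BindException expected) {
                return true;  // the failure surfaced to the caller
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind failed: " + secondBindFails());
    }
}
```

A test that wants the server to ride through the failure would catch
the exception at the level that owns the retry or reconfig logic,
rather than at the point where the socket is created.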
