zookeeper-user mailing list archives

From "Fournier, Camille F." <Camille.Fourn...@gs.com>
Subject devops/admin/client question: What do you do when you rollback?
Date Thu, 04 Aug 2011 17:29:17 GMT
We had an issue here the other day where the ZK servers were running poorly, and in an effort
to get them healthy again we ended up rolling back the cluster state. In retrospect this was
not the right solution to the problem we were facing, but it exposed another problem: many of
our clients couldn't reconnect with their sessions because their zxid was too high (expected),
yet the error they got when attempting that reconnection was just a vanilla disconnected error.
The result was that most of our clients had to be bounced.

Aside from trying hard to avoid ever rolling back the cluster state, does anyone have a way
of dealing with this situation if it occurs? Should we consider enhancing the error returned
to the client so that we could detect that we were ahead of the quorum zxid and react sensibly?
Alternatively, since the client sends a sessionId along with the zxid, perhaps the server could
check whether the sessionId exists before checking the zxid. A missing session would then produce
an expired state signal, which my client code could handle cleanly.
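
To make the second suggestion concrete, here is a rough sketch of the check order I have in
mind. This is a toy model, not ZooKeeper's actual server code: the class ReconnectCheck, the
Result enum, and the method and parameter names are all made up for illustration, and it
assumes the server can look up the set of session IDs it still knows about after the rollback.

```java
import java.util.Set;

// Hypothetical model of the proposed reconnect check order.
// None of these names come from the ZooKeeper code base.
final class ReconnectCheck {

    enum Result { SESSION_EXPIRED, ZXID_TOO_HIGH, RECONNECTED }

    // activeSessions:  session IDs the (rolled-back) server still knows about
    // serverLastZxid:  last zxid the server has processed
    // clientSessionId: session ID presented by the reconnecting client
    // clientLastZxid:  last zxid the client has seen
    static Result check(Set<Long> activeSessions, long serverLastZxid,
                        long clientSessionId, long clientLastZxid) {
        // Proposed change: look up the session first, so a client whose
        // session no longer exists gets an explicit expiration.
        if (!activeSessions.contains(clientSessionId)) {
            return Result.SESSION_EXPIRED;
        }
        // Only a session the server still knows about reaches the zxid check.
        if (clientLastZxid > serverLastZxid) {
            return Result.ZXID_TOO_HIGH;
        }
        return Result.RECONNECTED;
    }
}
```

With this ordering, a client whose session was created after the rollback point would see an
expiration rather than a bare disconnect. A client whose session survived the rollback but whose
zxid is ahead would still hit the zxid check, so the first suggestion (a more specific error)
would still be useful for that case.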

Any ideas or suggestions would be welcome.

