zookeeper-user mailing list archives

From Vishal Kher <vishalm...@gmail.com>
Subject Re: devops/admin/client question: What do you do when you rollback?
Date Sun, 07 Aug 2011 19:01:21 GMT
Hi Camille,

Can you share what kind of problems you were seeing on the servers that
forced you to roll back the cluster?


On Thu, Aug 4, 2011 at 1:29 PM, Fournier, Camille F. <Camille.Fournier@gs.com> wrote:

> We had an issue here the other day where the ZK servers were running
> poorly, and in an effort to get them healthy again we ended up rolling back
> the cluster state. In retrospect this was not the right solution to the
> problem we were facing, but it exposed another problem: many of our clients
> couldn't reconnect with their sessions because their zxid was too high
> (expected), yet the error they got on that reconnection attempt was just a
> vanilla disconnected error. As a result, most of our clients had to be
> bounced.
> Aside from trying hard to avoid ever rolling back the cluster state, does
> anyone have a way they deal with this situation if it occurs? Should we
> consider enhancing the error message to the client so we could track the
> fact that we were ahead of the quorum zxid and react sensibly? Alternatively,
> since we send a sessionId along with the zxid, perhaps the server could
> check whether the sessionId exists before checking the zxid; that would
> send an expired-state signal, which my client code could handle cleanly.
> Any ideas or suggestions would be welcome.
> C
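
The check ordering proposed in the quoted message (look up the sessionId first, then compare zxids) can be sketched roughly as below. This is only an illustrative model, not ZooKeeper server code: `ConnectResult`, `check_reconnect`, and `client_action` are hypothetical names, and the real server and client APIs may behave differently.

```python
from enum import Enum


class ConnectResult(Enum):
    # Hypothetical outcomes of a reconnect attempt (not ZooKeeper's own codes).
    OK = "ok"
    SESSION_EXPIRED = "session_expired"
    ZXID_AHEAD = "zxid_ahead"


def check_reconnect(session_id, client_last_zxid, known_sessions, server_last_zxid):
    """Model of the proposed ordering: session lookup first, zxid second."""
    # After a rollback, sessions created after the restored state are unknown
    # to the servers, so the client can be told its session is gone.
    if session_id not in known_sessions:
        return ConnectResult.SESSION_EXPIRED
    # The session exists but the client has seen a newer zxid than the server.
    if client_last_zxid > server_last_zxid:
        return ConnectResult.ZXID_AHEAD
    return ConnectResult.OK


def client_action(result):
    """Hypothetical client-side reaction to each outcome."""
    if result is ConnectResult.SESSION_EXPIRED:
        return "recreate session and re-register watches/ephemerals"
    if result is ConnectResult.ZXID_AHEAD:
        return "report that the client is ahead of the quorum and alert"
    return "resume normally"


if __name__ == "__main__":
    # A client whose session no longer exists after the rollback.
    outcome = check_reconnect(0x1, 500, set(), 100)
    print(outcome.value, "->", client_action(outcome))
```

With this ordering, a client whose session was lost in the rollback would see an expiry it already knows how to handle, instead of a generic disconnect; only clients with a surviving session but a too-high zxid would need the new error.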
