zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: devops/admin/client question: What do you do when you rollback?
Date Thu, 04 Aug 2011 21:45:18 GMT
This zxid check is normally used to guarantee in-order views of the data.
If you get disconnected from a server that is further ahead and then connect
to an out-of-date follower, ZK automatically disconnects you so that you
never see time go backwards.  Your situation is different, of course.
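
For readers of the archive, the check described above can be sketched
roughly as follows. This is a minimal illustration, not the actual server
code: the class and method names are hypothetical, and the only assumption
is that the server compares the last zxid the client reports having seen
against the last zxid the server itself has processed.

```java
// Hypothetical sketch of the ordering guard; names are illustrative only
// and are not taken from the ZooKeeper code base.
final class ZxidGuard {
    private ZxidGuard() {}

    /**
     * Returns true when the client has already seen a newer state than this
     * server holds. Serving such a client would let it observe older data,
     * so the connection should be refused.
     */
    static boolean shouldRefuse(long clientLastZxidSeen, long serverLastProcessedZxid) {
        return clientLastZxidSeen > serverLastProcessedZxid;
    }
}
```

After a rollback, every server is behind the client's last seen zxid, so a
guard of this shape refuses the client on each reconnect attempt.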

On Thu, Aug 4, 2011 at 7:05 PM, Fournier, Camille F. <
Camille.Fournier@gs.com> wrote:

> Right now the server just detects that the zxid is wrong, and calls close
> on the client. The client logs:
> 15:01:47,593 - INFO
>  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1159] - Unable to
> read additional data from server sessionid 0x131962b00540000, likely server
> has closed socket, closing socket connection and attempting reconnect
> (branch 3.3.3)
> I will poke around and see if I can figure out a nicer way to indicate this
> condition. The expired state is perfectly fine for me in my use case.
> C
> -----Original Message-----
> From: Patrick Hunt [mailto:phunt@apache.org]
> Sent: Thursday, August 04, 2011 1:51 PM
> To: user@zookeeper.apache.org
> Subject: Re: devops/admin/client question: What do you do when you
> rollback?
> On Thu, Aug 4, 2011 at 10:29 AM, Fournier, Camille F.
> <Camille.Fournier@gs.com> wrote:
> > We had an issue here the other day where the ZK servers were running
> > poorly, and in an effort to get them healthy again we ended up rolling back
> > the cluster state. While this was, in retrospect, not the right solution to
> > the problem we were facing, it brought up another problem. Namely, that many
> > of our clients couldn't reconnect with their sessions because their zxid was
> > too high (expected), but that the error they got when trying to do that
> > reconnection was just a vanilla disconnected error. The result was that most
> > of our clients had to be bounced.
> Hi Camille, there's a long standing jira on this:
> https://issues.apache.org/jira/browse/ZOOKEEPER-523
> > Aside from trying hard to avoid ever rolling back the cluster state, does
> > anyone have a way they deal with this situation if it occurs? Should we
> > consider enhancing the error message to the client so we could track the
> > fact that we were ahead of the quorum zxid and react sensibly? Alternately,
> > since we were sending a sessionId along with the zxid, perhaps it would be
> > nice to check to see if the sessionId exists before checking the zxid, which
> > would send an expired state signal which my client code could handle
> > cleanly.
> It seems reasonable that if the client connects to all servers in the
> ensemble (that it knows about) and sees that it's ahead of each one,
> it should consider the session expired (we could add a new state, but
> seems like just treating as expired with a good log message would be
> better from b/w compat standpoint).
> I can't recall, does the client have sufficient information to make
> this determination, or is the server just disconnecting?
> Patrick
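
The client-side rule Patrick proposes above (treat the session as expired
once the client has found itself ahead of every server it knows about) could
be sketched as below. This is only an illustration under a loud assumption:
it presumes the client can learn each server's latest zxid, which the thread
itself leaves as an open question. The class and method names are
hypothetical.

```java
import java.util.Collection;

// Hypothetical sketch; assumes the client can somehow obtain each server's
// latest zxid, which the thread does not confirm.
final class RollbackDetector {
    private RollbackDetector() {}

    /**
     * Returns true only if the client's last seen zxid is strictly greater
     * than the zxid reported by every known server. An empty collection
     * gives no evidence either way, so it returns false.
     */
    static boolean aheadOfAllServers(long clientLastZxid, Collection<Long> serverZxids) {
        if (serverZxids.isEmpty()) {
            return false;
        }
        for (long serverZxid : serverZxids) {
            if (clientLastZxid <= serverZxid) {
                return false; // at least one server can still serve this client
            }
        }
        return true;
    }
}
```

A caller would run this only after trying every server in its connect
string, and on a true result would surface the existing expired state with a
clear log message rather than introduce a new state.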
