zookeeper-user mailing list archives

From Dave Wright <wrig...@gmail.com>
Subject Re: Dynamic adding/removing ZK servers on client
Date Tue, 04 May 2010 00:06:02 GMT
> This is tricky: what happens if the server your client is connected to is
> decommissioned by a view change, and you are unable to locate another server
> to connect to because other view changes committed while you are
> reconnecting have removed all the servers you knew about. We'd need to make
> sure that watches on this znode were fired before a view change, but it's
> hard to know how to avoid having to wait for a session timeout before a
> client that might just be migrating servers reappears in order to make sure
> it sees the view change.
> Even then, the problem of 'locating' the cluster still exists in the case
> that there are no clients connected to tell anyone about it.

Yes, this doesn't completely solve two issues:
1. Bootstrapping the cluster itself & clients
2. Major cluster reconfiguration (e.g. switching out every node before
clients can pick up the changes).

That said, I think it gets close and could still be useful.
For #1, you could simply require that the initial servers in the
cluster be configured manually; after that, servers could be added and
removed as needed. A new server would just need the address of one
other server to "join" the cluster and get the full server list. Clients
would be in a similar situation: you still need a way to pass an
initial server list (or at least one valid server) into the client, but
that could come via HTTP, DNS, or a manually configured list. After
that, the clients themselves could stay in sync with changes.
For #2, you could simply document that there are limits on how quickly
the cluster can be changed, and that if too many changes are made too
quickly, clients or servers may not pick up the changes in time and
will need to be restarted. In practice I don't think this will be much
of an issue: as long as at least one server from the "starting" state
stays up until everyone else has reconnected, every client and server
should eventually find that node and get the full server list.
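
The rate-of-change argument in #2 can be illustrated with a tiny client-side simulation. It assumes (not stated in the thread) that a client refreshes its list only between batches of membership changes and that every live server shares one committed view; `refresh` and `make_cluster` are hypothetical helpers, and the single-letter addresses are placeholders.

```python
def make_cluster(addresses):
    """Hypothetical helper: live servers that all share one committed view."""
    view = frozenset(addresses)
    return {addr: view for addr in addresses}


def refresh(known, live):
    """Return the client's new server list, or None if it is stranded.

    `known` is the list the client last saw; `live` maps each live server's
    address to that server's view of the full membership.
    """
    for addr in known:
        if addr in live:
            return sorted(live[addr])
    return None  # nothing the client knows is still up: it must be restarted


if __name__ == "__main__":
    start = ["a", "b", "c"]

    # Too fast: every node is replaced before the client refreshes.
    print(refresh(start, make_cluster(["d", "e", "f"])))  # None

    # Gradual: "a" from the starting state stays up while d and e join.
    client = refresh(start, make_cluster(["a", "d", "e"]))
    print(client)  # ['a', 'd', 'e']

    # Now "a" can go; the client already knows about d and e.
    client = refresh(client, make_cluster(["d", "e", "f"]))
    print(client)  # ['d', 'e', 'f']
```

Replacing every node in one step strands the client, while keeping one starting server up across a refresh lets the client follow the cluster to an entirely new set of nodes.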

