zookeeper-user mailing list archives

From Ivan Kelly <iv...@apache.org>
Subject Re: locking/leader election and dealing with session loss
Date Thu, 16 Jul 2015 12:57:54 GMT
On Thu, Jul 16, 2015 at 1:38 PM Jordan Zimmerman <jordan@jordanzimmerman.com>
wrote:

> Are you really seeing 30s gc pauses in production? If so, then of course
> this could happen. However, if your application can tolerate a 30s pause
> (which is hard to believe) then your session timeout is too low. The point
> of the session timeout is to have enough coverage. So, if your app has 30
> seconds allowable pauses your session timeout would have to be much longer.
>
GC is just an example. There are other ways the same scenario could happen.
The machine could swap out the process due to load. Someone could do
something stupid in the ZooKeeper event thread, so the session expired
event is delayed. The state update could have hit the IP stack during a
network partition, and the process then got wedged. The state update packet
could have hit the network and been routed via the moon. The clock could
break.

If you are relying on a timer on the zk client to maintain a guarantee,
then you really aren't giving any guarantee because the zk client doesn't
have control over all the things that could go wrong.
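
As a rough illustration of that point, here is a toy simulation of the
check-then-act race. It is only a sketch: it does not use the real ZooKeeper
client API, and every name in it (FakeClock, LockService, Client,
guarded_write, the 10 second timeout) is made up for the example. Client A
checks its local timer, stalls for 30 seconds (GC, swap, a wedged thread,
whatever), and then writes, even though the service has already given the
lock to B.

```python
SESSION_TIMEOUT = 10.0  # seconds; illustrative value, not a recommendation


class FakeClock:
    """Deterministic stand-in for wall-clock time."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class LockService:
    """Toy stand-in for the ensemble: grants a lock and expires silent holders."""

    def __init__(self, clock):
        self.clock = clock
        self.owner = None
        self.last_heard = 0.0

    def acquire(self, name):
        expired = self.clock.now - self.last_heard >= SESSION_TIMEOUT
        if self.owner is None or expired:
            self.owner = name
            self.last_heard = self.clock.now
            return True
        return False


class Client:
    """Hypothetical client that trusts a local timer to decide it is still safe."""

    def __init__(self, name, clock):
        self.name = name
        self.clock = clock
        self.lease_start = None

    def on_acquired(self):
        self.lease_start = self.clock.now

    def timer_says_safe(self):
        if self.lease_start is None:
            return False
        return self.clock.now - self.lease_start < SESSION_TIMEOUT


def guarded_write(client, service, log, value, stall=lambda: None):
    # Step 1: the local timer check passes.
    if not client.timer_says_safe():
        return False
    # Step 2: anything can happen here, between the check and the write.
    stall()
    # Step 3: the write goes out anyway. Record who really owned the lock.
    log.append((value, client.name, service.owner))
    return True


def run_scenario():
    clock = FakeClock()
    service = LockService(clock)
    a = Client("A", clock)
    log = []

    assert service.acquire("A")
    a.on_acquired()
    clock.advance(1.0)

    def stall():
        clock.advance(30.0)   # A is frozen for 30 seconds
        service.acquire("B")  # meanwhile A has expired and B gets the lock

    wrote = guarded_write(a, service, log, "update-from-A", stall)
    return wrote, log
```

In run_scenario() the write is accepted and the log entry records A as the
writer and B as the lock owner at the time of the write. Making the timer
shorter or the check more frequent narrows the window but does not close it,
since the stall can always land after the last check.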

-Ivan
