zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Serious problem processing hearbeat on login stampede
Date Thu, 14 Apr 2011 22:31:55 GMT
2011/4/14 Chang Song <tru64ufs@me.com>

> You need to understand that most app can tolerate delay in connect/close,
> but we cannot tolerate ping delay since we are using ZK heartbeat TO
> for sole failure detection.
>

What about using multiple ZK clusters for this, then?

But it really sounds like your ZK machines are misconfigured somehow.
Session start/stop isn't any more expensive than znode updates, and a
small ZK cluster can handle tens of thousands of those per second if
set up correctly.

Have you tested a cluster where the machines are set up correctly with
separate snapshot and log disks?
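
For reference, separate snapshot and log disks are usually configured in
zoo.cfg through the dataDir and dataLogDir settings. The fragment below is
only a sketch: the mount points are hypothetical placeholders, and the
tickTime value is an assumption chosen so that a 15 second session timeout
falls inside the default allowed range (2x to 20x tickTime).

```properties
# Sketch of a zoo.cfg fragment; all paths are hypothetical examples.

# Basic time unit in milliseconds (assumed value, not from this thread).
tickTime=2000

# Snapshots go here; assumed to be on its own disk.
dataDir=/disk1/zookeeper/snapshots

# Transaction log goes here; assumed to be a separate, dedicated disk
# so that log fsyncs do not compete with snapshot writes.
dataLogDir=/disk2/zookeeper/txlog

clientPort=2181
```

The point of the split is that the transaction log is written synchronously
on every update, so sharing that device with snapshots or other I/O can
stall request processing.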

Are your ZK machines doing any other tasks?


> We use 15 seconds (5 sec for each ensemble) for session timeout,
> important server will drop out of the clusters even if the server is
> not malfunctioning, in some cases, it wreaks havoc on certain
> services.
>
