zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: adding a separate thread to detect network timeouts faster
Date Tue, 10 Sep 2013 20:31:21 GMT
I don't see strong value here.  A few failures would be detected more
quickly, but I am not convinced that this would significantly improve
functionality.

On Tue, Sep 10, 2013 at 1:01 PM, Jeremy Stribling <strib@nicira.com> wrote:

> Hi all,
>
> Let's assume that you wanted to deploy ZK in a virtualized environment,
> despite all of the known drawbacks.  Assume we could deploy it such that
> the ZK servers were all using independent CPUs and storage (though not
> dedicated disks).  Obviously, the shared disks (shared with other, non-ZK
> VMs on the same hypervisor) will cause ZK to hit the default session
> timeout occasionally, so you would need to raise the existing session
> timeout to something like 30 seconds.
>
> I'm curious if there would be any technical drawbacks to adding an
> additional heartbeat mechanism between the clients and the servers, which
> would have the goal of detecting network-only failures faster than the
> existing heartbeat mechanism.  The idea is that there would be a new thread
> dedicated to processing these heartbeats, which would not get blocked on
> I/O.  Then the clients could configure a second, smaller timeout value, and
> it would be assumed that any such timeout indicated a real problem.  The
> existing mechanism would still be in place to catch I/O-related errors.
>
> I understand the philosophy that there should be some heartbeat mechanism
> that takes the disk into account, but I'm having trouble coming up with
> technical reasons not to add a second mechanism.  Obviously, the advantage
> would be that the clients could detect network failures and system crashes
> more quickly in an environment with slow disks, and fail over to other
> servers more quickly.  The only disadvantages I can come up with are:
>
> 1) More code complexity, and slightly more heartbeat traffic on the wire
> 2) I think the servers have to log session expirations to disk, so if the
>    sessions expire at a faster rate than the disk can handle, it might lead
>    to a large backlog.
>
> Are there other drawbacks I am missing?  Would a patch that added
> something like this be considered, or is it dead from the start?
>
> Thanks,
> Jeremy
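
For readers following the thread, the two-timeout idea in the quoted message can be pictured with a minimal sketch. This is not ZooKeeper code and does not use any ZooKeeper API. The class name `DualTimeoutMonitor`, its methods, and the 5-second network timeout are all hypothetical, chosen only for illustration. The 30-second session timeout is the figure from the message. The sketch assumes a lightweight heartbeat that is acknowledged without touching the disk, alongside the existing heartbeat whose acknowledgement can be delayed by slow I/O.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    HEALTHY = "healthy"
    NETWORK_TIMEOUT = "network_timeout"  # fast, network-only heartbeat missed
    SESSION_TIMEOUT = "session_timeout"  # existing, disk-aware heartbeat missed


@dataclass
class DualTimeoutMonitor:
    """Hypothetical client-side view of the proposal (illustration only)."""
    session_timeout: float         # existing timeout, e.g. 30 s in the message
    network_timeout: float         # proposed second, smaller timeout (assumed)
    last_session_ack: float = 0.0  # time of last disk-aware acknowledgement
    last_network_ack: float = 0.0  # time of last lightweight acknowledgement

    def on_session_ack(self, now: float) -> None:
        # A full reply also proves the network path is alive.
        self.last_session_ack = now
        self.last_network_ack = now

    def on_network_ack(self, now: float) -> None:
        # Reply from the dedicated heartbeat thread, not blocked on I/O.
        self.last_network_ack = now

    def check(self, now: float) -> Status:
        # The short timeout is checked first: missing it is assumed to
        # indicate a real problem (network failure or crashed server).
        if now - self.last_network_ack > self.network_timeout:
            return Status.NETWORK_TIMEOUT
        # The long timeout still catches servers that answer pings but are
        # stuck on slow disks.
        if now - self.last_session_ack > self.session_timeout:
            return Status.SESSION_TIMEOUT
        return Status.HEALTHY
```

With a 30-second session timeout and an assumed 5-second network timeout, a client whose lightweight heartbeats stop would see `NETWORK_TIMEOUT` after 5 seconds and could fail over, while a server that keeps answering lightweight heartbeats but stalls on disk would only be flagged by the existing 30-second timeout. Times are passed in explicitly so the behavior is easy to reason about. A real client would read a monotonic clock.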
