zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Why are ephemeral nodes written to disk?
Date Wed, 17 Jan 2018 23:27:23 GMT
On Tue, Jan 9, 2018 at 12:38 PM, Jeff Widman <jeff@jeffwidman.com> wrote:

> Ephemeral nodes only exist for the life of the client session.
>
> As far as I understand, by definition, a client session ends when the
> entire zookeeper ensemble goes down.
>
> So I would expect that ephemeral nodes are only written to memory, not
> disk. The ephemeral nodes would be sync'd across machines as a client
> session can span multiple connections if a single zk server fails, but once
> the ensemble is down there is no need to recover the ephemeral nodes from
> disk.
>
> However, when I looked at a zookeeper ensemble that is 99% ephemeral nodes,
> I see a bunch of disk I/O from the zookeeper processes. So it appears that
> ephemeral nodes are still written to disk...
>
> Why is this?
>

Ephemeral znodes are treated just like persistent znodes in the sense that
a quorum of servers must agree to any change. As such, every change to an
ephemeral znode is written to the transaction log.
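
To make the point concrete, here is a toy model in Python, not ZooKeeper's
actual code. ToyServer, propose, and replay are hypothetical names invented
for illustration, and the in-memory list only stands in for the on-disk
transaction log. It sketches the idea that an ephemeral create goes through
the same quorum-and-log path as a persistent one, so replaying the log brings
both back.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical stand-in for one ensemble member; not ZooKeeper code."""
    txn_log: list = field(default_factory=list)  # stands in for the on-disk log

    def append(self, txn):
        # Record the transaction, then acknowledge it.
        self.txn_log.append(txn)
        return True


def propose(servers, txn):
    """Treat txn as committed only if a majority of servers log and ack it."""
    acks = sum(1 for s in servers if s.append(txn))
    return acks > len(servers) // 2


def replay(txn_log):
    """Rebuild the znode tree from the log, as a restarted server might."""
    tree = {}
    for op, path, ephemeral_owner in txn_log:
        if op == "create":
            tree[path] = ephemeral_owner  # None means a persistent znode
        elif op == "delete":
            tree.pop(path, None)
    return tree


servers = [ToyServer() for _ in range(3)]
committed_persistent = propose(servers, ("create", "/config", None))
committed_ephemeral = propose(servers, ("create", "/lock", 0x1234))  # owner session id
tree = replay(servers[0].txn_log)
```

In this sketch the only difference between the two znodes is the owner
session recorded with the ephemeral one; both are logged the same way.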

"a client session ends when the entire zookeeper ensemble goes down"

is not correct. A client session ends either when the client closes its
session explicitly or when the ZK quorum leader decides that the session has
expired (which is based on the negotiated session timeout). A session can be
expired (or closed, for that matter) only while a leader is active. When
you shut down an ensemble, the sessions are maintained. If you were to, for
example, shut down an ensemble for an hour and then restart it, the sessions
would still be active. The clock would "reset" when the new leader was
elected. If the client session is still active, the session would continue
and any ephemeral znodes would still exist.
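
The expiry behaviour described above can be sketched with another toy model.
This is an illustration under stated assumptions, not the ZooKeeper session
tracker: ToySessionTracker and its methods are hypothetical names, and time
is passed in explicitly as plain numbers of seconds.

```python
class ToySessionTracker:
    """Hypothetical model: sessions can expire only while a leader is active."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}         # session id -> time of last heartbeat or reset
        self.leader_active = False

    def elect_leader(self, now):
        # A newly elected leader restarts every session's expiry clock.
        self.leader_active = True
        for sid in self.last_seen:
            self.last_seen[sid] = now

    def shut_down(self):
        # With no leader, nothing can expire or close a session.
        self.leader_active = False

    def touch(self, sid, now):
        # Register a session or record a heartbeat from it.
        self.last_seen[sid] = now

    def expire(self, now):
        """Return the sessions expired at time `now` (empty without a leader)."""
        if not self.leader_active:
            return []
        dead = [sid for sid, t in self.last_seen.items() if now - t > self.timeout]
        for sid in dead:
            del self.last_seen[sid]
        return dead


tracker = ToySessionTracker(timeout=30)
tracker.elect_leader(now=0)
tracker.touch("s1", now=10)
tracker.shut_down()
while_down = tracker.expire(now=3600)      # ensemble is down: nothing expires
tracker.elect_leader(now=3610)             # restarted an hour later, clocks reset
after_restart = tracker.expire(now=3620)   # session survives the outage
much_later = tracker.expire(now=3700)      # client never came back, so it expires
```

In this model the session outlives an hour-long outage because the timeout
is measured from the new leader's election, and it is dropped only once the
timeout passes afterwards with no heartbeat from the client.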

Patrick


>
> --
>
> *Jeff Widman*
> jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
> <><
>
