zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)
Date Mon, 25 Jan 2010 06:08:03 GMT
On Sun, Jan 24, 2010 at 9:09 PM, Zheng Shao <zshao9@gmail.com> wrote:

> What will happen if C3 and Z3 are partitioned from the rest of the
> world?

Z3 will notice instantly that it is not part of a quorum and will
immediately freeze.  If and when it reconnects to the quorum, it will replay
any transactions that have occurred in the remaining quorum.
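
To make the quorum point concrete, here is a minimal sketch, assuming the default majority quorum and a three-server ensemble Z1, Z2, Z3. The helper has_quorum is a hypothetical name for illustration only, not part of ZooKeeper.

```python
def has_quorum(reachable_servers: int, ensemble_size: int) -> bool:
    """Return True if a server that can reach `reachable_servers` members
    (counting itself) belongs to a strict majority of the ensemble."""
    return reachable_servers > ensemble_size // 2


# Z3 is cut off and can only see itself; Z1 and Z2 still see each other.
print(has_quorum(1, 3))  # Z3 alone -> False
print(has_quorum(2, 3))  # Z1 + Z2  -> True
```

With three servers, the side holding two of them keeps serving and the isolated server does not.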

I am not entirely sure when a session expiration event will be delivered to
C3, but if and when ZK decides to expire the session, the ephemeral nodes
from C3 will disappear.  If C3 gets a connection loss event, then any
watches it had will be re-established, but if it gets session expiration,
all watches will be lost.
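
The difference between the two cases can be sketched with a toy in-memory model. This is only an illustration of the semantics described above, not the ZooKeeper API; ToyEnsemble, the session id, and the path are all made-up names.

```python
class ToyEnsemble:
    """Toy stand-in for the server side; not the real ZooKeeper API."""

    def __init__(self):
        self.ephemerals = {}    # path -> id of the session that owns it
        self.connected = set()  # session ids that currently have a connection

    def create_ephemeral(self, session_id, path):
        self.connected.add(session_id)
        self.ephemerals[path] = session_id

    def connection_lost(self, session_id):
        # Only the connection is gone; the session and its nodes remain.
        self.connected.discard(session_id)

    def expire(self, session_id):
        # Expiration ends the session and removes every node it owned.
        self.connected.discard(session_id)
        self.ephemerals = {
            path: owner
            for path, owner in self.ephemerals.items()
            if owner != session_id
        }


zk = ToyEnsemble()
zk.create_ephemeral("c3-session", "/tasks/t7/owner")

zk.connection_lost("c3-session")
print("/tasks/t7/owner" in zk.ephemerals)  # True: node survives a disconnect

zk.expire("c3-session")
print("/tasks/t7/owner" in zk.ephemerals)  # False: node removed on expiration
```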

> I guess C3 should see some errors, but where will I get it
> (since C3 is not calling any zookeeper functions after the ephemeral
> node is created)?

If C3 is running a multi-threaded library such as the Java library, then an
event will be delivered to any watcher.  If the server C3 does not have a
watcher and calls no ZK functions, then it may not know of the partition.
When the partition is healed and if the session has not expired, then C3 may
get watch events from other changes.  Obviously, C3 will not get watch
events associated with a connection that is lost and then expires.
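
As a rough sketch of what a watcher on C3 might do with those events: the state names below mirror the Java client's KeeperState values (Disconnected, SyncConnected, Expired), but TaskWorker and its methods are hypothetical, and the policy of pausing work while disconnected is an assumption of mine aimed at avoiding double-handled tasks, not something the library does for you.

```python
class TaskWorker:
    """Hypothetical worker that reacts to connection state changes."""

    def __init__(self):
        self.working = True  # assume we start connected and own our task
        self.log = []

    def on_state(self, state):
        if state == "Disconnected":
            # Ownership is uncertain while cut off, so pause the task.
            self.working = False
            self.log.append("paused")
        elif state == "SyncConnected":
            # Reconnected within the same session: the ephemeral node
            # still exists, so it is safe to carry on.
            self.working = True
            self.log.append("resumed")
        elif state == "Expired":
            # The ephemeral node is gone and another node may have taken
            # the task: drop it and start over with a new session.
            self.working = False
            self.log.append("abandoned")
        else:
            raise ValueError(f"unexpected state: {state}")


worker = TaskWorker()
for event in ["Disconnected", "SyncConnected", "Disconnected", "Expired"]:
    worker.on_state(event)
print(worker.log)  # ['paused', 'resumed', 'paused', 'abandoned']
```

A client that registers no watcher at all never runs code like this, which is why it may not learn of the partition.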

> I am reading
> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200807.mbox/%3C386225.96676.qm@web31804.mail.mud.yahoo.com%3E
> There are 2 types of error that C3 needs to handle: 1. disconnections;
> 2. session expirations.
> Is that still valid (since it's over 1.5 years old)?

Yes.  I believe that disconnections will be much less of a problem in the
near future.  Handling them correctly can be quite difficult so that will be

See http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling

And also http://issues.apache.org/jira/browse/ZOOKEEPER-22
