hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "ZooKeeper/ErrorHandling" by BenjaminReed
Date Thu, 07 May 2009 17:53:58 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by BenjaminReed:
http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling

------------------------------------------------------------------------------
  
   * Normal state exceptions: trying to create a znode that already exists, calling setData
on an existing znode, doing a conditional write to a znode that has an unexpected version
number, etc.
  
-  * Recoverable errors: the disconnected event and the connection loss exception are examples
of recoverable errors, they indicate a problem that happened, but the ZooKeeper handle is
still valid and future operations will probably succeed.
+  * Recoverable errors: the disconnected event, connection timed out, and the connection
loss exception are examples of recoverable errors, they indicate a problem that happened,
but the ZooKeeper handle is still valid and future operations will succeed once the ZooKeeper
library can reestablish its connection to ZooKeeper.
  
   * Fatal errors: the ZooKeeper handle has become invalid. This can be due to an explicit
close, authentication errors, or session expiration.
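
The three categories above can be sketched as a small lookup table. This is only an illustration: the string labels and the helper names (`classify`, `should_keep_handle`) are hypothetical and are not identifiers from the ZooKeeper client library; they simply mirror the examples named in the list.

```python
from enum import Enum


class ErrorClass(Enum):
    NORMAL = "normal state exception"
    RECOVERABLE = "recoverable error"
    FATAL = "fatal error"


# Hypothetical labels taken from the examples in the list above.
# They are NOT real ZooKeeper error codes.
_CLASSIFICATION = {
    "node_exists": ErrorClass.NORMAL,
    "bad_version": ErrorClass.NORMAL,
    "disconnected": ErrorClass.RECOVERABLE,
    "connection_timed_out": ErrorClass.RECOVERABLE,
    "connection_loss": ErrorClass.RECOVERABLE,
    "explicit_close": ErrorClass.FATAL,
    "auth_failed": ErrorClass.FATAL,
    "session_expired": ErrorClass.FATAL,
}


def classify(error: str) -> ErrorClass:
    """Return the category for one of the hypothetical labels above."""
    return _CLASSIFICATION[error]


def should_keep_handle(error: str) -> bool:
    """Only a fatal error invalidates the handle; otherwise keep using it."""
    return classify(error) is not ErrorClass.FATAL
```

Under these assumptions, `should_keep_handle("connection_loss")` is true (the library will try to reconnect), while `should_keep_handle("session_expired")` is false and the application would need a new handle.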
  
@@ -16, +16 @@

  
  == Recoverable errors ==
  
- Recoverable errors are passed back to the application because ZooKeeper itself cannot recover
from them. "But wait", you say, "if I'm doing a getData(), can't ZooKeeper just reissue it
for me". Yes, of course ZooKeeper could as long as you were just doing a getData(). What if
you were doing a create() or a delete() or a conditional setData()? When a ZooKeeper client
loses a connection to the ZooKeeper server there may be some requests in flight; we don't
know where they were in their flight at the time of the connection loss.
+ Recoverable errors are passed back to the application because ZooKeeper itself cannot recover
from them. The ZooKeeper library does try to recover the connection, so the handle should
not be closed on a recoverable error, but the application must deal with the transient error.
"But wait", you say, "if I'm doing a getData(), can't ZooKeeper just reissue it for me". Yes,
of course ZooKeeper could as long as you were just doing a getData(). What if you were doing
a create() or a delete() or a conditional setData()? When a ZooKeeper client loses a connection
to the ZooKeeper server there may be some requests in flight; we don't know where they were
in their flight at the time of the connection loss.
  
  For example the create we sent just before the loss may not have made it out of the network
stack, or it may have made it to the ZooKeeper server we were connected to and been forwarded
on to other servers before our server died. So, when we reestablish the connection to the
ZooKeeper service we have no good way to know if our create executed or not. (The server actually
has the needed information, but there is a lot of implementation work that needs to happen
to take advantage of that information. Ironically, once we make the mutating requests re-issuable,
the read requests become problematic...) So, if we reissue a create() request and we get a
NodeExistsException, the ZooKeeper client doesn't know if the exception resulted because the
previous request went through or someone else did the create.
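
The ambiguity described above can be reproduced with a small simulation. This is a minimal sketch, not the ZooKeeper client: `FakeZooKeeper`, `ConnectionLoss`, `NodeExists`, and `create_with_retry` are hypothetical stand-ins written for this example, and the fake server is just an in-memory dictionary that drops the connection on the first request.

```python
class ConnectionLoss(Exception):
    """Hypothetical stand-in for a connection loss error."""


class NodeExists(Exception):
    """Hypothetical stand-in for a 'znode already exists' error."""


class FakeZooKeeper:
    """In-memory stand-in that loses the connection on the first create()."""

    def __init__(self, applied_before_loss: bool):
        self.znodes = {}
        # True: the request reached the server before the connection died.
        # False: the request never left the client's network stack.
        self.applied_before_loss = applied_before_loss
        self._lose_next = True

    def create(self, path: str, data: bytes) -> None:
        if self._lose_next:
            self._lose_next = False
            if self.applied_before_loss:
                self.znodes[path] = data
            raise ConnectionLoss(path)
        if path in self.znodes:
            raise NodeExists(path)
        self.znodes[path] = data


CREATED = "created"
AMBIGUOUS = "exists, creator unknown"


def create_with_retry(zk: FakeZooKeeper, path: str, data: bytes) -> str:
    """Reissue a create() once after a connection loss."""
    try:
        zk.create(path, data)
        return CREATED
    except ConnectionLoss:
        pass  # The handle is still valid, but the request's fate is unknown.
    try:
        zk.create(path, data)
        return CREATED
    except NodeExists:
        # Either our first create went through or another client created
        # the znode; the client cannot tell these cases apart.
        return AMBIGUOUS
```

With `FakeZooKeeper(applied_before_loss=True)` the retry returns `AMBIGUOUS` even though the znode is our own; with `applied_before_loss=False` it returns `CREATED`. From the client's side a znode created by another client would produce the same `AMBIGUOUS` result, which is why the library cannot blindly reissue the request and the application has to resolve the outcome itself.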
  
