hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss
Date Fri, 01 Oct 2010 15:53:32 GMT
Retry all 'retryable' zk operations; e.g. connection loss

                 Key: HBASE-3065
                 URL: https://issues.apache.org/jira/browse/HBASE-3065
             Project: HBase
          Issue Type: Bug
            Reporter: stack

The 'new' master refactored our zk code, tidying up all zk accesses and corralling them behind
nice zk utility classes.  One improvement was letting all KeeperExceptions out so the client
deals with them.  That's good generally because in the old days we'd suppress important zk
state changes.  But there is at least one case the new zk utility could handle for the
application, and that's the class of retryable KeeperExceptions.  The one that comes to mind is
connection loss.  On connection loss we should retry the just-failed operation.  Usually the
retry will just work.  At worst, on reconnect, we'll pick up the expired session event.
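
A minimal sketch of the retry-on-connection-loss idea, not the actual HBase zk utility code.
ConnectionLossException below is a hypothetical stand-in for ZooKeeper's
KeeperException.ConnectionLossException so that the example compiles without the ZooKeeper
jar; ZkRetry, retryOnConnectionLoss and pauseMillis are likewise made-up names.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for KeeperException.ConnectionLossException (assumption,
// defined locally so the sketch has no ZooKeeper dependency).
class ConnectionLossException extends Exception {
    ConnectionLossException(String msg) { super(msg); }
}

// Hypothetical helper; not part of HBase's zk utility classes.
final class ZkRetry {
    private ZkRetry() {}

    // Runs op, reissuing it whenever it fails with a connection loss.
    // Any other exception is treated as non-retryable and propagates to the caller.
    static <T> T retryOnConnectionLoss(Callable<T> op, long pauseMillis) throws Exception {
        while (true) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                // Retryable: pause briefly, then retry the just-failed operation.
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

This version loops without bound; how long to keep retrying is a separate question.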

Adding this change shouldn't be too bad given that the zk refactor corralled all zk access
into one or two classes only.

One thing to consider, though, is how much we should retry.  We could retry on a timer, or we
could retry forever as long as a Stoppable is passed in, so that if another thread has
stopped or aborted the hosting service, we'll notice and give up trying.  Doing the latter
is probably better than some kind of timeout.
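
A rough sketch of the second option (retry forever, but give up once the hosting service is
stopped).  StopCheck is a hypothetical one-method stand-in modeled on the isStopped() check of
HBase's Stoppable interface; RetryableZkException, StoppableRetry and retryUntilStopped are
made-up names, and throwing IllegalStateException on stop is just one possible choice.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for the class of retryable KeeperExceptions (assumption).
class RetryableZkException extends Exception {
    RetryableZkException(String msg) { super(msg); }
}

// Hypothetical stand-in for the isStopped() side of HBase's Stoppable (assumption).
interface StopCheck {
    boolean isStopped();
}

final class StoppableRetry {
    private StoppableRetry() {}

    // Retries op on retryable failures for as long as the hosting service is running.
    // If another thread stops or aborts the service, the loop notices and gives up.
    static <T> T retryUntilStopped(Callable<T> op, StopCheck stopper, long pauseMillis)
            throws Exception {
        RetryableZkException last = null;
        while (!stopper.isStopped()) {
            try {
                return op.call();
            } catch (RetryableZkException e) {
                last = e;                  // remember the most recent failure
                Thread.sleep(pauseMillis); // brief pause before the next attempt
            }
        }
        // Stopped before the operation succeeded; report the last failure, if any.
        throw new IllegalStateException("Service stopped; giving up on zk operation", last);
    }
}
```

The stop flag is only checked between attempts, so a stop is noticed after at most one more
failed call plus one pause.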

HBASE-3062 adds a timed retry on the first zk operation.  This issue is about generalizing
what is over there across all zk access.

