accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3296) Infinite ZK retry loop somewhere
Date Tue, 04 Nov 2014 20:02:35 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196721#comment-14196721 ]

Josh Elser commented on ACCUMULO-3296:
--------------------------------------

Found it:

{noformat}
      Stat stat;
      while (true) {
        try {
          stat = getZooKeeper(info).exists(zPath, null);
          // Node exists
          if (stat != null) {
            try {
              // Try to delete it. We don't care if there was an update to the node
              // since we got the Stat, just delete all versions (-1).
              getZooKeeper(info).delete(zPath, -1);
              return;
            } catch (NoNodeException e) {
              // If the node is gone now, it's ok if we have SKIP
              if (policy.equals(NodeMissingPolicy.SKIP)) {
                return;
              }
              throw e;
            }
            // Let other KeeperException bubble to the outer catch
          }
        } catch (KeeperException e) {
          final Code c = e.code();
          if (c == Code.CONNECTIONLOSS || c == Code.OPERATIONTIMEOUT || c == Code.SESSIONEXPIRED) {
            retryOrThrow(retry, e);
          } else {
            throw e;
          }
        }

        retry.waitForNextAttempt();
      }
{noformat}

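If {{stat}} is null, nothing inside the loop returns or throws, so we fall through to
{{retry.waitForNextAttempt()}} and sleep and retry indefinitely. My guess is that the tserver
removed itself and its node had already been cleaned up.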
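
One possible way to close that gap, sketched below, is to treat a missing node found by {{exists}} the same way the inner catch already treats {{NoNodeException}} from {{delete}}: return under SKIP, throw otherwise. This is only a sketch under that assumption, not necessarily the change that will be committed. {{FakeZk}}, {{ConnectionLossException}}, the {{FAIL}} policy value and the {{maxRetries}} parameter are hypothetical stand-ins so the example runs without ZooKeeper, and the sleep between attempts is left out.

```java
import java.util.HashSet;
import java.util.Set;

public class DeleteLoopSketch {
  // Only SKIP appears in the snippet above; FAIL is an assumed "anything else" value.
  enum NodeMissingPolicy { SKIP, FAIL }

  /** Hypothetical stand-in for ZooKeeper's NoNodeException. */
  static class NoNodeException extends Exception {
    NoNodeException(String path) { super("no node: " + path); }
  }

  /** Hypothetical stand-in for a retryable error such as CONNECTIONLOSS. */
  static class ConnectionLossException extends Exception {}

  /** Hypothetical in-memory replacement for the ZooKeeper client. */
  static class FakeZk {
    final Set<String> nodes = new HashSet<>();
    int failuresToInject = 0; // number of exists() calls that fail transiently

    boolean exists(String path) throws ConnectionLossException {
      if (failuresToInject > 0) {
        failuresToInject--;
        throw new ConnectionLossException();
      }
      return nodes.contains(path);
    }

    void delete(String path) throws NoNodeException {
      if (!nodes.remove(path)) {
        throw new NoNodeException(path);
      }
    }
  }

  /** Deletes zPath and returns the number of attempts it took. */
  static int delete(FakeZk zk, String zPath, NodeMissingPolicy policy, int maxRetries)
      throws NoNodeException, ConnectionLossException {
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        if (!zk.exists(zPath)) {
          // The case the original loop never handled: the node is already gone.
          if (policy == NodeMissingPolicy.SKIP) {
            return attempts;
          }
          throw new NoNodeException(zPath);
        }
        try {
          zk.delete(zPath);
          return attempts;
        } catch (NoNodeException e) {
          // The node vanished between exists() and delete().
          if (policy == NodeMissingPolicy.SKIP) {
            return attempts;
          }
          throw e;
        }
      } catch (ConnectionLossException e) {
        // Retry transient failures only, and give up after maxRetries retries.
        if (attempts > maxRetries) {
          throw e;
        }
      }
    }
  }
}
```

With this shape, calling {{delete}} on a path that no longer exists returns after one attempt under SKIP and throws under any other policy, so the loop only repeats for the transient error.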

> Infinite ZK retry loop somewhere
> --------------------------------
>
>                 Key: ACCUMULO-3296
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3296
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.2, 1.7.0
>
>
> ShutdownIT-shutdownDuringQuery failed.
> The end of the master log had the following:
> {noformat}
> 2014-11-04 09:47:56,220 [master.LiveTServerSet] INFO : Removing zookeeper lock for tserver:39492[1497a3301100002]
> 2014-11-04 09:47:56,243 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:56,494 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:56,745 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:56,996 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:57,247 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:57,498 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:57,749 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:58,000 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:58,252 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:58,503 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:58,754 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:59,006 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:59,257 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:59,508 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:47:59,759 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:48:00,011 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:48:00,262 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> 2014-11-04 09:48:00,513 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
> {noformat}
> The Retry log message kept repeating until the test timed out. Every invocation of that
> sleep should also log the exception that was caught and caused the retry.
> It seems likely that recursiveDelete isn't doing something correctly, given that it was the
> last thing the Master was about to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
