accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4012) FATE lock-up
Date Tue, 29 Sep 2015 21:12:04 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935866#comment-14935866 ]

Josh Elser commented on ACCUMULO-4012:
--------------------------------------

Took a look at the changes you committed. I'm trying to reason about the following:

{code}
       } catch (KeeperException.NoNodeException ex) {
+        log.debug("zookeeper error reading " + txpath + ": " + ex.toString(), ex);
+        sleepUninterruptibly(100, TimeUnit.MILLISECONDS);
         continue;
{code}

It's not clear to me why we would even want to retry reading this node if some part of the
path is missing in ZK. It seems like we could just immediately return null in that case, but
I assume I've missed some corner case.
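A minimal sketch of the alternative raised above, for comparison only. It assumes a hypothetical `NodeReader` interface and a stand-in `NoNodeException` class; neither is the real ZooKeeper or Accumulo API, and this is not the code that was committed. A missing node is treated as "no such transaction", so the read returns null on the first failure instead of sleeping 100 ms and looping.

```java
/** Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException. */
class NoNodeException extends Exception {
  NoNodeException(String path) {
    super("no node: " + path);
  }
}

/** Hypothetical reader abstraction; NOT an Accumulo or ZooKeeper API. */
interface NodeReader {
  byte[] getData(String path) throws NoNodeException;
}

class TxReadSketch {
  /**
   * Returns the data stored at txpath, or null if the node (or any
   * parent on its path) does not exist. There is no sleep and no retry:
   * a missing node ends the read immediately.
   */
  static byte[] readOrNull(NodeReader reader, String txpath) {
    try {
      return reader.getData(txpath);
    } catch (NoNodeException ex) {
      // Assumption: once the node is gone it will not reappear, so
      // polling again would only delay the caller.
      return null;
    }
  }
}
```

The trade-off: the retry in the committed change keeps polling while the node is missing, which only pays off if the node can come back; the sketch assumes it cannot, which is exactly the corner case in question.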

> FATE lock-up
> ------------
>
>                 Key: ACCUMULO-4012
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4012
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.5.3, 1.5.4, 1.6.0, 1.6.2, 1.6.3, 1.7.0
>         Environment: large production cluster
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.7.1
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> On a large production cluster, some periodic data processing hangs on FATE transactions.
> The basic operation is to bulk load the results of a map-reduce job into a temporary table,
> which is then later deleted. Increasing the number of FATE threads has not improved the situation.
> The details are not clear, and unfortunately this system is not online, so I cannot reproduce
> the logs easily, but they would be huge anyhow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
