accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2422) Backup master can miss acquiring lock when primary exits
Date Fri, 28 Feb 2014 05:20:19 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915471#comment-13915471 ]

Josh Elser commented on ACCUMULO-2422:
--------------------------------------

How long a timeframe are you talking about here? By your tone, I'm assuming at least seconds,
if not minutes? Indefinitely?

Getting jstacks of the masters in this state would be good. Also, you should check the data
in ZooKeeper at /accumulo/uuid/masters and any children at /accumulo/uuid/masters/lock/zlock-*
to see what's going on there. It's possible that some of the convenience methods that we wrap
ZK with have an issue, but it's primarily ZK code there.
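
When reading the zlock-* children, it may help to know which one would normally be expected
to hold the lock. The sketch below assumes the usual ZooKeeper lock recipe, where each
contender creates a sequential child and the lowest sequence number holds the lock; whether
that matches the behavior seen in this bug is exactly what needs checking. The class name,
method names, and sample node names are made up for illustration and are not Accumulo code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of Accumulo): orders lock children by sequence number.
class ZLockOrder {
    // Parse the numeric suffix that ZooKeeper appends to sequential nodes,
    // e.g. "zlock-0000000005" -> 5.
    static long sequenceOf(String child) {
        return Long.parseLong(child.substring(child.lastIndexOf('-') + 1));
    }

    // Return a copy of the children sorted by ascending sequence number.
    static List<String> sortBySequence(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(ZLockOrder::sequenceOf));
        return sorted;
    }

    // Assumption: under the standard lock recipe the lowest sequence holds the lock.
    // Returns null when there are no children at all.
    static String expectedHolder(List<String> children) {
        return children.isEmpty() ? null : sortBySequence(children).get(0);
    }

    public static void main(String[] args) {
        // Sample child names as they might be listed under .../masters/lock (invented values).
        List<String> children = Arrays.asList("zlock-0000000007", "zlock-0000000005");
        System.out.println("expected lock holder: " + expectedHolder(children));
    }
}
```

If the master that created the lowest-numbered child is the one still waiting, that would
point at the watch/notification path rather than at the node ordering.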

> Backup master can miss acquiring lock when primary exits
> --------------------------------------------------------
>
>                 Key: ACCUMULO-2422
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2422
>             Project: Accumulo
>          Issue Type: Bug
>          Components: fate, master
>    Affects Versions: 1.5.0
>            Reporter: Bill Havanki
>            Assignee: Bill Havanki
>            Priority: Critical
>              Labels: failover, locking
>
> While running randomwalk tests with agitation for the 1.5.1 release, I've seen situations
> where a backup master that is eligible to grab the master lock continues to wait. When this
> condition arises and the other master restarts, both wait for the lock without success.
> I cannot reproduce the problem reliably, and I think more investigation is needed to
> see what circumstances could be causing the problem.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
