accumulo-notifications mailing list archives

From "Sean Busbey (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-2422) Backup master can miss acquiring lock when primary exits
Date Fri, 28 Feb 2014 18:30:20 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sean Busbey updated ACCUMULO-2422:
----------------------------------

    Description: 
While running randomwalk tests with agitation for the 1.5.1 release, I've seen situations
where a backup master that is eligible to grab the master lock continues to wait. When this
condition arises and the other master restarts, both wait for the lock without success.

I cannot reproduce the problem reliably, and I think more investigation is needed to see what
circumstances could be causing the problem.

h3. Diagnosis and Work Around
This failure condition can occur on start up and on backup/active failover of the Master role.
If the following log entry is the final entry in all Master logs, you should restart all Master
roles, staggering the restarts by a few seconds.

{noformat}
[master.Master] INFO : trying to get master lock
{noformat}
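
As a rough illustration of the diagnosis above, the check could be scripted along the following lines. This is a minimal sketch, not part of Accumulo: the helper names are hypothetical, the substring match assumes real log lines may carry a timestamp prefix, and the caller supplies the log paths because their location depends on the installation.

```python
from pathlib import Path

# Log text quoted in the issue description.
LOCK_WAIT_MARKER = "[master.Master] INFO : trying to get master lock"


def last_nonblank_line(path):
    """Return the last non-blank line of a log file, or "" if there is none."""
    lines = Path(path).read_text(errors="replace").splitlines()
    for line in reversed(lines):
        if line.strip():
            return line
    return ""


def all_masters_stuck(log_paths):
    """Return True only if every given Master log ends with the lock-wait entry.

    An empty list of logs returns False, since nothing can be concluded from it.
    """
    paths = list(log_paths)
    return bool(paths) and all(
        LOCK_WAIT_MARKER in last_nonblank_line(p) for p in paths
    )
```

If `all_masters_stuck` returns True for the logs of every Master role, the condition described here is a likely match and the staggered restart applies.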

If starting a cluster with multiple Master roles, you should stagger Master role starts by
a few seconds.
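
The staggering advice can be sketched as a small helper. This is only an illustration under stated assumptions: `start_master` is a hypothetical callable supplied by the operator (for example, a wrapper around whatever mechanism the site uses to start a Master role on a host), and the 5-second default is an arbitrary stand-in for "a few seconds".

```python
import time


def staggered_start(hosts, start_master, delay_seconds=5.0, sleep=time.sleep):
    """Call start_master(host) for each host, pausing between consecutive starts.

    start_master is a hypothetical, caller-supplied callable; delay_seconds is
    an assumed value. Returns the hosts in the order they were started.
    """
    started = []
    for index, host in enumerate(hosts):
        if index > 0:
            # Wait before every start except the first.
            sleep(delay_seconds)
        start_master(host)
        started.append(host)
    return started
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` is enough.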


  was:
While running randomwalk tests with agitation for the 1.5.1 release, I've seen situations
where a backup master that is eligible to grab the master lock continues to wait. When this
condition arises and the other master restarts, both wait for the lock without success.

I cannot reproduce the problem reliably, and I think more investigation is needed to see what
circumstances could be causing the problem.


Added diagnosis and work around text.

> Backup master can miss acquiring lock when primary exits
> --------------------------------------------------------
>
>                 Key: ACCUMULO-2422
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2422
>             Project: Accumulo
>          Issue Type: Bug
>          Components: fate, master
>    Affects Versions: 1.5.1
>            Reporter: Bill Havanki
>            Assignee: Bill Havanki
>            Priority: Critical
>              Labels: failover, locking
>             Fix For: 1.6.0, 1.5.2
>
>
> While running randomwalk tests with agitation for the 1.5.1 release, I've seen situations
> where a backup master that is eligible to grab the master lock continues to wait. When this
> condition arises and the other master restarts, both wait for the lock without success.
>
> I cannot reproduce the problem reliably, and I think more investigation is needed to see what
> circumstances could be causing the problem.
>
> h3. Diagnosis and Work Around
> This failure condition can occur on start up and on backup/active failover of the Master role.
> If the following log entry is the final entry in all Master logs, you should restart all Master
> roles, staggering the restarts by a few seconds.
>
> {noformat}
> [master.Master] INFO : trying to get master lock
> {noformat}
>
> If starting a cluster with multiple Master roles, you should stagger Master role starts by
> a few seconds.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
