Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Date: Tue, 18 Mar 2014 15:44:42 +0000 (UTC)
From: "Eric Newton (JIRA)" <jira@apache.org>
To: notifications@accumulo.apache.org
Message-ID: <JIRA.12701582.1394827663598.97864.1395157482728@arcas>
In-Reply-To: <JIRA.12701582.1394827663598@arcas>
References: <JIRA.12701582.1394827663598@arcas>
Subject: [jira] [Updated] (ACCUMULO-2480) ha fail-failover failure
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/ACCUMULO-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton updated ACCUMULO-2480:
----------------------------------

    Fix Version/s: 1.7.0

> ha fail-failover failure
> ------------------------
>
>                 Key: ACCUMULO-2480
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2480
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master, tserver
>         Environment: running continuous ingest on a 74-node HA NN hadoop 2.3 cluster, 1.6.0-SNAPSHOT.
>            Reporter: Eric Newton
>             Fix For: 1.7.0
>
>
> Ran {{service network stop}} on the active NN.  Failover did not occur because the fencing script on the standby (sshfence) failed to run.
> After the network interface was re-established, the standby took over.
> However, Accumulo ingest began to see very long hold times because the standby had not provided service for several minutes.
> The master attempted to shut down the tablet servers with long hold times.
> The filesystem hook closed the filesystem, and the servers got stuck endlessly trying to write to the WAL.
> Even after the NN was active, attempts to get a new WAL continued to fail because the filesystem was closed.
> * why didn't the tablet servers stop?
> * The WAL loop should terminate when it sees an IOException indicating that the filesystem is closed
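
The second point above can be illustrated with a minimal sketch. This is not Accumulo's actual WAL code: the class {{WalRetrySketch}}, the methods {{isFilesystemClosed}} and {{openWithRetry}}, and the retry parameters are all hypothetical names chosen for illustration. It also assumes that the closed-filesystem condition surfaces as an IOException whose message contains "Filesystem closed"; a real fix would need to confirm how the client reports that state.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class WalRetrySketch {

    /**
     * Hypothetical check: walks the cause chain looking for an IOException
     * whose message contains "Filesystem closed" (an assumed marker string).
     */
    static boolean isFilesystemClosed(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (c instanceof IOException && msg != null && msg.contains("Filesystem closed")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Retries openWal until it succeeds, the attempts run out, or the failure
     * indicates a closed filesystem, in which case retrying cannot help.
     */
    static <T> T openWithRetry(Callable<T> openWal, int maxAttempts, long sleepMillis)
            throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return openWal.call();
            } catch (IOException e) {
                if (isFilesystemClosed(e)) {
                    throw e; // unrecoverable: stop looping and let the caller shut down
                }
                last = e; // possibly transient (e.g. NN not yet active): wait and retry
                Thread.sleep(sleepMillis);
            }
        }
        throw new IOException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

With this shape, a transient failure while the NN is unavailable is retried, while a closed filesystem ends the loop on the first attempt so the server can halt instead of spinning.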

