Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 1 Sep 2015 06:40:46 +0000 (UTC)
From: "stack (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12859403.1440614769000.219178.1441089646386@Atlassian.JIRA>
In-Reply-To: <JIRA.12859403.1440614769000@Atlassian.JIRA>
References: <JIRA.12859403.1440614769000@Atlassian.JIRA>
 <JIRA.12859403.1440614769328@arcas>
Subject: [jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960)
 and can't roll WAL
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724870#comment-14724870 ] 

stack commented on HBASE-14317:
-------------------------------

Patch is for 1.2. Will make a patch for master once we have a fix.

[~eclark] I took a look at your patch. I get now what you mean by "a sync after a failed append should always fail". Agree. Let's fix that too. I think we should be able to have more finesse than what is here, where we stamp out everything -- smile. I think a sync could still come in even after we've made all our noise stamping on everything (let me do the server mocks the way you have them in the patch too... and integrate your test).
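
To make the "sync after a failed append should always fail" rule concrete, here is a rough sketch. It is not FSHLog code and not what either patch does; ToyWal and all of its methods are hypothetical names made up for illustration, and the failWrite flag just simulates a bad disk. The idea is to remember the first append failure and refuse every later append and sync until the log is rolled.

```java
import java.io.IOException;

/** Hypothetical toy WAL used only to illustrate the invariant; not FSHLog. */
class ToyWal {
    // First append failure seen since the last roll; null while the log is healthy.
    private IOException appendFailure;
    private long appended; // number of successful appends on the current writer
    private long synced;   // number of appends made durable by the last good sync

    /** Appends one edit. failWrite simulates the underlying write failing. */
    synchronized long append(boolean failWrite) throws IOException {
        if (appendFailure != null) {
            throw new IOException("WAL is broken, roll required", appendFailure);
        }
        if (failWrite) {
            appendFailure = new IOException("simulated append failure");
            throw appendFailure;
        }
        return ++appended;
    }

    /** A sync must never succeed once an append has failed on this writer. */
    synchronized void sync() throws IOException {
        if (appendFailure != null) {
            throw new IOException("sync after failed append", appendFailure);
        }
        synced = appended;
    }

    /** Rolling swaps in a fresh writer, which clears the failure state. */
    synchronized void roll() {
        appendFailure = null;
        appended = 0;
        synced = 0;
    }

    synchronized long syncedCount() {
        return synced;
    }
}
```

In this sketch a sync that arrives late, after the failure has been recorded, still fails because the flag is only cleared by roll(). That is the case I am worried about above.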

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
>                 Key: HBASE-14317
>                 URL: https://issues.apache.org/jira/browse/HBASE-14317
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.1
>            Reporter: stack
>            Priority: Critical
>         Attachments: 14317.test.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960) but we get stuck. See attached thread dump and associated log. What is interesting is that syncers are waiting to take syncs to run and at the same time we want to flush, so we are waiting on a safe point, but there seems to be nothing in our ring buffer; did we go to roll the log and not add a safe point sync to clear out the ring buffer?
> Needs a bit of study. Try to reproduce.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)