Date: Thu, 27 Aug 2015 15:54:47 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

     [ https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-14317:
--------------------------
    Priority: Critical  (was: Major)

Making critical. In this case all app servers were blocked because they could not write to the regions hosted on this server.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
>                 Key: HBASE-14317
>                 URL: https://issues.apache.org/jira/browse/HBASE-14317
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: stack
>            Priority: Critical
>         Attachments: [Java] RS stuck on WAL sync to a dead DN - Pastebin.com.html, raw.php, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960), but we get stuck. See the attached thread dump and associated log. What is interesting is that the syncers are waiting to take syncs to run while at the same time we want to flush, so we are waiting on a safe point, yet there seems to be nothing in our ring buffer. Did we go to roll the log and not add a safe point sync to clear out the ring buffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)