Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hbase.apache.org
Date: Wed, 29 Apr 2015 07:29:06 +0000 (UTC)
From: "Vikas Vishwakarma (JIRA)" <jira@apache.org>
To: dev@hbase.apache.org
Message-ID: <JIRA.12825831.1430292521000.18779.1430292546200@Atlassian.JIRA>
In-Reply-To: <JIRA.12825831.1430292521000@Atlassian.JIRA>
References: <JIRA.12825831.1430292521000@Atlassian.JIRA>
 <JIRA.12825831.1430292521853@arcas>
Subject: [jira] [Created] (HBASE-13592) RegionServer sometimes gets stuck
 during shutdown in case of cache flush failures
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Vikas Vishwakarma created HBASE-13592:
-----------------------------------------

             Summary: RegionServer sometimes gets stuck during shutdown in case of cache flush failures
                 Key: HBASE-13592
                 URL: https://issues.apache.org/jira/browse/HBASE-13592
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.98.10
            Reporter: Vikas Vishwakarma
            Assignee: Vikas Vishwakarma


Observed that the RegionServer sometimes gets stuck during shutdown when cache flushes fail. After adding a few debug logs and examining the stack trace, the RegionServer process appears to be stuck in closeWAL -> hlog.close -> closeBarrier.stopAndDrainOps() during the shutdown sequence in the run method.

From the RegionServer logs, we see multiple attempts to flush the cache for a particular region. Each attempt increments the beginOp count in DrainBarrier, but every flush attempt fails somewhere in WAL sync, so the matching endOp decrement in DrainBarrier never happens. Later, when shutdown is initiated, the RegionServer process blocks permanently in stopAndDrainOps(), because the count of outstanding operations never drains.
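
The failure mode described above can be illustrated with a minimal, self-contained Java sketch. This is not the HBase code: SimpleDrainBarrier, FlushSketch, flushLeaky, flushBalanced and walSync are hypothetical names invented for illustration, and the barrier here takes a timeout only so that the example terminates instead of hanging. It assumes the real problem has the same shape, namely that a failure between beginOp and endOp skips the decrement, and it shows how a try/finally would keep the two calls balanced.

```java
import java.io.IOException;

/** Simplified stand-in for a drain barrier. NOT the actual HBase DrainBarrier. */
final class SimpleDrainBarrier {
    private int pendingOps = 0;
    private boolean draining = false;

    /** Registers an operation; refused once draining has started. */
    synchronized boolean beginOp() {
        if (draining) {
            return false;
        }
        pendingOps++;
        return true;
    }

    /** Marks one operation as finished and wakes up a waiting drainer. */
    synchronized void endOp() {
        pendingOps--;
        if (pendingOps == 0) {
            notifyAll();
        }
    }

    /**
     * Stops new operations and waits for pending ones to finish.
     * The timeout is an addition for this sketch; it returns false
     * if operations are still pending when the timeout expires.
     */
    synchronized boolean stopAndDrainOps(long timeoutMillis) {
        draining = true;
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (pendingOps > 0) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // an unbounded wait would block forever here
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}

/** Hypothetical flush paths, one leaking the barrier count and one not. */
final class FlushSketch {
    /** Leaky variant: a failed sync skips endOp, so the count never drops. */
    static void flushLeaky(SimpleDrainBarrier barrier, boolean syncFails) throws IOException {
        if (!barrier.beginOp()) {
            throw new IOException("barrier is draining");
        }
        walSync(syncFails); // throws before endOp is reached
        barrier.endOp();
    }

    /** Balanced variant: endOp runs in finally, even when the sync fails. */
    static void flushBalanced(SimpleDrainBarrier barrier, boolean syncFails) throws IOException {
        if (!barrier.beginOp()) {
            throw new IOException("barrier is draining");
        }
        try {
            walSync(syncFails);
        } finally {
            barrier.endOp();
        }
    }

    /** Simulated WAL sync that fails on demand. */
    private static void walSync(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated WAL sync failure");
        }
    }
}

final class DrainBarrierDemo {
    public static void main(String[] args) {
        SimpleDrainBarrier leaky = new SimpleDrainBarrier();
        SimpleDrainBarrier balanced = new SimpleDrainBarrier();
        try { FlushSketch.flushLeaky(leaky, true); } catch (IOException expected) { }
        try { FlushSketch.flushBalanced(balanced, true); } catch (IOException expected) { }
        System.out.println("leaky drained: " + leaky.stopAndDrainOps(100));
        System.out.println("balanced drained: " + balanced.stopAndDrainOps(100));
    }
}
```

In the leaky variant the drain call gives up after the timeout with an operation still pending; with an unbounded wait, as in the reported stack trace, the shutdown thread would never return.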

In this case, hbase stop does not work either, and the RegionServer process has to be killed explicitly with kill -9.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)