Date: Wed, 29 Apr 2015 07:29:06 +0000 (UTC)
From: "Vikas Vishwakarma (JIRA)"
To: dev@hbase.apache.org
Reply-To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-13592) RegionServer sometimes gets stuck during shutdown in case of cache flush failures

Vikas Vishwakarma created HBASE-13592:
-----------------------------------------

             Summary: RegionServer sometimes gets stuck during shutdown in case of cache flush failures
                 Key: HBASE-13592
                 URL: https://issues.apache.org/jira/browse/HBASE-13592
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.98.10
            Reporter: Vikas Vishwakarma
            Assignee: Vikas Vishwakarma

Observed that RegionServer sometimes gets stuck during shutdown in case of cache flush failures.
After adding a few debug logs and looking through the stack trace, the RegionServer process appears to be stuck in closeWAL -> hlog.close -> closeBarrier.stopAndDrainOps() during the shutdown sequence in the run method.

From the RegionServer logs we see multiple attempts to flush the cache for a particular region. Each attempt increments the beginOp count in DrainBarrier, but every flush attempt fails somewhere in the WAL sync, so the matching endOp decrement in DrainBarrier never happens. When shutdown is initiated later, the RegionServer process is permanently stuck here.

In this case hbase stop does not work either, and the RegionServer process has to be killed explicitly using kill -9.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
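
The beginOp/endOp imbalance described in this report can be illustrated with a minimal sketch. This is not HBase code: SimpleDrainBarrier, leakyFlush, safeFlush and failingSync are hypothetical names made up for the illustration, and the timeout on stopAndDrainOps is an assumption added only so that the demo terminates instead of hanging the way the report describes. It shows how an exception between beginOp and endOp leaves the count above zero, and how a try/finally keeps the two calls paired.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

class DrainBarrierSketch {

    /** Hypothetical, simplified barrier for illustration; not HBase's DrainBarrier. */
    static final class SimpleDrainBarrier {
        private int inFlight = 0;
        private boolean draining = false;

        /** Registers an operation; refused once draining has started. */
        synchronized boolean beginOp() {
            if (draining) {
                return false;
            }
            inFlight++;
            return true;
        }

        /** Marks one operation as finished and wakes any drainer when none remain. */
        synchronized void endOp() {
            inFlight--;
            if (inFlight == 0) {
                notifyAll();
            }
        }

        /**
         * Stops new operations and waits for in-flight ones to finish.
         * The timeout is an assumption of this sketch; without it, a leaked
         * beginOp would make this call wait forever.
         */
        synchronized boolean stopAndDrainOps(long timeoutMillis) throws InterruptedException {
            draining = true;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (inFlight > 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // gave up: some operation never called endOp
                }
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            }
            return true;
        }

        synchronized int inFlight() {
            return inFlight;
        }
    }

    /** Stand-in for a WAL sync that always fails. */
    static void failingSync() throws IOException {
        throw new IOException("simulated WAL sync failure");
    }

    /** Leaks the operation count: endOp is skipped when the sync throws. */
    static void leakyFlush(SimpleDrainBarrier barrier) throws IOException {
        if (!barrier.beginOp()) {
            return;
        }
        failingSync();
        barrier.endOp(); // never reached on failure
    }

    /** Pairs beginOp with endOp in a finally block, so failures cannot leak. */
    static void safeFlush(SimpleDrainBarrier barrier) throws IOException {
        if (!barrier.beginOp()) {
            return;
        }
        try {
            failingSync();
        } finally {
            barrier.endOp();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleDrainBarrier leaky = new SimpleDrainBarrier();
        SimpleDrainBarrier safe = new SimpleDrainBarrier();
        for (int attempt = 0; attempt < 3; attempt++) {
            try { leakyFlush(leaky); } catch (IOException expected) { /* flush failed */ }
            try { safeFlush(safe); } catch (IOException expected) { /* flush failed */ }
        }
        System.out.println("leaky: inFlight=" + leaky.inFlight()
                + " drained=" + leaky.stopAndDrainOps(100)); // inFlight=3 drained=false
        System.out.println("safe:  inFlight=" + safe.inFlight()
                + " drained=" + safe.stopAndDrainOps(100));  // inFlight=0 drained=true
    }
}
```

Whether the actual fix for this issue takes the try/finally form is not stated in the report; the sketch only demonstrates the failure mode it describes.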