Date: Fri, 5 Sep 2014 05:36:26 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11902) RegionServer was blocked while aborting

[ https://issues.apache.org/jira/browse/HBASE-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122467#comment-14122467 ]

stack commented on HBASE-11902:
-------------------------------

You mean here:
{code}
"regionserver60020" prio=10 tid=0x00007f85011ca800 nid=0x74d0 in Object.wait() [0x000000004405f000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.hadoop.hbase.util.DrainBarrier.stopAndDrainOps(DrainBarrier.java:115)
	- locked <0x00000002bb325248> (a org.apache.hadoop.hbase.util.DrainBarrier)
	at org.apache.hadoop.hbase.util.DrainBarrier.stopAndDrainOps(DrainBarrier.java:85)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.close(FSHLog.java:923)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeWAL(HRegionServer.java:1208)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1001)
	at java.lang.Thread.run(Thread.java:744)
{code}

Doesn't seem to be an HDFS issue; it's just waiting on flushes to complete. Do you see issues flushing, Victor? (I've not looked at the log.)

> RegionServer was blocked while aborting
> ---------------------------------------
>
>                 Key: HBASE-11902
>                 URL: https://issues.apache.org/jira/browse/HBASE-11902
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, wal
>    Affects Versions: 0.98.4
>         Environment: hbase-0.98.4, hadoop-2.3.0-cdh5.1, jdk1.7
>            Reporter: Victor Xu
>         Attachments: hbase-hadoop-regionserver-hadoop461.cm6.log, jstack_hadoop461.cm6.log
>
>
> Generally, the regionserver aborts automatically when isHealth() returns false, but it sometimes gets blocked while aborting. I saved the jstack and logs and found that it was caused by datanode failures: the "regionserver60020" thread was blocked while closing the WAL.
> This issue doesn't happen often, but when it does, it always leads to a large number of failed requests. The only way out is kill -9.
> I think it's a bug, but I haven't found a decent solution. Does anyone else have the same problem?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
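For readers following the trace: FSHLog.close() calls DrainBarrier.stopAndDrainOps(), and the thread is parked in Object.wait() while holding the barrier's monitor. That appears consistent with stack's reading that close is waiting for in-flight operations (flushes) to finish. The sketch below is a hypothetical, simplified barrier written only to illustrate that pattern. The class name SimpleDrainBarrier and its beginOp()/endOp() methods are my own assumptions, not HBase's actual DrainBarrier code. It shows why a single operation that never completes, for example a flush stuck on failed datanodes, would leave the closing thread waiting indefinitely. The timed overload is an illustration of a bounded wait, not a fix HBase is known to use.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified drain barrier for illustration only; it is NOT
 * org.apache.hadoop.hbase.util.DrainBarrier, just the same general idea.
 * Operations register with beginOp()/endOp(); a closer calls
 * stopAndDrainOps() to refuse new operations and wait for in-flight ones.
 */
public class SimpleDrainBarrier {
    private int inFlight = 0;        // ops that called beginOp() but not yet endOp()
    private boolean stopped = false; // once true, beginOp() refuses new work

    /** Returns false if the barrier is already stopping/draining. */
    public synchronized boolean beginOp() {
        if (stopped) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Must be called exactly once for each successful beginOp(). */
    public synchronized void endOp() {
        if (inFlight <= 0) {
            throw new IllegalStateException("endOp() without matching beginOp()");
        }
        inFlight--;
        if (inFlight == 0) {
            notifyAll(); // wake any thread blocked in stopAndDrainOps
        }
    }

    /**
     * Unbounded drain: if some op never calls endOp(), this waits forever,
     * i.e. the thread stays WAITING (on object monitor) as in the jstack above.
     */
    public synchronized void stopAndDrainOps() throws InterruptedException {
        stopped = true;
        while (inFlight > 0) {
            wait(); // releases the monitor while waiting; loop guards spurious wakeups
        }
    }

    /** Bounded drain (illustrative): returns false if ops remain after the timeout. */
    public synchronized boolean stopAndDrainOps(long timeout, TimeUnit unit)
            throws InterruptedException {
        stopped = true;
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (inFlight > 0) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos); // timed Object.wait
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleDrainBarrier barrier = new SimpleDrainBarrier();
        barrier.beginOp(); // simulates an operation (e.g. a flush) that never finishes
        // The unbounded stopAndDrainOps() would block here forever; the bounded one gives up.
        boolean drained = barrier.stopAndDrainOps(200, TimeUnit.MILLISECONDS);
        System.out.println("drained=" + drained + ", acceptsNewOps=" + barrier.beginOp());
        // prints "drained=false, acceptsNewOps=false"
    }
}
```

Running main prints drained=false, acceptsNewOps=false after roughly 200 ms: the stuck operation prevents the drain from finishing, and the barrier already refuses new work. With the unbounded variant the same situation simply never returns, which matches the symptom that only kill -9 gets the process out.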