hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11902) RegionServer was blocked while aborting
Date Fri, 05 Sep 2014 05:36:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122467#comment-14122467 ]

stack commented on HBASE-11902:

You mean here:

"regionserver60020" prio=10 tid=0x00007f85011ca800 nid=0x74d0 in Object.wait() [0x000000004405f000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.hadoop.hbase.util.DrainBarrier.stopAndDrainOps(DrainBarrier.java:115)
	- locked <0x00000002bb325248> (a org.apache.hadoop.hbase.util.DrainBarrier)
	at org.apache.hadoop.hbase.util.DrainBarrier.stopAndDrainOps(DrainBarrier.java:85)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.close(FSHLog.java:923)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeWAL(HRegionServer.java:1208)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1001)
	at java.lang.Thread.run(Thread.java:744)

This doesn't seem to be an HDFS issue; the thread is just waiting on flushes to complete.
Do you see issues flushing, Victor? (I've not looked at the log.)
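
To make the trace above easier to read, here is a minimal sketch of the drain-barrier pattern. It is not the actual org.apache.hadoop.hbase.util.DrainBarrier code: the class name DrainBarrierSketch, the boolean return values, and the timeout parameter are assumptions made for illustration. The trace shows the aborting thread parked in Object.wait() inside stopAndDrainOps, which is what a barrier like this does while any started operation has not yet called its end hook. The timeout is only there to show one way a caller could avoid waiting indefinitely; it is not a description of how HBase behaves or of any fix for this issue.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified drain barrier; NOT HBase's actual DrainBarrier. */
class DrainBarrierSketch {
    private int inFlight = 0;        // operations started but not yet finished
    private boolean stopped = false; // set once draining has begun

    /** Returns false once the barrier is stopped, so no new operation starts. */
    public synchronized boolean beginOp() {
        if (stopped) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Marks one operation as finished and wakes any thread waiting to drain. */
    public synchronized void endOp() {
        inFlight--;
        if (inFlight == 0) {
            notifyAll();
        }
    }

    /**
     * Stops new operations, then waits up to timeoutMs (an assumed parameter)
     * for in-flight operations to finish. Returns true if drained, false on
     * timeout or interrupt.
     */
    public synchronized boolean stopAndDrainOps(long timeoutMs) {
        stopped = true;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (inFlight > 0) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // an operation never finished; give up waiting
            }
            try {
                wait(remainingMs); // releases the monitor while parked
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, an operation that called beginOp() but never reaches endOp() (for example, a flush that is itself stuck) keeps stopAndDrainOps from returning true, which matches the shape of the hang in the jstack output.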

> RegionServer was blocked while aborting
> ---------------------------------------
>                 Key: HBASE-11902
>                 URL: https://issues.apache.org/jira/browse/HBASE-11902
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, wal
>    Affects Versions: 0.98.4
>         Environment: hbase-0.98.4, hadoop-2.3.0-cdh5.1, jdk1.7
>            Reporter: Victor Xu
>         Attachments: hbase-hadoop-regionserver-hadoop461.cm6.log, jstack_hadoop461.cm6.log
> Generally, the regionserver aborts automatically when isHealth() returns false, but it sometimes
> gets blocked while aborting. I saved the jstack output and logs and found that the hang was caused
> by datanode failures. The "regionserver60020" thread was blocked while closing the WAL.
> This issue doesn't happen often, but when it does, it always leads to a huge number
> of failed requests. The only way out is kill -9.
> I think it's a bug, but I haven't found a decent solution. Does anyone have the same

