hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
Date Wed, 26 Aug 2015 18:54:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715328#comment-14715328 ]

stack commented on HBASE-14317:

Is the concurrent closing of regions, which are waiting on a safe point:

"RS_CLOSE_REGION-r12s16:9104-1" #33639 prio=5 os_prio=0 tid=0x00007fbf546fe000 nid=0x563 in Object.wait() [0x00007fbf38107000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.apache.hadoop.hbase.regionserver.HRegion.waitForFlushesAndCompactions(HRegion.java:1512)
	- locked <0x000000056baa4888> (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1371)
	- locked <0x000000056baa4888> (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1336)
	- locked <0x000000056baaf928> (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

... and then the FATAL roll of logs happening at the same time, the issue? Needs digging.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>                 Key: HBASE-14317
>                 URL: https://issues.apache.org/jira/browse/HBASE-14317
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: stack
>         Attachments: [Java] RS stuck on WAL sync to a dead DN - Pastebin.com.html, raw.php,
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960), but we get stuck. See the attached
> thread dump and associated log. What is interesting is that the syncers are waiting to take syncs
> to run and, at the same time, we want to flush, so we are waiting on a safe point, but there seems
> to be nothing in our ring buffer. Did we go to roll the log and not add a safe point sync to clear
> out the ring buffer?
> Needs a bit of study. Try to reproduce.

This message was sent by Atlassian JIRA
