hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16349) TestClusterId may hang during cluster shutdown
Date Fri, 16 Sep 2016 04:15:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495322#comment-15495322 ]

Ted Yu commented on HBASE-16349:
--------------------------------

In this occurrence, https://builds.apache.org/job/HBase-1.4/jdk=JDK_1_8,label=yahoo-not-h2/416/console :
{code}
"Thread-3" #26 prio=5 os_prio=0 tid=0x00007fe90c01c000 nid=0xdbd in Object.wait() [0x00007fe80d2d3000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1249)
	- locked <0x0000000717dea2e0> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
	at org.apache.hadoop.hbase.util.Threads.shutdown(Threads.java:111)
	at org.apache.hadoop.hbase.util.Threads.shutdown(Threads.java:99)
	at org.apache.hadoop.hbase.regionserver.ShutdownHook$ShutdownHookThread.run(ShutdownHook.java:115)
...
"main" #1 prio=5 os_prio=0 tid=0x00007fe9ac009000 nid=0x5e8 in Object.wait() [0x00007fe9b4de1000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1249)
	- locked <0x0000000717dea2e0> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
	at java.lang.Thread.join(Thread.java:1323)
	at org.apache.hadoop.hbase.regionserver.TestClusterId.tearDown(TestClusterId.java:65)
{code}
It looks like the ShutdownHook was playing a role in the hang: both "Thread-3" (the ShutdownHookThread) and "main" (the test's tearDown()) are blocked in Thread.join() on the same RegionServerThread (monitor 0x0000000717dea2e0).
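
For reference, here is a minimal, self-contained Java sketch of that join pattern (illustrative code only, not HBase source; the class and names are made up): a shutdown hook and the test's main thread both call an unbounded Thread.join() on the same region-server thread, so if that thread never terminates, both callers wait forever on its monitor.
{code}
// Illustrative sketch, not HBase code: reproduces the indefinite-join
// pattern from the dump, where "Thread-3" (ShutdownHookThread) and "main"
// (tearDown) both join the RegionServerThread.
public class JoinHangSketch {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the RegionServerThread that never finishes its close.
        Thread regionServer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(1000L); // e.g. stuck in waitOnAllRegionsToClose()
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "RS:0");
        regionServer.start();

        // Stand-in for ShutdownHook$ShutdownHookThread ("Thread-3" above).
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                regionServer.join(); // unbounded join: blocks if the RS never exits
            } catch (InterruptedException ignored) {
                // JVM is shutting down; nothing to do
            }
        }));

        // Stand-in for TestClusterId.tearDown(): this unbounded join hangs.
        // A bounded join such as regionServer.join(30000L) would let the test
        // fail on a timeout instead of hanging the build.
        regionServer.join();
    }
}
{code}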

> TestClusterId may hang during cluster shutdown
> ----------------------------------------------
>
>                 Key: HBASE-16349
>                 URL: https://issues.apache.org/jira/browse/HBASE-16349
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Minor
>         Attachments: 16349.branch-1.v1.txt
>
>
> I was running TestClusterId on branch-1, where I observed the test hang during tearDown().
> {code}
> 2016-08-03 11:36:39,600 DEBUG [RS_CLOSE_META-cn012:49371-0] regionserver.HRegion(1415): Closing hbase:meta,,1.1588230740: disabling compactions & flushes
> 2016-08-03 11:36:39,600 DEBUG [RS_CLOSE_META-cn012:49371-0] regionserver.HRegion(1442): Updates disabled for region hbase:meta,,1.1588230740
> 2016-08-03 11:36:39,600 INFO  [RS_CLOSE_META-cn012:49371-0] regionserver.HRegion(2253): Flushing 1/1 column families, memstore=232 B
> 2016-08-03 11:36:39,601 WARN  [RS_OPEN_META-cn012:49371-0.append-pool17-t1] wal.FSHLog$RingBufferEventHandler(1900): Append sequenceId=8, requesting roll of WAL
> java.io.IOException: All datanodes DatanodeInfoWithStorage[127.0.0.1:37765,DS-9870993e-fb98-45fc-b151-708f72aa02d2,DISK] are bad. Aborting...
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1113)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
> 2016-08-03 11:36:39,602 FATAL [RS_CLOSE_META-cn012:49371-0] regionserver.HRegionServer(2085): ABORTING region server cn012.l42scl.hortonworks.com,49371,1470249187586: Unrecoverable exception while closing region hbase:meta,,1.1588230740, still finishing close
> org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8, requesting roll of WAL
>   at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1902)
>   at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1754)
>   at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1676)
>   at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[127.0.0.1:37765,DS-9870993e-fb98-45fc-b151-708f72aa02d2,DISK] are bad. Aborting...
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1113)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
> 2016-08-03 11:36:39,603 FATAL [RS_CLOSE_META-cn012:49371-0] regionserver.HRegionServer(2093): RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> {code}
> This led to rst.join() hanging:
> {code}
> "RS:0;cn012:49371" #648 prio=5 os_prio=0 tid=0x00007fdab24b5000 nid=0x621a waiting on
condition [0x00007fd785fe0000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.sleep(HRegionServer.java:1326)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:1312)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1082)
> {code}
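
For context on the quoted description above: the TIMED_WAITING state is the region server sleeping in a poll loop. A simplified sketch of a waitOnAllRegionsToClose()-style loop (an assumed shape for illustration, not the actual branch-1 code) shows why it never returns once the aborted close leaves hbase:meta registered as online, and why every thread joining the region server hangs with it:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, illustrative sketch (not actual HBase code) of a poll loop in
// the style of waitOnAllRegionsToClose(): sleep while any region is online.
// If a close handler aborts mid-close and never removes its region from the
// map, the condition stays true forever: the TIMED_WAITING in the dump.
class RegionCloseWaitSketch {
    private final Map<String, Object> onlineRegions = new ConcurrentHashMap<>();

    void waitOnAllRegionsToClose() throws InterruptedException {
        while (!onlineRegions.isEmpty()) {
            Thread.sleep(200L); // corresponds to HRegionServer.sleep(...) above
        }
    }
}
{code}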



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
