hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-17287) Master becomes a zombie if filesystem object closes
Date Mon, 27 Mar 2017 20:31:41 GMT

     [ https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Ted Yu updated HBASE-17287:
    Attachment: 17287.master.v4.txt

Added testSafemodeBringsDownMaster in patch v4. I plan to create a separate test class for it
once the new test passes.
Currently the wait for the master thread to exit times out:
 Time elapsed: 61.538 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 60000 milliseconds
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:196)
	at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:143)
	at org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3959)
	at org.apache.hadoop.hbase.master.procedure.TestCreateTableProcedure.testSafemodeBringsDownMaster(TestCreateTableProcedure.java:92)
Let me see what the cause could be.
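
For readers unfamiliar with the pattern in the stack trace above, the following is a minimal sketch of a poll-until-timeout wait for a thread to exit. It is not the HBase Waiter implementation: the class name WaitSketch, the method waitFor, and its parameters are hypothetical and chosen only for illustration. It shows why such a wait ends in a timeout when the watched thread never terminates.

```java
import java.util.function.BooleanSupplier;

public class WaitSketch {

    /**
     * Polls the condition every intervalMs milliseconds until it holds or
     * timeoutMs milliseconds have passed. Returns true if the condition
     * became true, false on timeout or if the waiting thread is interrupted.
     * (Hypothetical helper, not part of HBase.)
     */
    public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // the watched condition never became true
            }
            try {
                Thread.sleep(Math.min(intervalMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a master thread that exits shortly after starting.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                // exit immediately
            }
        });
        worker.start();
        boolean exited = waitFor(2000, 10, () -> !worker.isAlive());
        System.out.println("worker exited within timeout: " + exited);
    }
}
```

If the thread being watched stays alive, as the master thread apparently does in the failing test, the condition never holds and the wait returns false once the deadline passes.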

> Master becomes a zombie if filesystem object closes
> ---------------------------------------------------
>                 Key: HBASE-17287
>                 URL: https://issues.apache.org/jira/browse/HBASE-17287
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Clay B.
>            Assignee: Ted Yu
>             Fix For: 1.4.0, 2.0
>         Attachments: 17287.branch-1.v3.txt, 17287.master.v2.txt, 17287.master.v3.txt, 17287.master.v4.txt, 17287.v2.txt
> We have seen an issue whereby if the HDFS is unstable and the HBase master's HDFS client
> is unable to stabilize before {{dfs.client.failover.max.attempts}} then the master's filesystem
> object closes. This seems to result in an HBase master which will continue to run (process
> and znode exists) but no meaningful work can be done (e.g. assigning meta).
> What we saw in our HBase master logs was:
> {code}
> 2016-12-01 19:19:08,192 ERROR org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Caught M_META_SERVER_SHUTDOWN, count=1
> java.io.IOException: failed log splitting for cluster-r5n12.bloomberg.com,60200,1480632863218, will retry
> 	at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:84)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Filesystem closed
> {code}
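
The log above shows the closed filesystem surfacing only as the cause of a retried IOException. As an illustration of how a handler could recognize that condition instead of retrying indefinitely, here is a minimal sketch that walks the cause chain looking for the "Filesystem closed" message. This is an assumption made for illustration and not necessarily what the attached patches do: the class FsClosedSketch and the method isFilesystemClosed are hypothetical names, and matching on the message text is a simplification.

```java
import java.io.IOException;

public class FsClosedSketch {

    // Upper bound on how far the cause chain is followed, to guard against cycles.
    private static final int MAX_DEPTH = 32;

    /**
     * Returns true if t, or any exception in its cause chain, is an
     * IOException whose message is exactly "Filesystem closed".
     * (Hypothetical helper; string matching is an assumption of this sketch.)
     */
    public static boolean isFilesystemClosed(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur != null && depth < MAX_DEPTH; depth++) {
            if (cur instanceof IOException && "Filesystem closed".equals(cur.getMessage())) {
                return true;
            }
            cur = cur.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        // Rebuild the shape of the exception from the log: a retryable wrapper
        // whose root cause is the closed filesystem.
        IOException wrapped = new IOException(
                "failed log splitting, will retry",
                new IOException("Filesystem closed"));
        System.out.println("filesystem closed: " + isFilesystemClosed(wrapped));
    }
}
```

A caller that sees true here could treat the error as fatal and abort rather than schedule another retry, since a closed filesystem object will not recover on its own.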

This message was sent by Atlassian JIRA
