hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17287) Master becomes a zombie if filesystem object closes
Date Tue, 28 Mar 2017 18:37:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945696#comment-15945696
] 

Enis Soztutar commented on HBASE-17287:
---------------------------------------

bq. In patch v5, before starting the mini cluster, I set config for master not to host meta region.
OK, makes sense. I've checked the test again and it seems good. I think the 60-second timeout
is aggressive, though. Let's bump that to 3 minutes. On Jenkins, things can run super slow,
causing flakiness.


On the other issue: with the default setup (master hosting meta table and regionserver), do we
still have a problem where a master abort does not cause the daemon to go down? If we are good
there, +1 for the patch.

> Master becomes a zombie if filesystem object closes
> ---------------------------------------------------
>
>                 Key: HBASE-17287
>                 URL: https://issues.apache.org/jira/browse/HBASE-17287
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Clay B.
>            Assignee: Ted Yu
>             Fix For: 1.4.0, 2.0
>
>         Attachments: 17287.branch-1.1.v4.txt, 17287.branch-1.v3.txt, 17287.branch-1.v4.txt, 17287.master.v2.txt, 17287.master.v3.txt, 17287.master.v4.txt, 17287.master.v5.txt, 17287.v2.txt
>
>
> We have seen an issue whereby if the HDFS is unstable and the HBase master's HDFS client
> is unable to stabilize before {{dfs.client.failover.max.attempts}} then the master's filesystem
> object closes. This seems to result in an HBase master which will continue to run (process
> and znode exists) but no meaningful work can be done (e.g. assigning meta).
> What we saw in our HBase master logs was:
> {code}
> 2016-12-01 19:19:08,192 ERROR org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Caught M_META_SERVER_SHUTDOWN, count=1
> java.io.IOException: failed log splitting for cluster-r5n12.bloomberg.com,60200,1480632863218, will retry
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:84)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Filesystem closed
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
