hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16367) Race between master and region server initialization may lead to premature server abort
Date Sat, 06 Aug 2016 10:13:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410580#comment-15410580 ]

Ted Yu commented on HBASE-16367:
--------------------------------

Another approach is for the master to pass a CountDownLatch instance to the region server.
After the master sets the cluster Id, it counts down the latch so that the region server can
continue with initialization.
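
A minimal sketch of that latch hand-off, for illustration only. It is not HBase code: ClusterIdLatchSketch, RegionServerInit, runOnce, the 5-second timeout and the 50 ms delay are all hypothetical stand-ins, and the real change would have to be wired into HMaster and HRegionServer#preRegistrationInitialization().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ClusterIdLatchSketch {

    // Hypothetical stand-in for the region server initialization path.
    static final class RegionServerInit implements Runnable {
        private final CountDownLatch clusterIdSet;
        private final AtomicReference<String> clusterId;
        final AtomicReference<String> observed = new AtomicReference<>();

        RegionServerInit(CountDownLatch clusterIdSet, AtomicReference<String> clusterId) {
            this.clusterIdSet = clusterIdSet;
            this.clusterId = clusterId;
        }

        @Override
        public void run() {
            try {
                // Wait for the master's signal instead of reading the cluster Id right away.
                // The timeout (an assumed value) avoids blocking forever if the master fails.
                if (!clusterIdSet.await(5, TimeUnit.SECONDS)) {
                    observed.set("TIMED_OUT");
                    return;
                }
                observed.set(clusterId.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Plays the master's role: starts the region server side, sets the cluster Id
    // late, then counts down the latch. Returns what the region server side observed.
    static String runOnce(String id) {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<String> clusterId = new AtomicReference<>();
        RegionServerInit rs = new RegionServerInit(latch, clusterId);
        Thread t = new Thread(rs, "rs-init");
        t.start();
        try {
            Thread.sleep(50);   // simulate slow master-side setup before the Id is known
            clusterId.set(id);  // master sets the cluster Id ...
            latch.countDown();  // ... and only then releases the region server
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rs.observed.get();
    }

    public static void main(String[] args) {
        System.out.println("region server saw cluster Id: " + runOnce("demo-cluster-id"));
    }
}
```

With the latch in place the region server side cannot observe a null cluster Id, because countDown() happens only after the Id is set.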

> Race between master and region server initialization may lead to premature server abort
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-16367
>                 URL: https://issues.apache.org/jira/browse/HBASE-16367
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.2
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 16367.v1.txt, 63908-master.log
>
>
> I was troubleshooting a case where hbase (1.1.2) master always dies shortly after start - see attached master log snippet.
> It turned out that master initialization thread was racing with HRegionServer#preRegistrationInitialization() (initializeZooKeeper, actually) since HMaster extends HRegionServer.
> Through additional logging in master:
> {code}
>     this.oldLogDir = createInitialFileSystemLayout();
>     HFileSystem.addLocationsOrderInterceptor(conf);
>     LOG.info("creating splitLogManager");
> {code}
> I found that execution didn't reach the last log line before region server declared cluster Id being null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
