hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16367) Race between master and region server initialization may lead to premature server abort
Date Tue, 09 Jan 2018 19:25:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318998#comment-16318998 ]

stack commented on HBASE-16367:
-------------------------------

This patch doesn't work. See HBASE-19694. There is no doc on what the latch is about or what
it is supposed to be holding up. After some digging, the order of events seems to be the same,
but this latch seems super fragile and liable to break if any reordering is done. There is no
test to guard against change. Let me try to revert this thing over in HBASE-19694.

> Race between master and region server initialization may lead to premature server abort
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-16367
>                 URL: https://issues.apache.org/jira/browse/HBASE-16367
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.2
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: 16367.addendum, 16367.v1.txt, 16367.v2.txt, 16367.v3.txt, 63908-master.log
>
>
> I was troubleshooting a case where hbase (1.1.2) master always dies shortly after start
> - see attached master log snippet.
> It turned out that master initialization thread was racing with HRegionServer#preRegistrationInitialization()
> (initializeZooKeeper, actually) since HMaster extends HRegionServer.
> Through additional logging in master:
> {code}
>     this.oldLogDir = createInitialFileSystemLayout();
>     HFileSystem.addLocationsOrderInterceptor(conf);
>     LOG.info("creating splitLogManager");
> {code}
> I found that execution didn't reach the last log line before region server declared cluster
> Id being null.
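
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of how a latch can order two racing initialization paths. It is not the HBASE-16367 patch and uses no HBase code: the class name, method names, timeout, and the "cluster-1234" ID are all hypothetical. It only assumes the general shape described above, where one thread publishes the cluster ID and another must not read it too early.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; none of these names come from the HBase source.
final class InitOrderSketch {
    // Released once the "master" side has published the cluster ID.
    private final CountDownLatch clusterIdPublished = new CountDownLatch(1);
    private final AtomicReference<String> clusterId = new AtomicReference<>();

    // Stands in for the master initialization thread.
    void masterInit(long setupMillis) throws InterruptedException {
        Thread.sleep(setupMillis);       // simulated file system layout work
        clusterId.set("cluster-1234");   // made-up ID, published before the latch opens
        clusterIdPublished.countDown();
    }

    // Stands in for the region-server-side initialization.
    String regionServerInit(long timeoutMillis) throws InterruptedException {
        // Without this wait, the read below could observe a null cluster ID.
        if (!clusterIdPublished.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("cluster ID not published in time");
        }
        return clusterId.get();
    }

    // Runs both sides and returns the cluster ID seen by the waiting side.
    static String run(long setupMillis) {
        InitOrderSketch s = new InitOrderSketch();
        Thread master = new Thread(() -> {
            try {
                s.masterInit(setupMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        master.start();
        try {
            String id = s.regionServerInit(5_000);
            master.join();
            return id;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(100));
    }
}
```

The sketch also shows why such a latch can be fragile: it is only correct while the publishing step stays ahead of `countDown()`, so reordering the initialization steps can silently break the guarantee unless a test pins the order down.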



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
