hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17733) Undo registering regionservers in zk with ephemeral nodes; its more trouble than its worth
Date Sat, 04 Mar 2017 16:52:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895767#comment-15895767 ]

stack commented on HBASE-17733:

[~Apache9] Good point on rolling upgrade. The Master would have to keep its ears open for ephemeral
node evaporation (from not-yet-upgraded RSs) for a version or two. We should add a section to the design doc on HOW (smile).
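One way the Master could keep both ears open during the transition is sketched below. It assumes that older RSs still register an ephemeral znode while upgraded RSs only heartbeat the Master. MixedVersionLiveness and its methods are hypothetical names for illustration, not HBase code, and on_znode_deleted merely stands in for the ZK watcher callback rather than calling a real ZooKeeper client.

```python
import time
from typing import Callable, Dict, List, Set


class MixedVersionLiveness:
    """Hypothetical Master-side liveness view during a rolling upgrade.

    Assumption: not-yet-upgraded RSs still register an ephemeral znode,
    while upgraded RSs only send heartbeats to the Master. All names
    here are illustrative, not HBase APIs.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.znode_servers: Set[str] = set()        # legacy path: alive while znode exists
        self.last_heartbeat: Dict[str, float] = {}  # new path: alive while heartbeats are recent

    def on_znode_created(self, server: str) -> None:
        self.znode_servers.add(server)

    def on_znode_deleted(self, server: str) -> List[str]:
        # Stands in for the ZK watcher firing once a legacy RS's session expires.
        if server in self.znode_servers:
            self.znode_servers.remove(server)
            return [server]
        return []

    def heartbeat(self, server: str) -> None:
        self.last_heartbeat[server] = self.clock()

    def expire_stale_heartbeats(self) -> List[str]:
        # Polled periodically; upgraded RSs are judged only by heartbeat age.
        now = self.clock()
        dead = [s for s, t in self.last_heartbeat.items() if now - t > self.timeout_s]
        for s in dead:
            del self.last_heartbeat[s]
        return dead


# Usage: one legacy RS and one upgraded RS, driven by a manual clock.
now = [0.0]
mixed = MixedVersionLiveness(timeout_s=10.0, clock=lambda: now[0])
mixed.on_znode_created("old-rs")
mixed.heartbeat("new-rs")
now[0] = 15.0
stale = mixed.expire_stale_heartbeats()     # new-rs missed its heartbeats
dropped = mixed.on_znode_deleted("old-rs")  # old-rs's znode evaporated
print(stale, dropped)  # ['new-rs'] ['old-rs']
```

Both paths yield servers that would go to the same server shutdown handling; the znode path could be dropped once no supported version still registers in ZK.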

> Undo registering regionservers in zk with ephemeral nodes; its more trouble than its worth
> ------------------------------------------------------------------------------------------
>                 Key: HBASE-17733
>                 URL: https://issues.apache.org/jira/browse/HBASE-17733
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: stack
> Elsewhere, we are undoing the use of ZK (replication current WAL offset, regions-in-transition, ...).
> I have another case where using ZK, while convenient (callbacks), has holes.
> The scenario is prompted by review of HBASE-9593.
> Currently, an RS registers with the Master by calling the Master's reportForDuty. After
> the Master responds with the name we are to use for ourselves (as well as other properties
> we need to 'run'), we then turn around and make a new RPC out to the ZK ensemble to register
> an ephemeral znode for the RS.
> We notice an RS has gone away (crashed) because its znode evaporates and a watcher on the
> Master fires, notifying it that the RS has gone (after a ZK session timeout of tens of
> seconds). This is cumbersome (setting watchers, ZK session timeouts) and indirect. The Master
> then trips the server shutdown handler, which reassigns the regions from the crashed server.
> In HBASE-9593, we were trying to handle the rare but possible case where the RS dies
> after registering with the Master but before we put up our ephemeral znode. In this case
> the RS would live in the Master's internals forever, because there is no ephemeral znode to
> expire and trigger cleanup and removal of the never-started RS.
> Let's get ZK out of the loop. Then only the Master and RS are involved, heartbeating each other.
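A minimal sketch of what Master/RS heartbeating without ZK could look like, assuming the Master records a timestamp at reportForDuty and on each later heartbeat, and periodically expires servers it has not heard from within a timeout. HeartbeatTracker, report_for_duty, heartbeat, expire_dead_servers and the 10-second timeout are hypothetical names and values for illustration, not HBase code. Because registration itself starts the clock, the never-started RS from HBASE-9593 still gets cleaned up.

```python
import time
from typing import Callable, Dict, List


class HeartbeatTracker:
    """Hypothetical Master-side registry that judges RS liveness by heartbeats alone.

    Class/method names and the timeout are illustrative assumptions, not HBase APIs.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def report_for_duty(self, server: str) -> None:
        # Registration counts as the first heartbeat, so an RS that dies right
        # after registering is still expired later (the HBASE-9593 hole).
        self.last_seen[server] = self.clock()

    def heartbeat(self, server: str) -> bool:
        # Unknown (e.g. already expired) servers are refused, so the caller
        # would have to report for duty again instead of silently resurrecting.
        if server not in self.last_seen:
            return False
        self.last_seen[server] = self.clock()
        return True

    def expire_dead_servers(self) -> List[str]:
        # Run periodically by the Master; the returned servers are the ones
        # whose regions would be handed off for reassignment.
        now = self.clock()
        dead = [s for s, t in self.last_seen.items() if now - t > self.timeout_s]
        for s in dead:
            del self.last_seen[s]
        return dead


# Usage: rs2 registers, then dies before its first heartbeat.
now = [0.0]
master = HeartbeatTracker(timeout_s=10.0, clock=lambda: now[0])
master.report_for_duty("rs1")
master.report_for_duty("rs2")
now[0] = 5.0
master.heartbeat("rs1")
now[0] = 12.0
dead = master.expire_dead_servers()
print(dead)  # ['rs2']
```

Here the timeout plays the role the ZK session timeout plays today; how long it should be, and how often an RS heartbeats, is left open.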
