hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-10271) [regression] Cannot use the wildcard address since HBASE-9593
Date Fri, 03 Jan 2014 01:49:50 GMT

     [ https://issues.apache.org/jira/browse/HBASE-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-10271:
---------------------------------------

    Attachment: HBASE-10271.patch

Attaching a simple patch that I put together quickly. I only ran a basic test, and it works.
It contains:

 - A partial revert of HBASE-9593, just the part that touches HRS
 - A new property in ServerLoad that keeps track of the last heartbeat, set on the master
   side (I added getEmptyServerload because a newly registered RS would otherwise carry
   EMPTY_SERVERLOAD's old time)
 - A new thread in SM that monitors the last heartbeat

I haven't fixed 9593's unit test yet.
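
For readers who have not opened the attachment, the heartbeat tracking listed above can be illustrated roughly as follows. This is only a minimal sketch and not the code in HBASE-10271.patch: the class HeartbeatTracker, its methods, and the timeout and period parameters are all hypothetical names chosen for illustration, and it uses plain JDK classes rather than ServerLoad or ServerManager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of master-side heartbeat tracking; not the patch itself. */
class HeartbeatTracker {
    // Server name -> time (ms) of the last heartbeat, as recorded by the master.
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private final long timeoutMs;

    HeartbeatTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Record a heartbeat; also called at registration so a new server starts with a fresh time. */
    void recordHeartbeat(String serverName, long nowMs) {
        lastHeartbeat.put(serverName, nowMs);
    }

    /** Remove and return every server whose last heartbeat is older than the timeout. */
    List<String> expireStale(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            boolean stale = nowMs - e.getValue() > timeoutMs;
            // remove(key, value) only succeeds if no newer heartbeat arrived in the meantime.
            if (stale && lastHeartbeat.remove(e.getKey(), e.getValue())) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }

    /** Start a background thread that periodically expires stale servers. */
    ScheduledExecutorService startMonitor(long periodMs, Consumer<String> onExpire) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(
            () -> expireStale(System.currentTimeMillis()).forEach(onExpire),
            periodMs, periodMs, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Recording the time at registration mirrors the reason given above for getEmptyServerload: a server that has just registered should not be expired because of a stale default timestamp.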

> [regression] Cannot use the wildcard address since HBASE-9593
> -------------------------------------------------------------
>
>                 Key: HBASE-10271
>                 URL: https://issues.apache.org/jira/browse/HBASE-10271
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0, 0.94.13, 0.96.1
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0
>
>         Attachments: HBASE-10271.patch
>
>
> HBASE-9593 moved the creation of the ephemeral znode earlier in the region server startup process, to a point where the region server does not yet know its ServerName as seen by the Master. HRS.getMyEphemeralNodePath() calls HRS.getServerName(), which at that point will return this.isa.getHostName(). If you set hbase.regionserver.ipc.address to 0.0.0.0, you will create a znode with that address.
> What happens next is that the RS will report for duty correctly but the master will do this:
> {noformat}
> 2014-01-02 11:45:49,498 INFO  [master:172.21.3.117:60000] master.ServerManager: Registering server=0:0:0:0:0:0:0:0%0,60020,1388691892014
> 2014-01-02 11:45:49,498 INFO  [master:172.21.3.117:60000] master.HMaster: Registered server found up in zk but who has not yet reported in: 0:0:0:0:0:0:0:0%0,60020,1388691892014
> {noformat}
> The cluster is then unusable.
> I think a better solution is to track the heartbeats of the region servers and expire those that haven't checked in for some time. The 0.89-fb branch has this concept, and they also use it to detect rack failures: https://github.com/apache/hbase/blob/0.89-fb/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L1224.
> Within this jira's scope I would just add the heartbeat tracking and a unit test for the wildcard address.
> What do you think [~rajesh23]?
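
As a side note to the description above, the wildcard case can be recognized with the plain JDK. The following is a minimal sketch under stated assumptions: WildcardAddressCheck and isWildcard are hypothetical names, and neither HBASE-9593 nor the attached patch is known to perform this check. It only shows that a socket address built from 0.0.0.0 (or the IPv6 equivalent seen in the log) is not a usable host name for a znode.

```java
import java.net.InetSocketAddress;

/** Hypothetical helper; illustrates detecting a wildcard bind address. */
class WildcardAddressCheck {
    /** True if the address is the "any local" wildcard (0.0.0.0 or ::). */
    static boolean isWildcard(InetSocketAddress isa) {
        // getAddress() is null for unresolved addresses, so guard against that first.
        return isa.getAddress() != null && isa.getAddress().isAnyLocalAddress();
    }

    public static void main(String[] args) {
        // 60020 is the region server port that appears in the log above.
        System.out.println(isWildcard(new InetSocketAddress("0.0.0.0", 60020)));
        System.out.println(isWildcard(new InetSocketAddress("127.0.0.1", 60020)));
    }
}
```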



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
