hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3168) Sanity date and time check when a region server joins the cluster
Date Thu, 04 Nov 2010 04:27:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928117#action_12928117 ]

stack commented on HBASE-3168:


#1 could be a legitimate problem in the case where a regionserver came up but there was no master
to connect to, so the regionserver just hung out twiddling its thumbs for five or ten minutes.

#2 is not an issue.  You say "If each region server then calls reportsForDuty...".  That's
not what happens.  A regionserver, when it comes up, calls reportForDuty/regionServerStartup.
Thereafter, it heartbeats by calling regionServerReport (until it dies).  When a master joins
an already running cluster, the regionservers will just call the new master's regionServerReport
-- not the initializing regionServerStartup -- and the master just registers the regionserver
at that time.  (TODO: do away with regionServerStartup or, when a new master joins the cluster,
have the regionservers call regionServerStartup rather than regionServerReport.  In the interests
of simplicity, since it seems as though regionServerStartup is no longer necessary, we should
just axe it.)
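
To illustrate the flow described above, here is a deliberately simplified sketch in Java.
MasterSketch and its fields are hypothetical stand-ins, not the real ServerManager; only the
method names regionServerStartup and regionServerReport come from the discussion.  The point
is that a master which has never seen a regionserver can register it on its first heartbeat.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the master side; not HBase's ServerManager.
class MasterSketch {
  // Names of the regionservers this master currently knows about.
  private final Set<String> onlineServers = new HashSet<>();

  // Called once by a regionserver when it first comes up.
  void regionServerStartup(String serverName) {
    onlineServers.add(serverName);
  }

  // Called on every heartbeat.  A newly started master has an empty set, so the
  // first report from an already running regionserver registers it here.
  // Returns true if this call registered the server for the first time.
  boolean regionServerReport(String serverName) {
    return onlineServers.add(serverName);
  }

  boolean isOnline(String serverName) {
    return onlineServers.contains(serverName);
  }
}
```

In this model regionServerStartup does nothing that the first regionServerReport would not
also do, which is the redundancy the TODO points at.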

I like Jon's suggestion of changing the signature on reportsForDuty to add regionServerCurrentTimeMillis

You might argue that regionServerReport should also be modified to take the regionserver
timestamp, but that's probably overdoing it.
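
For concreteness, here is a minimal sketch in Java of the kind of check the master could run
once the regionserver passes its current time in.  ClockSkewCheck, the maxSkewMillis threshold
and the exact shape of ClockOutOfSyncException are all assumptions made for illustration; this
is not the attached patch.

```java
// Hypothetical sketch; class names and the threshold are illustrative only.
class ClockOutOfSyncException extends Exception {
  ClockOutOfSyncException(String message) {
    super(message);
  }
}

class ClockSkewCheck {
  // Largest difference between the two clocks that the master tolerates.
  private final long maxSkewMillis;

  ClockSkewCheck(long maxSkewMillis) {
    this.maxSkewMillis = maxSkewMillis;
  }

  // Compares the time reported by the regionserver with the master's own time
  // and rejects the regionserver if the absolute difference is too large.
  void check(String serverName, long regionServerCurrentTimeMillis,
      long masterCurrentTimeMillis) throws ClockOutOfSyncException {
    long skew = Math.abs(masterCurrentTimeMillis - regionServerCurrentTimeMillis);
    if (skew > maxSkewMillis) {
      throw new ClockOutOfSyncException("Server " + serverName
          + " rejected: clock skew of " + skew + "ms exceeds the allowed "
          + maxSkewMillis + "ms");
    }
  }
}
```

The master would call something like check(name, regionServerCurrentTimeMillis,
System.currentTimeMillis()) before recording the new server.  The measured difference
also includes RPC latency, so the threshold should be chosen with that in mind.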

Thanks for working on this.

> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>                 Key: HBASE-3168
>                 URL: https://issues.apache.org/jira/browse/HBASE-3168
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.89.20100924
>         Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
>            Reporter: Jeff Whiting
>             Fix For: 0.90.0
>         Attachments: HBASE-3168-trunk-v1.txt
> Introduce a sanity check when a RS joins the cluster to make sure its clock isn't too
> far out of skew with the rest of the cluster.  If the RS's time is too far out of skew then
> the master would prevent it from joining and the RS would die and log the error.
> Having a RS with even small differences in time can cause huge problems due to how HBase
> stores values with timestamps.
> According to J-D in ServerManager we are already doing: 
> {code}
>     HServerInfo info = new HServerInfo(serverInfo);
>     checkIsDead(info.getServerName(), "STARTUP");
>     checkAlreadySameHostPort(info);
>     recordNewServer(info, false, null);
> {code}
> And that the new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception"

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
