hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-3168) Sanity date and time check when a region server joins the cluster
Date Wed, 10 Nov 2010 00:29:07 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-3168:
---------------------------------

    Attachment: HBASE-3168-v4.patch

Looks great, Jeff.

Made a few small changes...

- Moved reading maxSkew from the config into the constructor rather than doing it on each call
- Cleaned up the logging message a bit and changed it from DEBUG to WARN
- On the HRS side, use EnvironmentEdgeManager rather than calling System.currentTimeMillis directly
- Changed the test to operate directly on ServerManager.  I had to do a bit of refactoring of
ServerManager to get this to work, and it's not something anyone new would have pulled
the trigger on (moving stuff into another class instead of the weird, unnecessary coupling
to ServerManager).
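The first two points, plus the idea behind EnvironmentEdgeManager (a swappable source of the current time), can be sketched together in one small, self-contained Java class. This is a hedged illustration, not the code in HBASE-3168-v4.patch: `ClockSkewChecker`, `Clock`, and this `ClockOutOfSyncException` are hypothetical names, `Clock` only stands in for the role EnvironmentEdgeManager plays, and the 30-second limit in `main` is an arbitrary example value, not the patch's default.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

public class ClockSkewCheckDemo {

  /** Hypothetical stand-in for the role EnvironmentEdge plays: a swappable source of "now". */
  interface Clock {
    long currentTimeMillis();
  }

  /** Hypothetical exception in the spirit of the proposed "ClockOutOfSync-like exception". */
  static class ClockOutOfSyncException extends Exception {
    ClockOutOfSyncException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical checker; the limit is fixed at construction instead of re-read per call. */
  static class ClockSkewChecker {
    private static final Logger LOG = Logger.getLogger(ClockSkewChecker.class.getName());

    private final long maxSkewMillis;
    private final Clock clock;

    ClockSkewChecker(long maxSkewMillis, Clock clock) {
      this.maxSkewMillis = maxSkewMillis; // read once, not on every call
      this.clock = clock;
    }

    /** Throws if the joining server's reported time differs from ours by more than maxSkew. */
    void checkClockSkew(String serverName, long serverCurrentTime)
        throws ClockOutOfSyncException {
      long skew = Math.abs(clock.currentTimeMillis() - serverCurrentTime);
      if (skew > maxSkewMillis) {
        String msg = "Server " + serverName + " rejected: clock skew of " + skew
            + "ms exceeds max allowed " + maxSkewMillis + "ms";
        LOG.warning(msg); // WARNING is java.util.logging's counterpart to WARN
        throw new ClockOutOfSyncException(msg);
      }
    }
  }

  public static void main(String[] args) {
    // A manually controlled clock lets a test pin "now" instead of reading the wall clock.
    AtomicLong masterNow = new AtomicLong(1_000_000L);
    ClockSkewChecker checker = new ClockSkewChecker(30_000L, masterNow::get);
    try {
      checker.checkClockSkew("rs1", 1_010_000L); // 10 s ahead: within the limit
      System.out.println("rs1 accepted");
      masterNow.addAndGet(200_000L);             // advance the master's clock by 200 s
      checker.checkClockSkew("rs2", 1_010_000L); // now 190 s behind: rejected
      System.out.println("rs2 accepted");
    } catch (ClockOutOfSyncException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because "now" comes from the injected clock, a test can pin the master's time and exercise both the accept and reject paths without depending on the system clock.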

Will put up on RB so someone else can review.

> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>
>                 Key: HBASE-3168
>                 URL: https://issues.apache.org/jira/browse/HBASE-3168
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.89.20100924
>         Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
>            Reporter: Jeff Whiting
>            Assignee: Jeff Whiting
>             Fix For: 0.90.0
>
>         Attachments: HBASE-3168-trunk-v1.txt, HBASE-3168-trunk-v2.txt, HBASE-3168-trunk-v3.txt, HBASE-3168-v4.patch
>
>
> Introduce a sanity check when a RS joins the cluster to make sure its clock isn't skewed too far from the rest of the cluster.  If the RS's clock is skewed too far, the master would prevent it from joining, and the RS would die and log the error.
> Even small clock differences on a RS can cause huge problems because of how HBase stores values with timestamps.
> According to J-D, in ServerManager we are already doing:
> {code}
>     HServerInfo info = new HServerInfo(serverInfo);
>     checkIsDead(info.getServerName(), "STARTUP");
>     checkAlreadySameHostPort(info);
>     recordNewServer(info, false, null);
> {code}
> J-D also notes that the new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync"-like exception.
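Where the new check could sit in the startup sequence quoted above is sketched below. All names are hypothetical stand-ins rather than HBase's actual API: `Master` is a stripped-down stand-in for ServerManager, server identity is a plain string instead of HServerInfo, checkAlreadySameHostPort is omitted, and the exception only approximates the suggested ClockOutOfSync-like exception. The clock check runs before recordNewServer so a skewed server is never registered, and the region-server side reacts to the rejection by logging the error and giving up, where a real RS would abort.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

public class StartupCheckOrderDemo {

  /** Hypothetical stand-in for the suggested "ClockOutOfSync-like exception". */
  static class ClockOutOfSyncException extends IOException {
    ClockOutOfSyncException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical, stripped-down stand-in for ServerManager. */
  static class Master {
    private final long maxSkewMillis;
    private final LongSupplier clock;
    private final Set<String> deadServers = new HashSet<>();
    private final Set<String> onlineServers = new HashSet<>();

    Master(long maxSkewMillis, LongSupplier clock) {
      this.maxSkewMillis = maxSkewMillis;
      this.clock = clock;
    }

    /** Follows the order of the quoted snippet, adding the clock check before registration. */
    void regionServerStartup(String serverName, long serverCurrentTime) throws IOException {
      checkIsDead(serverName, "STARTUP");
      // checkAlreadySameHostPort(...) is omitted in this sketch.
      checkClockSkew(serverName, serverCurrentTime);
      recordNewServer(serverName);
    }

    boolean isOnline(String serverName) {
      return onlineServers.contains(serverName);
    }

    private void checkIsDead(String serverName, String what) throws IOException {
      if (deadServers.contains(serverName)) {
        throw new IOException("Server " + serverName + " is dead; rejecting " + what);
      }
    }

    private void checkClockSkew(String serverName, long serverCurrentTime)
        throws ClockOutOfSyncException {
      long skew = Math.abs(clock.getAsLong() - serverCurrentTime);
      if (skew > maxSkewMillis) {
        throw new ClockOutOfSyncException("Server " + serverName + " rejected: clock skew of "
            + skew + "ms exceeds max allowed " + maxSkewMillis + "ms");
      }
    }

    private void recordNewServer(String serverName) {
      onlineServers.add(serverName);
    }
  }

  /** Hypothetical region-server side: returns false where a real RS would log and abort. */
  static boolean tryJoin(Master master, String serverName, long localNow) {
    try {
      master.regionServerStartup(serverName, localNow);
      return true;
    } catch (ClockOutOfSyncException e) {
      System.err.println("FATAL " + e.getMessage() + "; shutting down");
      return false;
    } catch (IOException e) {
      System.err.println("Startup failed: " + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) {
    // Pin the master's clock so the example is deterministic.
    Master master = new Master(30_000L, () -> 1_000_000L);
    System.out.println(tryJoin(master, "rs1", 1_005_000L)); // prints true (5 s skew)
    System.out.println(tryJoin(master, "rs2", 1_120_000L)); // prints false (120 s skew)
  }
}
```

Making the exception a subclass of IOException in this sketch lets it travel through the same `throws IOException` path as the existing startup checks, so the caller can single it out and log a clock-specific error before shutting down.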

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

