hbase-issues mailing list archives

From "Appy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14070) Hybrid Logical Clocks for HBase
Date Fri, 21 Jul 2017 22:07:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096927#comment-16096927 ]

Appy commented on HBASE-14070:
------------------------------

PT = physical time, LT = logical time, ST = system time
----
Note that in the current implementation, the master and RSs update their own clocks on receiving
any region close/open request or response.
Also, on receiving a clock ahead of its own, a node sets its own clock to the received PT+LT and
keeps increasing LT till its own ST catches up to that PT.
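
A minimal sketch of the update rule described above, not the actual patch. The class and
method names are made up for illustration, and resetting LT to 0 once ST passes PT is an
assumption borrowed from the usual HLC formulation.

```python
class HLC:
    """Toy hybrid logical clock holding (PT, LT). Hypothetical, for illustration only."""

    def __init__(self, pt=0, lt=0):
        self.pt, self.lt = pt, lt

    def tick(self, st):
        """Timestamp a local event, given the node's current system time ST."""
        if st > self.pt:
            # ST has caught up with PT: follow the system clock again.
            # Resetting LT to 0 here is an assumption.
            self.pt, self.lt = st, 0
        else:
            # ST is still behind PT: PT stays pinned, only LT moves.
            self.lt += 1
        return (self.pt, self.lt)

    def receive(self, remote):
        """Adopt a remote (PT, LT) if it is ahead of the local clock."""
        if remote > (self.pt, self.lt):
            self.pt, self.lt = remote
        return (self.pt, self.lt)


rs = HLC(pt=20, lt=0)
after_receive = rs.receive((50, 7))   # remote clock is ahead, so it is adopted
pinned_1 = rs.tick(21)                # ST=21 < PT=50, LT increments
pinned_2 = rs.tick(22)
caught_up = rs.tick(51)               # ST finally passes PT
```

While ST is behind the adopted PT, every local event burns one LT value, which is what
the problems below are about.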
----
Problem 1: cascading logical time increment
Consider more nodes, say 3 RSs and 1 master, with a max skew of 30 sec.
HLC Clocks (physical time, logical time):  X = don't care
RS1: (50, 100k)
Master: (40, X)
RS2: (30, X)
RS3: (20, X) 
[RS3's ST behind RS1's by 30 sec.]

RS1 replies to master and sends its clock (50, X).
Master's clock becomes (50, X). It'll be another 10 sec before its own physical clock reaches 50,
so the HLC's PT will remain 50 for the next 10 sec.
Master --> RS2
RS2's clock = (50, X).
RS2 keeps incrementing LT on writes (since its own PT is behind) for a few seconds before it
replies to master with (50, X + few 100k).
Master's clock = (50, X + few 100k) [PT remains 50 because master's physical clock, which
was 10 seconds behind, hasn't caught up yet].
Master --> RS3
RS3's clock (50, X+few 100k) 
But RS3's ST is behind RS1's ST by 30 sec, which means it'll keep incrementing LT for the next
30 sec (unless it gets a newer clock from master).
The problem: RS3 starts from an LT of X + few 100k, so it has a much smaller LT window left
than the full 1M!
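
A rough back-of-the-envelope sketch of what the shrunken window means. The 1M LT capacity
and the 30 sec catch-up time come from the scenario above; the inherited LT of 400k and the
"one LT value per event" accounting are assumptions picked only to make the arithmetic concrete.

```python
LT_CAPACITY = 1_000_000  # LT window size mentioned above


def sustainable_rate(inherited_lt, catch_up_seconds, capacity=LT_CAPACITY):
    """Events/sec a node can timestamp while PT is pinned, before LT runs out.

    Assumes every event consumes exactly one LT value (hypothetical accounting).
    """
    remaining = max(capacity - inherited_lt, 0)
    return remaining / catch_up_seconds


# Fresh window vs. a window that already has 400k used up by upstream nodes.
full = sustainable_rate(inherited_lt=0, catch_up_seconds=30)
reduced = sustainable_rate(inherited_lt=400_000, catch_up_seconds=30)
```

The more hops the clock takes through lagging nodes, the more LT is already spent by the
time the slowest node receives it.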
---
Problem 2: single bad RS clock crashing the cluster
If a single RS's clock is bad and runs a bit fast, it'll keep pulling master's PT ahead
with it. If 'real time' is say 20, max skew is 10, and the bad RS is at time 29.9, it'll
pull master to 29.9 (via its next response), and then any RS below 19.9, i.e. just 0.1 sec
behind real time, will die for exceeding max skew.
This can bring whole clusters down!
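
The numbers above, worked through as a small sketch. Times are in tenths of a second to keep
the arithmetic exact. The skew check itself (master PT minus RS ST compared against max skew)
is an assumed form of the check, not taken from the patch.

```python
MAX_SKEW = 100      # 10 sec, in tenths of a second
REAL_TIME = 200     # 'real time' = 20
BAD_RS_TIME = 299   # bad RS running fast at 29.9


def exceeds_max_skew(master_pt, rs_st, max_skew=MAX_SKEW):
    """Hypothetical check: RS is rejected if it lags master's PT by more than max skew."""
    return master_pt - rs_st > max_skew


healthy_rs = 198    # 19.8, only 0.2 sec behind real time

# Before the bad RS responds, master's PT tracks real time.
ok_before = exceeds_max_skew(REAL_TIME, healthy_rs)

# After the response, master's PT is pulled up to the bad RS's clock.
master_pt = max(REAL_TIME, BAD_RS_TIME)
dies_after = exceeds_max_skew(master_pt, healthy_rs)
```

Here `ok_before` is False and `dies_after` is True: the healthy RS did nothing wrong, but
now fails the check.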
---
Problem 3: Time jumps (not a bug, but more of a nuisance)
Say an RS is behind master by 20 sec. On each communication from master, the RS will update its
own PT to master's PT, and it'll stay there till the RS's ST catches up. If communication from
master is frequent, ST might never catch up, and the RS's PT will look like discrete
time jumps rather than continuous time.
E.g. if master communicated with the RS at times 30, 40, 50 (the RS's corresponding times are 10,
20, 30), then all events on the RS between times [10, 50] will be timestamped with either 30, 40
or 50.
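
A tiny simulation of the example above, under the assumption that there is one event per
second on the RS and that PT is simply the max of the previous PT, the RS's ST, and any
master PT received at that moment. Function and variable names are made up.

```python
def stamped_pts(event_times, master_msgs):
    """Return the PT assigned to each event on the RS.

    event_times: RS system times (ST) at which events happen.
    master_msgs: dict mapping RS ST -> master PT received at that moment.
    """
    pt = 0
    out = []
    for st in event_times:
        # PT never goes backwards; it jumps when master's clock arrives.
        pt = max(pt, st, master_msgs.get(st, 0))
        out.append(pt)
    return out


# Master PTs 30, 40, 50 arrive when the RS's own ST is 10, 20, 30.
msgs = {10: 30, 20: 40, 30: 50}
pts = stamped_pts(range(10, 51), msgs)
distinct = sorted(set(pts))
```

`distinct` comes out as [30, 40, 50]: 41 seconds of RS events collapse onto three PT values.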
---




> Hybrid Logical Clocks for HBase
> -------------------------------
>
>                 Key: HBASE-14070
>                 URL: https://issues.apache.org/jira/browse/HBASE-14070
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Enis Soztutar
>            Assignee: Amit Patel
>         Attachments: HBASE-14070.master.001.patch, HybridLogicalClocksforHBaseandPhoenix.docx,
HybridLogicalClocksforHBaseandPhoenix.pdf
>
>
> HBase and Phoenix use the system's physical clock (PT) to give timestamps to events (reads
and writes). This mostly works when the system clock is strictly monotonically increasing
and there is no cross-dependency between servers' clocks. However, we know that leap seconds,
general clock skew and clock drift are in fact real. 
> This jira proposes using Hybrid Logical Clocks (HLC), a hybrid of a physical
clock and a logical clock. HLC is the best of both worlds: it keeps causality relationships
like logical clocks do, but is still compatible with an NTP-based physical system clock. HLC
can be represented in 64 bits. 
> A design document is attached and also can be found here: 
> https://docs.google.com/document/d/1LL2GAodiYi0waBz5ODGL4LDT4e_bXy8P9h6kWC05Bhw/edit#



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
