cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-11547) Add background thread to check for clock drift
Date Fri, 22 Apr 2016 13:36:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253933#comment-15253933 ]

T Jake Luciani edited comment on CASSANDRA-11547 at 4/22/16 1:35 PM:
---------------------------------------------------------------------

I don't see how sleeping and waking up can possibly work reliably in conjunction with GC.
The idea of jHiccup is to detect latency hits, not clock drift.
Stress, for example, has this notion of [timing | https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/tools/stress/src/org/apache/cassandra/stress/util/Timing.java]
which does account for GC, but only in terms of data collected for an interval, not actual clock
values.

I like the approach of 6680 to do this.  It's about the relative clock drift compared to other
nodes.  Riemann for example uses [this approach| http://riemann.io/api/riemann.streams.html#var-clock-skew]
which worked well for me in cases of ntpd going out on particular hosts.

I don't think we can reliably detect minor differences (10s of millis), but we should be able
to detect large ones (100s of millis to seconds).
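
To illustrate the relative approach, here is a minimal Java sketch. It is not taken from CASSANDRA-6680 or from Riemann. It assumes each message from a peer carries the sender's wall-clock timestamp, and the names RelativeClockSkew, skewMillis, median and isSuspicious are hypothetical. Network latency inflates every sample, so the sketch takes the median across peers and only flags skew at or above a coarse threshold (100s of millis).

```java
import java.util.Arrays;

/** Hypothetical sketch of relative clock-skew detection; not Cassandra code. */
final class RelativeClockSkew {
    /** Positive result: the local clock appears ahead of the peer (plus network latency). */
    static long skewMillis(long remoteSentMillis, long localReceivedMillis) {
        return localReceivedMillis - remoteSentMillis;
    }

    /** Median of the samples (upper median for an even count); robust to one bad peer. */
    static long median(long[] values) {
        if (values.length == 0)
            throw new IllegalArgumentException("no samples");
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** True when the median skew against all peers is at or beyond the threshold. */
    static boolean isSuspicious(long[] peerSkewsMillis, long thresholdMillis) {
        return Math.abs(median(peerSkewsMillis)) >= thresholdMillis;
    }
}
```

Using the median means a single peer with a broken clock does not make the local node look wrong; if most peers disagree with the local clock, the local clock is the likely culprit.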


was (Author: tjake):
I don't see how sleeping and waking up can possibly work in conjunction with GC. The idea
of jHiccup is to detect latency hits not clock drift.
Stress for example has this notion of [timing | https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/tools/stress/src/org/apache/cassandra/stress/util/Timing.java]
which does account for GC but only in terms of data collected for an interval not actual clock
values.

I like the approach of 6680 to do this.  It's about the relative clock drift compared to other
nodes.  Riemann for example uses [this approach| http://riemann.io/api/riemann.streams.html#var-clock-skew]
which worked well for me in cases of ntpd going out on particular hosts.

I don't think we can reliably detect minor differences (10s of millis), but we should be able
to detect large ones (100s of millis to seconds).

> Add background thread to check for clock drift
> ----------------------------------------------
>
>                 Key: CASSANDRA-11547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a simple way
> to check if this occurs, we can run a background thread that wakes up every n seconds, reads
> the system clock, and checks to see if, indeed, n seconds have passed.
> * If the clock's current time is less than the last recorded time (captured n seconds
> in the past), we know the clock has jumped backward.
> * If fewer than n seconds have elapsed, we know the system clock is running slow or has moved
> backward (by a value less than n).
> * If between n and (n + a small offset) seconds have elapsed, we can assume we are within an
> acceptable window of clock movement. Reasons for including an offset are that the clock-checking
> thread might not have been scheduled on time, garbage collection, and so on.
> * If more than (n + a small offset) seconds have elapsed, we can assume the clock
> jumped forward.
> In the unhappy cases, we can write a message to the log and increment some metric that
> the user's monitoring systems can trigger/alert on.
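
The four cases in the description above can be sketched in Java roughly as follows. This is an illustration only, not the patch attached to the ticket: the class ClockDriftCheck, its Result enum, and the parameters intervalMillis and toleranceMillis (standing in for "n" and the "small offset") are all hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the proposed check; not Cassandra code. */
final class ClockDriftCheck {
    enum Result { JUMPED_BACKWARD, RAN_SLOW, OK, JUMPED_FORWARD }

    /** Classifies the wall-clock time elapsed between two consecutive wake-ups. */
    static Result classify(long lastMillis, long nowMillis, long intervalMillis, long toleranceMillis) {
        long elapsed = nowMillis - lastMillis;
        if (elapsed < 0)
            return Result.JUMPED_BACKWARD;   // now is earlier than the last reading
        if (elapsed < intervalMillis)
            return Result.RAN_SLOW;          // slow clock, or a backward move smaller than n
        if (elapsed <= intervalMillis + toleranceMillis)
            return Result.OK;                // within n + small offset
        return Result.JUMPED_FORWARD;        // more than n + small offset
    }

    /** Starts a daemon thread that wakes every intervalMillis and reports unhappy cases. */
    static ScheduledExecutorService start(long intervalMillis, long toleranceMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "clock-drift-check");
            t.setDaemon(true);
            return t;
        });
        AtomicLong last = new AtomicLong(System.currentTimeMillis());
        executor.scheduleWithFixedDelay(() -> {
            long now = System.currentTimeMillis();
            Result result = classify(last.getAndSet(now), now, intervalMillis, toleranceMillis);
            if (result != Result.OK)
                System.err.println("clock check: " + result); // stand-in for a log line and a metric
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Note that a GC pause or a late wake-up longer than the tolerance is indistinguishable from a forward jump in this scheme, which is the objection raised in the comment above.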



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
