cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6106) QueryState.getTimestamp() & FBUtilities.timestampMicros() reads current timestamp with System.currentTimeMillis() * 1000 instead of System.nanoTime() / 1000
Date Tue, 01 Apr 2014 19:36:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956938#comment-13956938 ]

Benedict edited comment on CASSANDRA-6106 at 4/1/14 7:34 PM:
-------------------------------------------------------------

It doesn't look safe to me to simply grab gtod.wall_time_sec anyway, even if we could find
its location, as the nanos value gets repaired after reading with another call. We could investigate
further, but for the time being I have a reasonably straightforward solution [here|http://github.com/belliottsmith/cassandra/tree/6106-microstime].

I started by simply calling the rt library's clock_gettime method through JNA, which unfortunately
clocks in at a heavy 7 micros per call; since nanoTime and currentTimeMillis each cost < 0.03
micros, this seemed unacceptable. So I've opted instead to periodically (once per second) grab
the latest micros time via the best method available (clock_gettime if present, currentTimeMillis
* 1000 otherwise) and use it to reset the offset. However, to ensure we have a smooth transition,
I:

# Cap the rate of change at 50ms per second
# Ensure it never leaps back in time, at least on any given thread (there's no way to guarantee
anything stronger than this)
# Only apply a change if it is at least 1ms out, to avoid noise (we should possibly tighten this
to 100 micros, or make it dependent on the resolution of the time library we're using)
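The three rules above can be sketched roughly as follows. This is a hypothetical illustration,
not the patch on the linked branch: the class and method names (SmoothedMicrosClock, resync,
nowMicros) are invented, and currentTimeMillis stands in for the JNA clock_gettime call.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SmoothedMicrosClock
{
    // Offset (in micros) added to System.nanoTime()/1000 to approximate wall-clock micros.
    // Written only by the single resync task, read by any thread.
    private volatile long offsetMicros;

    // Per-thread floor guaranteeing time never steps backwards on a given thread (rule 2).
    private final ThreadLocal<Long> lastReturned = ThreadLocal.withInitial(() -> Long.MIN_VALUE);

    private static final long MAX_DRIFT_PER_RESYNC_MICROS = 50_000; // rule 1: 50ms per second
    private static final long MIN_CORRECTION_MICROS = 1_000;        // rule 3: ignore < 1ms

    public SmoothedMicrosClock()
    {
        // Initial offset from the best wall-clock source available; currentTimeMillis
        // here stands in for clock_gettime(CLOCK_REALTIME) via JNA.
        offsetMicros = System.currentTimeMillis() * 1000 - System.nanoTime() / 1000;
    }

    // Intended to be called once per second from a scheduled task.
    void resync()
    {
        long wallMicros = System.currentTimeMillis() * 1000;
        long target = wallMicros - System.nanoTime() / 1000;
        long delta = target - offsetMicros;
        if (Math.abs(delta) < MIN_CORRECTION_MICROS)
            return; // rule 3: sub-millisecond disagreement is treated as noise
        // rule 1: cap the correction applied per (one-second) resync interval
        delta = Math.max(-MAX_DRIFT_PER_RESYNC_MICROS, Math.min(MAX_DRIFT_PER_RESYNC_MICROS, delta));
        offsetMicros += delta;
    }

    public long nowMicros()
    {
        long now = System.nanoTime() / 1000 + offsetMicros;
        // rule 2: never return a smaller value than we previously did on this thread
        long floor = lastReturned.get();
        if (now < floor)
            now = floor;
        lastReturned.set(now);
        return now;
    }
}
```

Each call costs roughly a nanoTime invocation plus a ThreadLocal access, which matches the goal
of staying close to raw System.nanoTime() cost.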

The result is a method that costs about the same as a raw call to System.nanoTime() but gives
pretty decent accuracy. Obviously any method that derives an offset from a call taking ~7 micros
to return will have an inherent inaccuracy, but no more than a direct 7-micro method call would
itself, and the inaccuracy will be consistent given the jitter reduction I'm applying. At startup
we also sample the offset 10k times, derive a 90%ile for the elapsed time fetching the offset
(we ignore future offsets whose samples take more than twice this period), and average all of
the samples whose fetch time falls within the 90%ile.
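The startup calibration could be sketched like this; again a hypothetical illustration rather
than the branch's code, with invented names (OffsetCalibrator, calibrate) and currentTimeMillis
standing in for the slower clock_gettime source.

```java
import java.util.Arrays;

public class OffsetCalibrator
{
    // Stand-in for the ~7 micro JNA clock_gettime call described above.
    static long wallMicros() { return System.currentTimeMillis() * 1000; }

    // Returns { averaged offset in micros, 90%ile fetch time in nanos }.
    static long[] calibrate(int samples)
    {
        long[] offsets = new long[samples];
        long[] elapsed = new long[samples];
        for (int i = 0; i < samples; i++)
        {
            long start = System.nanoTime();
            long wall = wallMicros();
            long end = System.nanoTime();
            elapsed[i] = end - start;
            // Offset relative to the midpoint of the nanoTime bracket around the wall read.
            offsets[i] = wall - (start + (end - start) / 2) / 1000;
        }

        long[] sortedElapsed = elapsed.clone();
        Arrays.sort(sortedElapsed);
        long p90 = sortedElapsed[(int) (samples * 0.9) - 1];

        // Average only the offsets whose fetch took no longer than the 90%ile,
        // discarding samples that were likely delayed by scheduling jitter.
        // Sum deltas against a base offset to avoid overflowing a long.
        long base = offsets[0];
        long sum = 0;
        int n = 0;
        for (int i = 0; i < samples; i++)
        {
            if (elapsed[i] <= p90)
            {
                sum += offsets[i] - base;
                n++;
            }
        }
        return new long[] { base + sum / n, p90 };
    }
}
```

The same 90%ile threshold can then gate the later once-per-second samples: any fetch taking more
than twice that period is discarded, as described above.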







> QueryState.getTimestamp() & FBUtilities.timestampMicros() reads current timestamp with System.currentTimeMillis() * 1000 instead of System.nanoTime() / 1000
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6106
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6106
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: DSE Cassandra 3.1, but also HEAD
>            Reporter: Christopher Smith
>            Assignee: Benedict
>            Priority: Minor
>              Labels: timestamps
>             Fix For: 2.1 beta2
>
>         Attachments: microtimstamp.patch, microtimstamp_random.patch, microtimstamp_random_rev2.patch
>
>
> I noticed this blog post: http://aphyr.com/posts/294-call-me-maybe-cassandra mentioned
> issues with millisecond rounding in timestamps and was able to reproduce the issue. If I specify
> a timestamp in a mutating query, I get microsecond precision, but if I don't, I get timestamps
> rounded to the nearest millisecond, at least for my first query on a given connection, which
> substantially increases the possibility of collisions.
> I believe I found the offending code, though I am by no means sure this is comprehensive.
> I think we probably need a fairly comprehensive replacement of all uses of System.currentTimeMillis()
> with System.nanoTime().



--
This message was sent by Atlassian JIRA
(v6.2#6252)
