cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-5667) Change timestamps used in CAS ballot proposals to be more resilient to clock skew
Date Sun, 30 Jun 2013 03:28:20 GMT


Jonathan Ellis updated CASSANDRA-5667:

    Attachment: 5667.txt

The attached patch moves the contention retry into {{beginAndRepairPaxos}} and uses
max(current time from system clock, inProgress + 1) as the ballot.

It also updates in_progress_ballot on commit, if necessary, to preserve the guarantee that we
never issue a promise for a ballot lower than one we have already seen.
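
To make the two rules above concrete, here is a minimal sketch in Java. It is not taken from 5667.txt: the class BallotSketch, its methods, and the use of plain microsecond timestamps in place of timeuuid ballots are all assumptions made for illustration.

```java
// Hypothetical sketch; names are illustrative and are not taken from 5667.txt.
// Ballots are modelled as plain microsecond timestamps rather than timeuuids.
public final class BallotSketch {
    // Highest ballot timestamp this replica has promised or committed so far.
    private long inProgressMicros;

    public BallotSketch(long initialInProgressMicros) {
        this.inProgressMicros = initialInProgressMicros;
    }

    // Choose a ballot that is never behind the in-progress ballot,
    // even when the local clock lags the node that issued that ballot.
    public static long nextBallotMicros(long nowMicros, long inProgressMicros) {
        return Math.max(nowMicros, inProgressMicros + 1);
    }

    // Promise only ballots newer than anything seen so far.
    // Rejecting an equal ballot is an assumption of this sketch.
    public boolean promise(long ballotMicros) {
        if (ballotMicros <= inProgressMicros) {
            return false;
        }
        inProgressMicros = ballotMicros;
        return true;
    }

    // On commit, raise the in-progress ballot if the committed ballot is newer,
    // so that a later promise cannot be issued for an older ballot.
    public void commit(long ballotMicros) {
        inProgressMicros = Math.max(inProgressMicros, ballotMicros);
    }

    public long inProgressMicros() {
        return inProgressMicros;
    }
}
```

In this sketch, a node whose clock reads 1000 while the in-progress ballot is 5000 would propose 5001 rather than 1000, and a replica that has committed ballot 500 would refuse a later promise request for ballot 400.
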
> Change timestamps used in CAS ballot proposals to be more resilient to clock skew
> ---------------------------------------------------------------------------------
>                 Key: CASSANDRA-5667
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 2.0 beta 1
>         Environment: n/a
>            Reporter: Nick Puz
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 2.0 beta 1
>         Attachments: 5667.txt
> The current time is used to generate the timeuuid for CAS ballot proposals, on the logic
that if a newer proposal exists, the current one must complete it and then re-propose. The
problem is that if a machine's clock drifts into the future, it will propose with a large
timestamp (which will be accepted), and subsequent proposals with lower (but correct)
timestamps will not be able to proceed. This blocks CAS write operations and also reads at
the serializable consistency level.
> The workaround is to initially propose with the current time (the current behavior), but if
the proposal fails because a larger one already exists, re-propose (after completing the
existing proposal if necessary) with max(currentTime, mostRecent+1, proposed+1).
> Since small drift between nodes in the same datacenter is normal, this can happen even when
NTP is working properly, if a write hits one node and a subsequent serialized read hits
another. With NTP configuration issues (or OS time bugs, especially around DST), the
unavailability window could be much larger.
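
The retry rule proposed in the description can be sketched as follows. This is a hedged illustration in Java, not Cassandra code: SkewRetrySketch, reproposeMicros, accepts, and proposeWithRetry are hypothetical names, ballots are modelled as plain microsecond timestamps, and the acceptance rule (strictly newer than the most recent ballot) is an assumption.

```java
// Hypothetical sketch of the re-proposal rule from the issue description.
public final class SkewRetrySketch {
    // Timestamp for a re-proposal after a rejection:
    // max(currentTime, mostRecent + 1, proposed + 1).
    static long reproposeMicros(long currentTimeMicros, long mostRecentMicros, long proposedMicros) {
        return Math.max(currentTimeMicros, Math.max(mostRecentMicros + 1, proposedMicros + 1));
    }

    // Assumed acceptor rule: a ballot proceeds only if it is newer than the most recent one.
    static boolean accepts(long mostRecentMicros, long ballotMicros) {
        return ballotMicros > mostRecentMicros;
    }

    // First attempt uses the local clock; each retry applies the max rule.
    // Returns the ballot timestamp that was finally accepted.
    static long proposeWithRetry(long currentTimeMicros, long mostRecentMicros) {
        long proposed = currentTimeMicros;
        while (!accepts(mostRecentMicros, proposed)) {
            proposed = reproposeMicros(currentTimeMicros, mostRecentMicros, proposed);
        }
        return proposed;
    }
}
```

With a correct clock at 1,000,000 and a most recent ballot of 6,000,000 left by a node whose clock ran ahead, the first attempt in this sketch is rejected and the retry proposes 6,000,001, so the proposer no longer has to wait for its own clock to catch up.
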
