cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rick Branson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5509) Decouple Consistency & Durability
Date Wed, 24 Apr 2013 14:01:17 GMT


Rick Branson commented on CASSANDRA-5509:

Doesn't HH >=1.0 store one hint per failed mutation on the coordinator? My understanding
is that if mutations fail against two replicas, the coordinator stores two hints that each
hold the complete mutation data. The proposed idea would store the same amount of data, except
that the coordinator would attempt to distribute the hints among two non-replicas.
> Decouple Consistency & Durability
> ---------------------------------
>                 Key: CASSANDRA-5509
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Rick Branson
> Right now in Cassandra, consistency and durability are intertwined in a way that is unnecessary.
In environments where nodes have unreliable local storage, the consistency level of writes
must be increased to N+1 to ensure that N host failure(s) don't cause data loss, even if it's
acceptable that consistency is weaker. The probability of data loss is also heavily influenced
by entropy. An example is if the client chooses a replica as the write coordinator for a CL.ONE
write, the risk of losing that data increases substantially. During a node outage, the chance
of data loss is elevated for a relatively long time: the entire length of the node outage
+ recovery time. The required increase in consistency level has real impact: it creates the
potential for availability issues during routine maintenance as an unlucky node failure can
cause writes to start failing. It's also generally considered a best practice that each datacenter
has at least 3 replicas of data, even if quorums for consistency are not required, as it's
the only way to ensure strong durability in the face of transient inter-DC failures.
> I found a relevant paper that provides some theoretical grounding while researching:
> I'd like to propose that in the event of a down replica, the coordinator attempts to
achieve RF by distributing "remote hints" to RF-liveReplicaCount non-replica nodes. If the
coordinator itself is a non-replica, it would be an acceptable choice for a remote hint as
well. This would achieve RF level durability without the availability penalty of increasing
consistency. This would also allow decreasing the (global) RF, as RF durability goals could
still be achieved during transient inter-DC failures, requiring just RF nodes in each DC,
instead of RF replicas in each DC. Even better would be if the selection of remote hint nodes
respected the replication strategy and was able to achieve the cross-rack / cross-DC durability.
> While ConsistencyLevel is a pretty overloaded concept at this point, and I think it'd
be great to add a DurabilityLevel to each write, I understand that this is likely not pragmatic.
Therefore, considering that the CL.TWO and CL.THREE options were added largely for durability
reasons, I propose that they be repurposed to support durability goals and remote hinting.
They would require 1 replica ACK and CL-1 (replica|hint) ACKs. It also might be desirable
to extend the "ANY" option to require multiple hint ACKs, such as CL.ANY_TWO or CL.ANY_THREE,
which would support combined very high durability and very high availability. All CLs will
benefit as remote hinting vastly tightens the window of elevated data loss chance during a
node outage from nodeOutageDuration + recoveryDuration to the time it takes for the coordinator
to distribute remote hints.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:

View raw message