cassandra-commits mailing list archives

From "sankalp kohli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10727) Solution for getting rid of GC grace seconds
Date Wed, 18 Nov 2015 01:05:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009973#comment-15009973 ]

sankalp kohli commented on CASSANDRA-10727:
-------------------------------------------

How about CASSANDRA-6434? It reduces the GC grace period down to the hint window.

> Solution for getting rid of GC grace seconds
> --------------------------------------------
>
>                 Key: CASSANDRA-10727
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10727
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sharvanath Pathak
>
> There have been proposals for getting rid of the GC grace seconds, and automating the
GC of tombstones by waiting for acks from all the nodes about the receipt of the tombstone.

> 1. CASSANDRA-3620
> 2. CASSANDRA-6192
> This mechanism has two major benefits in my opinion:
> * Since the GC of tombstones can be much more aggressive, it minimizes the number of tombstones
in the system, thereby increasing the performance of read operations.
> * It eliminates the possibility of resurrection of keys in case a node comes up after
being down for more than GC grace seconds.
> As per CASSANDRA-3620, the main issue with the proposal seems to be its potential race
with the hinted handoff. Seems like we can have a good solution to that race. 
> The solution is essentially to record the hint locations. Before writing any hint,
we write a record on the alive replicas saying that a hint was written at such-and-such node. The
GC will then wait for an ack from all the replicas, and also for all the related hints to be replayed
and purged, before it clears the corresponding tombstone. 
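
The GC condition described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not Cassandra code: the class `TombstoneState` and its methods are hypothetical names, and node identities are plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class TombstoneState:
    """Hypothetical per-tombstone bookkeeping kept on a replica."""
    replicas: frozenset                               # nodes that must ack the tombstone
    acked: set = field(default_factory=set)           # replicas that confirmed receipt
    pending_hints: set = field(default_factory=set)   # nodes still holding a hint

    def record_hint(self, hint_node):
        # Recorded on the alive replicas before the hint itself is written.
        self.pending_hints.add(hint_node)

    def hint_purged(self, hint_node):
        # Called once the hint has been replayed and purged on hint_node.
        self.pending_hints.discard(hint_node)

    def record_ack(self, replica):
        self.acked.add(replica)

    def can_gc(self):
        # GC is allowed only when every replica has acked
        # and no related hint is still outstanding.
        return self.replicas <= self.acked and not self.pending_hints
```

In this sketch a tombstone stays un-collectable while any recorded hint location is outstanding, even if all replicas have already acked, which is the property that closes the race with hinted handoff.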
> One potential problem with this scheme is that if the hints are written on the coordinator
node the same way they are being done right now, this process will have to wait for a large
number of nodes to be up before the GC could be performed. However, this can be easily solved
by writing the hints to a node which is determined based on the key token. For instance, write
the hint to the node that comes next after the replicas in the token ring. 
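
One way to picture the token-based choice of hint node is the sketch below. It assumes a simple ring where the replicas for a key are the `rf` consecutive nodes starting at the first node whose token is greater than or equal to the key token, and it additionally assumes that dead candidates are skipped. The function `hint_node_for` and its parameters are hypothetical and do not correspond to any Cassandra API.

```python
from bisect import bisect_left


def hint_node_for(key_token, ring, rf, is_alive=lambda node: True):
    """Return (replicas, hint_node) for a key.

    ring is a list of (token, node) pairs sorted by token.
    hint_node is the first alive node clockwise after the replicas,
    or None if no such node exists.
    """
    tokens = [token for token, _ in ring]
    # First node whose token is >= key_token, wrapping around the ring.
    start = bisect_left(tokens, key_token) % len(ring)
    clockwise = [ring[(start + i) % len(ring)][1] for i in range(len(ring))]
    replicas, candidates = clockwise[:rf], clockwise[rf:]
    for node in candidates:
        if is_alive(node):
            return replicas, node
    return replicas, None
```

Because the hint holder is a function of the key token only, every coordinator picks the same node for a given key, so the GC has a small, predictable set of nodes to wait on.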
> Writing the hints in the way described in the last paragraph actually seems like a good
idea anyway, because it minimizes the number of nodes that have to replay hints when a node
comes up. The Dynamo paper actually describes this pattern for hinted handoffs as well. 
> Lastly, it might also have a race with any concurrent read repairs. However, this can be
solved the same way, by recording the repairs in progress for a key and then aborting them before
the GC is performed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
