cassandra-dev mailing list archives

From Josef Lindman Hörnlund <jo...@appdata.biz>
Subject Re: Reconciling expiring cells and tombstones
Date Wed, 17 Jun 2015 15:05:17 GMT

Hello Sam,

This doesn't answer your direct question, but if you're worried about clock skew, take a look at
this great two-part blog post:

https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2-solutions/


Josef Lindman Hörnlund
Chief Data Scientist
AppData
josef@appdata.biz




> On 16 Jun 2015, at 20:45, Sam Klock <sklock@akamai.com> wrote:
> 
> Hi folks,
> 
> I have a question about a design choice on how expiring cells are
> reconciled with tombstones.  For two cells with the same timestamp, if
> one is expiring and one is a tombstone, Cassandra *always* prefers the
> tombstone.  This matches its behavior for normal/non-expiring cells, but
> the folks in my organization worry about what it may imply for nodes
> experiencing clock skew.  Specifically, we're concerned about scenarios
> like the following:
> 
> 1) An expiring cell is committed via some node with a non-skewed clock.
> 2) Another replica for that cell experiences forward clock skew and
> decides that the cell is expired.  It eventually runs a compaction that
> converts the cell to a tombstone.
> 3) The tombstone propagates to other nodes via, e.g., node repair.
> 4) The other nodes all eventually run their own compactions.  Because of
> the reconciliation logic, the expiring cell is purged on all of the
> replicas, leaving behind only the tombstone.
> 
> If the cell should still have been live at (4), the reconciliation logic
> will cause it to be purged prematurely.  We have confirmed this
> behavior experimentally.
> 
> My organization may be more concerned about clock skew than the larger
> community, so I don't think we're inclined to propose a patch at this
> time.  But to account for this kind of scenario we would like to patch
> our internal version of Cassandra to conditionally prefer expiring cells
> to tombstones if the node believes they should still be live; i.e., in
> reconcile() in *ExpiringCell.java, instead of:
> 
>        if (cell instanceof DeletedCell)
>            return cell;
> 
> use:
> 
>        if (cell instanceof DeletedCell)
>            return isLive() ? this : cell;
> 
> Before we do so, however, we'd like to understand the rationale for the
> existing behavior and the risks of making changes to it.  Why does
> Cassandra consistently prefer tombstones to other kinds of cells?  By
> modifying this behavior in this particular case, do we risk hitting
> bizarre corner cases?
> 
> Thanks,
> SK
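The reconciliation rule and the four-step scenario in Sam's message can be reproduced with a small toy model. This is a sketch under stated assumptions, not Cassandra code: `Cell`, `reconcileStock`, `reconcilePatched` and `compact` are hypothetical names; write timestamps are abstract units and expiry is measured in seconds on whatever clock the evaluating node uses; and, as the message implies, the tombstone produced by compacting an expired cell keeps the original write timestamp.

```java
/**
 * Toy model of expiring-cell vs. tombstone reconciliation.
 * Hypothetical names throughout: this is NOT Cassandra's Cell API.
 */
public class ReconcileSketch {

    /** A cell is either an expiring (TTL'd) value or a tombstone. */
    record Cell(long timestamp, boolean tombstone, long expiresAtSec) {
        static Cell expiring(long timestamp, long expiresAtSec) {
            return new Cell(timestamp, false, expiresAtSec);
        }

        static Cell tombstone(long timestamp) {
            return new Cell(timestamp, true, Long.MAX_VALUE);
        }

        /** Liveness is judged against the caller's local clock. */
        boolean isLive(long nowSec) {
            return !tombstone && nowSec < expiresAtSec;
        }
    }

    /** Stock rule as described: newer timestamp wins; on a tie, the tombstone wins. */
    static Cell reconcileStock(Cell a, Cell b) {
        if (a.timestamp() != b.timestamp()) {
            return a.timestamp() > b.timestamp() ? a : b;
        }
        if (a.tombstone()) return a;
        if (b.tombstone()) return b;
        return a; // two non-tombstone cells with equal timestamps: tie-break not modelled
    }

    /** Proposed rule: on a tie, keep the expiring cell if the local clock says it is live. */
    static Cell reconcilePatched(Cell a, Cell b, long localNowSec) {
        if (a.timestamp() == b.timestamp() && a.tombstone() != b.tombstone()) {
            Cell expiring = a.tombstone() ? b : a;
            Cell tombstone = a.tombstone() ? a : b;
            return expiring.isLive(localNowSec) ? expiring : tombstone;
        }
        return reconcileStock(a, b);
    }

    /** Compaction on a node whose clock reads nowSec turns an expired cell into a tombstone. */
    static Cell compact(Cell c, long nowSec) {
        return (!c.tombstone() && !c.isLive(nowSec)) ? Cell.tombstone(c.timestamp()) : c;
    }

    public static void main(String[] args) {
        long writeTimestamp = 1_000L;
        long trueNowSec = 1_000L;
        // Step 1: an expiring cell with a one-hour TTL, written via a correctly clocked node.
        Cell written = Cell.expiring(writeTimestamp, trueNowSec + 3_600);

        // Step 2: a replica whose clock runs two hours fast compacts it into a tombstone.
        Cell fromSkewedReplica = compact(written, trueNowSec + 7_200);

        // Steps 3-4: a correctly clocked replica reconciles its copy with the repaired tombstone.
        Cell stock = reconcileStock(written, fromSkewedReplica);
        Cell patched = reconcilePatched(written, fromSkewedReplica, trueNowSec);
        // The same patched rule, evaluated on the skewed replica's clock.
        Cell patchedOnSkewed = reconcilePatched(written, fromSkewedReplica, trueNowSec + 7_200);

        System.out.println("stock keeps tombstone: " + stock.tombstone());
        System.out.println("patched keeps tombstone: " + patched.tombstone());
        System.out.println("patched on skewed node keeps tombstone: " + patchedOnSkewed.tombstone());
    }
}
```

Running `main` prints `true` for the stock rule, `false` for the patched rule on the correctly clocked node, and `true` for the patched rule evaluated with the skewed clock. In this model the patched outcome depends on each node's local clock, so replicas whose clocks disagree can reconcile the same pair of cells differently; that is one concrete corner case worth weighing before adopting the change.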

