cassandra-dev mailing list archives

From Josef Lindman Hörnlund <>
Subject Re: Reconciling expiring cells and tombstones
Date Wed, 17 Jun 2015 15:05:17 GMT

Hello Sam,

This does not answer your question directly, but if you are worried about clock skew, take a
look at this great two-part blog post:

Josef Lindman Hörnlund
Chief Data Scientist

> On 16 Jun 2015, at 20:45, Sam Klock <> wrote:
> Hi folks,
> I have a question about a design choice on how expiring cells are
> reconciled with tombstones.  For two cells with the same timestamp, if
> one is expiring and one is a tombstone, Cassandra *always* prefers the
> tombstone.  This matches its behavior for normal/non-expiring cells, but
> the folks in my organization worry about what it may imply for nodes
> experiencing clock skew.  Specifically, we're concerned about scenarios
> like the following:
> 1) An expiring cell is committed via some node with a non-skewed clock.
> 2) Another replica for that cell experiences forward clock skew and
> decides that the cell is expired.  It eventually runs a compaction that
> converts the cell to a tombstone.
> 3) The tombstone propagates to other nodes via, e.g., node repair.
> 4) The other nodes all eventually run their own compactions.  Because of
> the reconciliation logic, the expiring cell is purged on all of the
> replicas, leaving behind only the tombstone.
> If the cell should have still been live at (4), the reconciliation logic
> will result in it being prematurely purged.  We have confirmed this
> behavior experimentally.
> My organization may be more concerned about clock skew than the larger
> community, so I don't think we're inclined to propose a patch at this
> time.  But to account for this kind of scenario we would like to patch
> our internal version of Cassandra to conditionally prefer expiring cells
> to tombstones if the node believes they should still be live; i.e., in
> reconcile() in *, instead of:
>        if (cell instanceof DeletedCell)
>            return cell;
> use:
>        if (cell instanceof DeletedCell)
>            return isLive() ? this : cell;
> Before we do so, however, we'd like to understand the rationale for the
> existing behavior and the risks of making changes to it.  Why does
> Cassandra consistently prefer tombstones to other kinds of cells?  By
> modifying this behavior in this particular case, do we risk hitting
> bizarre corner cases?
> Thanks,
> SK
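
To make the tie-breaking question above concrete, here is a minimal sketch of the two policies for cells with equal timestamps. It is a simplified model, not Cassandra's actual code: the class `ReconcileSketch`, the `Cell` type, its factory methods, and the explicit `nowInSec` parameter are all hypothetical names introduced for illustration. Only the idea of "always prefer the tombstone" versus "prefer the expiring cell while the local node still considers it live" comes from the message.

```java
// Hypothetical, simplified model of the reconciliation discussed above.
// None of these types are Cassandra's real classes.
final class ReconcileSketch {
    enum Kind { EXPIRING, TOMBSTONE }

    static final class Cell {
        final Kind kind;
        final long timestamp;          // write timestamp of the cell
        final int localExpirationTime; // seconds; meaningful only for EXPIRING cells

        private Cell(Kind kind, long timestamp, int localExpirationTime) {
            this.kind = kind;
            this.timestamp = timestamp;
            this.localExpirationTime = localExpirationTime;
        }

        static Cell expiring(long timestamp, int localExpirationTime) {
            return new Cell(Kind.EXPIRING, timestamp, localExpirationTime);
        }

        static Cell tombstone(long timestamp) {
            return new Cell(Kind.TOMBSTONE, timestamp, 0);
        }

        // Live means: an expiring cell whose expiration is still in the future
        // according to the clock of the node doing the reconciliation.
        boolean isLive(int nowInSec) {
            return kind == Kind.EXPIRING && nowInSec < localExpirationTime;
        }
    }

    // Behavior described in the message: on a timestamp tie, the tombstone always wins.
    static Cell reconcileCurrent(Cell expiring, Cell other) {
        if (expiring.timestamp != other.timestamp)
            return expiring.timestamp > other.timestamp ? expiring : other;
        if (other.kind == Kind.TOMBSTONE)
            return other;
        return expiring;
    }

    // Proposed variant: on a tie, keep the expiring cell while this node thinks it is live.
    static Cell reconcileProposed(Cell expiring, Cell other, int nowInSec) {
        if (expiring.timestamp != other.timestamp)
            return expiring.timestamp > other.timestamp ? expiring : other;
        if (other.kind == Kind.TOMBSTONE)
            return expiring.isLive(nowInSec) ? expiring : other;
        return expiring;
    }

    public static void main(String[] args) {
        // Cell written at timestamp 1000, expiring at second 4600.
        Cell live = Cell.expiring(1000L, 4600);
        // Tombstone produced by a replica with a fast clock; it keeps the same timestamp.
        Cell tomb = Cell.tombstone(1000L);
        int now = 2000; // a correctly clocked node, well before the expiration

        System.out.println("current:  " + reconcileCurrent(live, tomb).kind);
        System.out.println("proposed: " + reconcileProposed(live, tomb, now).kind);
    }
}
```

In this model, a correctly clocked node at second 2000 drops the cell under the current rule and keeps it under the proposed one. Note that the proposed outcome depends on the local clock of whichever node performs the reconciliation, so two replicas with different clocks can pick different winners for the same pair of cells. That may be one of the corner cases the question is asking about.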
