cassandra-user mailing list archives

From horschi <>
Subject Re: repair, compaction, and tombstone rows
Date Mon, 05 Nov 2012 13:11:50 GMT
> > - ... ExpiringColumn not create any tombstones? Imo this could be safely
> > done if the columns TTL is >= gcgrace.
> Yes, if the TTL >= gcgrace this would be safe and I'm pretty sure we
> use to have a ticket for that (can't find it back with a quick search
> but JIRA search suck and I didn't bother long). But basically we
> decided to not do it for now for 2 reasons:...
The only ticket I found that was anything similar is CASSANDRA-4565. I have
my doubts that you meant that one :-)

I don't know what your approach was back then, but maybe it could be solved
quite easily: When creating tombstones for ExpiringColumns, we could use
the ExpiringColumn.timestamp to set the DeletedColumn.localDeletionTime .
So instead of using the deletion time of the ExpiringColumn, we use its
creation timestamp.

In the ExpiringColumn class this would look like this:

public static Column create(ByteBuffer name, ByteBuffer value, long timestamp,
                            int timeToLive, int localExpirationTime, int expireBefore,
                            IColumnSerializer.Flag flag)
{
    if (localExpirationTime >= expireBefore || flag == IColumnSerializer.Flag.PRESERVE_SIZE)
        return new ExpiringColumn(name, value, timestamp, timeToLive, localExpirationTime);
    // the column is now expired, we can safely return a simple tombstone
    // new: use the ExpiringColumn's creation timestamp as localDeletionTime
    // (assumes timestamp is in milliseconds; localDeletionTime is in seconds)
    return new DeletedColumn(name, (int) (timestamp / 1000), timestamp);
    // old code:
    // return new DeletedColumn(name, localExpirationTime, timestamp);
}

Imo this makes the tombstones (DeletedColumns) live only as long as they need
to (assuming gcgrace = 10 days):
If you specify an ExpiringColumn TTL > 10 days, then the created
DeletedColumn would have a localDeletionTime that's more than 10 days in the
past, which makes it eligible for gc right away. With TTL = 5 days the
tombstone stays for 5 days, enough for either the ExpiringColumn or the
tombstone to be repaired.
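
To make the arithmetic concrete, here is a minimal sketch of how long the tombstone would outlive the column's expiry under the current and the proposed behaviour. It is not Cassandra code: the class and method names are made up for illustration, and it assumes a tombstone becomes purgeable once its localDeletionTime is older than now minus gcgrace.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Cassandra.
class TombstoneLifetime {

    // Current behaviour: localDeletionTime = creation time + TTL (the expiry),
    // so the tombstone lives a full gcgrace after the column expired.
    static long currentLifetime(long ttlSeconds, long gcGraceSeconds) {
        return gcGraceSeconds;
    }

    // Proposed behaviour: localDeletionTime = creation time, so by the time the
    // column expires, TTL seconds of gcgrace have already been used up.
    static long proposedLifetime(long ttlSeconds, long gcGraceSeconds) {
        return Math.max(0L, gcGraceSeconds - ttlSeconds);
    }

    public static void main(String[] args) {
        long gcGrace = TimeUnit.DAYS.toSeconds(10); // assumed gcgrace of 10 days
        for (long ttlDays : new long[] {5, 10, 20}) {
            long ttl = TimeUnit.DAYS.toSeconds(ttlDays);
            System.out.printf("ttl=%d days: current=%d days, proposed=%d days%n",
                    ttlDays,
                    TimeUnit.SECONDS.toDays(currentLifetime(ttl, gcGrace)),
                    TimeUnit.SECONDS.toDays(proposedLifetime(ttl, gcGrace)));
        }
    }
}
```

With TTL = 5 days the proposed lifetime is the 5 days mentioned above, and with TTL >= gcgrace it drops to zero.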

> > - ... ExpiringColumn not add local timestamp to digest?
> As I said in a previous thread, I don't see what the problem is here.
> The timestamp is not local to the node, it is assigned once and for
> all by the coordinator at insert time. I can agree that it's not
> really useful per se to the digest, but I don't think it matters in
> any case.
Oh sorry, you're right, I mixed something up there. It's DeletedColumn that
has the local timestamp (as its value). It takes a localDeletionTime (which is
supplied by RowMutation.delete) and uses that as the value of the
DeletedColumn. This value is used by Column to update the digest.
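
A simplified model of what that means for the digest, not the actual Column code: the class and method names are hypothetical, and the exact bytes Cassandra feeds into the digest are not reproduced here. It only shows that once the value (the localDeletionTime) is part of the digest, two copies of a tombstone with the same name and timestamp but different localDeletionTime hash differently. Whether such copies can actually occur is a separate question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical model of a tombstone digest, not Cassandra's implementation.
class TombstoneDigestSketch {

    static byte[] digest(String name, int localDeletionTime, long timestamp,
                         boolean includeLocalDeletionTime) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(name.getBytes(StandardCharsets.UTF_8));
            if (includeLocalDeletionTime) {
                // the DeletedColumn "value" is its localDeletionTime
                md.update(ByteBuffer.allocate(4).putInt(localDeletionTime).array());
            }
            md.update(ByteBuffer.allocate(8).putLong(timestamp).array());
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long timestamp = 1352121110000L;  // same client-supplied timestamp
        int deletionTimeA = 1352121110;   // localDeletionTime of copy A (seconds)
        int deletionTimeB = 1352121115;   // localDeletionTime of copy B (seconds)

        boolean withValue = Arrays.equals(
                digest("col", deletionTimeA, timestamp, true),
                digest("col", deletionTimeB, timestamp, true));
        boolean withoutValue = Arrays.equals(
                digest("col", deletionTimeA, timestamp, false),
                digest("col", deletionTimeB, timestamp, false));

        System.out.println("digests match with localDeletionTime included: " + withValue);
        System.out.println("digests match with localDeletionTime excluded: " + withoutValue);
    }
}
```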

Sorry for not letting this go, but I think there is some low-hanging
fruit here.

