cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring
Date Mon, 18 Apr 2011 17:14:07 GMT


Peter Schuller commented on CASSANDRA-2405:

If I'm reading the code correctly, then no, I mean earlier. Recall that the reason AES is important
w.r.t. GC grace seconds is that, for it to be safe to remove a tombstone for some
piece of data, that piece of data must be guaranteed to have become consistent across the
cluster up to the point where the gc grace period started (as of the moment of tombstone removal).

That essentially boils down to this: any write that happened prior to the start of the gc grace
period must have been propagated, whether by hinted hand-off or by 'nodetool
repair'. Since hinted hand-off is only ever an optimization, only nodetool repair is relevant
to maintaining the invariant.
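
To make the invariant concrete from the monitoring side, here is a minimal sketch of the check a monitoring script might perform. It is not Cassandra code; the function name `repair_overdue`, its parameters, and the optional safety margin are all hypothetical, and it assumes the exposed value is the start time of the last repair that went on to complete successfully.

```python
from datetime import datetime, timedelta, timezone


def repair_overdue(last_repair_start, gc_grace_seconds, now, safety_margin_seconds=0):
    """Return True if the last successful repair started too long ago.

    last_repair_start: datetime at which the last successful repair *started*
                       (hypothetical input; None if no repair has succeeded yet).
    gc_grace_seconds:  the column family's gc grace period, in seconds.
    now:               the current time, passed in so the check is testable.
    safety_margin_seconds: optional head room so the alert fires early.
    """
    if last_repair_start is None:
        # Never repaired successfully: treat as overdue.
        return True
    deadline = last_repair_start + timedelta(
        seconds=gc_grace_seconds - safety_margin_seconds
    )
    return now >= deadline


if __name__ == "__main__":
    now = datetime(2011, 4, 18, 17, 14, 7, tzinfo=timezone.utc)
    ten_days = 10 * 24 * 3600  # example gc grace value, chosen for illustration
    print(repair_overdue(now - timedelta(days=3), ten_days, now))
    print(repair_overdue(now - timedelta(days=11), ten_days, now))
```

The point of comparing against the repair's start time rather than its end time is that only writes already present when the repair began are guaranteed to have been propagated by it.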

An AES session will only be guaranteed to "see" things that existed in the form of sstables
at the point where it started. This presumably means that AES implies a memtable flush
(if not, it would be broken, I think).

So that in turn means that the time to record as 'last successful repair' needs to be before
the flushing of memtables.

Of course, for monitoring purposes this isn't about a few milliseconds here and there, so
maybe it's acceptable to fudge the memtable flushing (although I'm not personally
comfortable with that either). But the time it takes to do the validating compaction
must definitely be counted *after* the recorded millisecond timestamp, since that can clearly
take a long time (even days for large CFs).
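
As a rough illustration of the ordering argued for above, here is a small sketch. It is a model only, not the attached patch and not Cassandra's actual repair code; `RepairTracker`, `flush_memtables`, and `validate_and_stream` are hypothetical names standing in for the real steps.

```python
import time


class RepairTracker:
    """Toy model of recording 'last successful repair' for one column family."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # Start time of the most recent repair that completed successfully.
        self.last_successful_repair_start = None

    def run_repair(self, flush_memtables, validate_and_stream):
        # Take the timestamp BEFORE flushing memtables, so that everything
        # written before this instant is covered by the repair.
        started_at = self._clock()

        flush_memtables()
        # Validating compaction and streaming may take hours or days and may
        # raise; none of that time should move the recorded timestamp forward.
        validate_and_stream()

        # Only reached if the repair succeeded: publish the *start* time.
        self.last_successful_repair_start = started_at
        return started_at
```

If any step raises, `last_successful_repair_start` keeps its previous value, so a failed or aborted repair never makes the column family look more recently repaired than it is.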

> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>                 Key: CASSANDRA-2405
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 0.7.5
>         Attachments: CASSANDRA-2405.patch
> The practical issue of actually ensuring that repair runs is somewhat undocumented/untreated.
> One hopefully low-hanging fruit would be to at least expose the time since last successful repair for a particular column family, to make it easier to write a correct script to monitor for lack of repair in a non-buggy fashion.

