cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring
Date Wed, 15 Jun 2011 13:50:48 GMT


Sylvain Lebresne commented on CASSANDRA-2405:

The problem with using the completion time as the (Super)Column name is that you have to wait
for the end of the repair to store anything. First, this will not capture sessions that started
but failed (capturing them is not mandatory, but it could be nice: once we start keeping a bit
more info, it could help troubleshooting). Second, it will be a pain to have to keep
some of the information until the end (the processingStartedAt is a first sign of this). And
third, we may want to keep some info on, say, merkle tree creation on all replicas participating
in the repair, even though we only store the completed time on the node initiating the repair.

So I would propose something like:
  row key: KS/CF
  super column name: repair session name (a TimeUUID)
  columns: the info on the session (range, start and end time, number of ranges repaired,
bytes transferred, ...)

That is roughly the same thing as you propose, but with the super column name being the repair
session name.
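
To make the layout concrete, here is a minimal Python sketch of it. It is only an
illustration: the nested dict stands in for the system column family, and the names
repair_history, start_session and finish_session are hypothetical, not part of any patch
on this ticket. The point it shows is that a session can be recorded as soon as it starts.

```python
import time
import uuid

# Hypothetical in-memory stand-in for the proposed layout:
# row key "KS/CF" -> super column (session TimeUUID) -> columns.
repair_history = {}


def start_session(keyspace, column_family, token_range):
    """Record a session as soon as it starts and return its TimeUUID name."""
    session_id = uuid.uuid1()  # version 1 UUIDs embed a creation timestamp
    row = repair_history.setdefault(f"{keyspace}/{column_family}", {})
    row[session_id] = {"range": token_range, "started_at": time.time()}
    return session_id


def finish_session(keyspace, column_family, session_id,
                   ranges_repaired, bytes_transferred):
    """Add the completion columns to an already stored session."""
    columns = repair_history[f"{keyspace}/{column_family}"][session_id]
    columns["completed_at"] = time.time()
    columns["ranges_repaired"] = ranges_repaired
    columns["bytes_transferred"] = bytes_transferred
```

A session that fails simply never gets its completed_at column, so it stays visible
for troubleshooting.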

Now, because the repair session names are TimeUUIDs (well, right now it is a string including
a UUID, but we can easily change it to a simple TimeUUID), the sessions will be ordered by
creation time. So getting the last successful repair is probably not too hard: just grab the
last 1000 created sessions and find the last successful one.
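
A rough sketch of that lookup, in Python for brevity. The assumptions are mine: sessions
are held in a dict keyed by version 1 UUIDs, a session counts as successful when it has a
"completed_at" column, and the helper names time_uuid and last_successful_repair are
hypothetical. Sorting uses the timestamp embedded in the UUID, since comparing Python
UUID objects directly does not give time order.

```python
import uuid


def time_uuid(timestamp, node=0x123456789ABC, clock_seq=0):
    """Build a version 1 UUID from a 60-bit timestamp (illustration only)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | (1 << 12)
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def last_successful_repair(sessions, limit=1000):
    """Scan the `limit` most recently created sessions, newest first.

    Returns (session_id, columns) for the newest completed session,
    or None if none of the scanned sessions completed.
    """
    newest_first = sorted(sessions, key=lambda u: u.time, reverse=True)[:limit]
    for session_id in newest_first:
        columns = sessions[session_id]
        if "completed_at" in columns:
            return session_id, columns
    return None
```

Note the limit is a real cut-off: if the newest 1000 sessions all failed, this returns
nothing even when an older successful session exists.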
And if we want, we can even use another specific "index" row that associates 'completion time'
-> 'session UUID' (and thanks to the new DynamicCompositeType we can have some rows ordered
by TimeUUIDType and some others ordered by LongType without the need for multiple system tables).
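
As a sketch of what the "index" row buys us (Python, with hypothetical names): a sorted
list of (completion time, session UUID) pairs stands in for a row ordered by LongType. It
does not model DynamicCompositeType itself. With such a row, the 'time since last
successful repair' asked for in this ticket is a single read of the last entry.

```python
import bisect

# Hypothetical stand-in for the "index" row: entries kept sorted by
# completion time (milliseconds), each pointing at a session UUID string.
completion_index = []


def record_completion(completed_at_millis, session_id):
    """Insert a 'completion time' -> 'session UUID' entry, keeping order."""
    bisect.insort(completion_index, (completed_at_millis, str(session_id)))


def time_since_last_successful_repair(now_millis):
    """Return milliseconds since the newest completion, or None if none."""
    if not completion_index:
        return None
    last_completed, _session_id = completion_index[-1]
    return now_millis - last_completed
```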

> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>                 Key: CASSANDRA-2405
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 0.8.1
>         Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405-v3.patch, CASSANDRA-2405.patch
> The practical implementation issues of actually ensuring repair runs is somewhat of an
> undocumented/untreated issue.
> One hopefully low hanging fruit would be to at least expose the time since last successful
> repair for a particular column family, to make it easier to write a correct script to monitor
> for lack of repair in a non-buggy fashion.


