cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)
Date Fri, 17 Jun 2011 16:12:47 GMT


Jonathan Ellis commented on CASSANDRA-2788:

Pasting Sylvain's explanation from IRC:

Let me take a small example. Suppose there are two nodes, A and B. Initially their node_ids will be
A1 and B1 respectively. Each counter will thus have two components, A1 and B1.

Now suppose you renew the node_id of A -> A2 because of a corruption. Soon enough, the
counters will have 3 components: A1, A2 and B1. Renew it yet another time and the counter
context will be A1, A2, A3 and B1. It grows, which is not cool.
But because we know that nobody will ever increment A1 and A2 anymore (A3 is the active node
id for A), we can merge them (we have to wait for gc_grace and so on for that to be correct,
but we do it).
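
The growth and the merge described above can be sketched in a toy model. This is
only an illustration, not Cassandra's code: it assumes a counter context is a plain
mapping from node id to subcount, it ignores clocks and the gc_grace wait, and it
assumes the retired subcounts are folded into the active node id. The helper names
increment and merge_retired are hypothetical.

```python
# Toy model (assumption): a counter context is a dict of node id -> subcount.
# Real contexts carry more state, and the gc_grace wait is not modeled here.

def increment(context, node_id, delta):
    """Return a new context with delta added to node_id's subcount."""
    updated = dict(context)
    updated[node_id] = updated.get(node_id, 0) + delta
    return updated


def merge_retired(context, retired_ids, active_id):
    """Fold the subcounts of retired node ids into the active node id.

    Folding into the active id is an assumption made for this sketch.
    """
    merged = {k: v for k, v in context.items() if k not in retired_ids}
    folded = sum(context.get(r, 0) for r in retired_ids)
    merged[active_id] = merged.get(active_id, 0) + folded
    return merged


context = {}
context = increment(context, "A1", 5)  # A writes under its first node id
context = increment(context, "B1", 2)  # B writes under its node id
context = increment(context, "A2", 3)  # A was renewed once
context = increment(context, "A3", 1)  # A was renewed again

# Nobody increments A1 or A2 anymore, so they can be merged away.
compacted = merge_retired(context, {"A1", "A2"}, "A3")

print(sorted(context))    # ['A1', 'A2', 'A3', 'B1']
print(sorted(compacted))  # ['A3', 'B1']
```

The merge needs to know which ids are retired (here the set {"A1", "A2"}), which
is why the list of old node ids matters.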

So basically we try to keep the context as small as it can be. If you nuke NodeIdInfo, right
now the code won't be able to do that anymore and you will be left with a bigger than necessary
context for all the counters.

So just renewing is more efficient in that sense. But nuking the system table is still 'correct'
as far as returning the correct count is concerned.
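
To illustrate why the count stays correct either way, here is a minimal sketch
under the same simplifying assumption that a context maps node ids to subcounts
and that the counter value is the sum of all subcounts. The function name
counter_value and the sample numbers are made up for the example.

```python
# Toy model (assumption): the counter value is the sum of every subcount
# in the context, whichever node ids those subcounts are stored under.

def counter_value(context):
    """Return the total count represented by a context."""
    return sum(context.values())


# Old node ids were merged because the list of old ids was still known.
merged_context = {"A3": 9, "B1": 2}

# Old node ids could not be merged because the list of old ids was lost.
unmerged_context = {"A1": 5, "A2": 3, "A3": 1, "B1": 2}

# Same value in both cases; only the size of the context differs.
print(counter_value(merged_context) == counter_value(unmerged_context))
print(len(merged_context), len(unmerged_context))
```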

> Add startup option renew the NodeId (for counters)
> --------------------------------------------------
>                 Key: CASSANDRA-2788
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: counters
>             Fix For: 0.8.2
>         Attachments: 0001-Option-to-renew-the-NodeId-on-startup.patch
> If an sstable of a counter column family is corrupted, the only safe solution a user
has right now is to:
> # Remove the NodeId system table to force the node to generate a new NodeId (and thus
stop incrementing on its previous, corrupted, subcount)
> # Remove all the sstables for that column family on that node (this is important because
otherwise the node will never get "repaired" for its previous subcount)
> This is far from ideal, but I think this is the price we pay for avoiding the read-before-write.
In any case, the first step (removing the NodeId system table) happens to remove the list of
the old NodeIds this node has had, which could prevent us from merging the other potential previous
NodeIds. This is ok but sub-optimal. This ticket proposes to add a new startup flag to make
the node renew its NodeId, thus replacing this first step.

