cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2759) Scrub could lose increments and replicate that loss
Date Fri, 10 Jun 2011 16:55:59 GMT


Sylvain Lebresne commented on CASSANDRA-2759:

It's picking a new UUID for the current node to use for new counter increments.

The problem is that on a given node we store deltas for its current nodeId (to avoid a synchronized
read-before-write, though I'm starting to wonder if that was the smartest choice). Anyway, if scrub
skips a row, it may skip some of those deltas. Let's say that at first no increments come in
for this row with A as 'first distinguished replica'. So far we are still more or less fine, because
on a read (with CL > ONE) the result coming from A will have a 'version' for its own sub-count
smaller than the one on the other replicas, so we will use the sub-count from those replicas and
return the correct value.
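
To make the read case concrete, here is a minimal Python sketch of a toy model. It is not
Cassandra's actual counter code: the dict layout, the (version, count) pairs and the helper
names merge and total are all assumptions made for illustration. Each replica's view maps a
node id to its sub-count, and a read keeps, per node id, the sub-count with the highest version.

```python
# Toy model (assumption, not Cassandra's real data structures):
# a replica's view of one counter is {node_id: (version, sub_count)}.

def merge(*views):
    """Per node id, keep the sub-count carrying the highest version."""
    merged = {}
    for view in views:
        for node_id, (version, count) in view.items():
            if node_id not in merged or version > merged[node_id][0]:
                merged[node_id] = (version, count)
    return merged

def total(view):
    """The counter value is the sum of all sub-counts."""
    return sum(count for _version, count in view.values())

# B holds a full copy: A's sub-count is 5 (version 3), B's is 4 (version 2).
replica_b = {"A": (3, 5), "B": (2, 4)}

# Scrub skipped a row on A, so A only kept part of its own deltas.
replica_a_after_scrub = {"A": (1, 2), "B": (2, 4)}

# A read that consults both replicas still sees the right value,
# because B's copy of A's sub-count has the higher version.
read_value = total(merge(replica_a_after_scrub, replica_b))
```

In this toy scenario read_value is 9, the correct total, while A on its own would answer 6.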

However, as soon as A acknowledges new increments for this row, it will start inserting new
deltas while it is not itself up to date. This will result in a definitive undercount.

The goal of renewing the node id of A is to make sure that second part never happens (because
after the renewal A will add new deltas as A', not as A anymore).
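
The difference between continuing as A and continuing as A' can be sketched with a small toy
model in Python. Everything here is an assumption made for illustration, not Cassandra's
implementation: the {node_id: (version, sub_count)} layout, the rule that each local increment
bumps the version by one, and the helper names. The sketch only shows the intended effect of
the renewal on merged reads; it does not model repair of A's own data.

```python
# Toy model (assumption): a view is {node_id: (version, sub_count)}.

def merge(*views):
    """Per node id, keep the sub-count carrying the highest version."""
    merged = {}
    for view in views:
        for node_id, (version, count) in view.items():
            if node_id not in merged or version > merged[node_id][0]:
                merged[node_id] = (version, count)
    return merged

def total(view):
    return sum(count for _version, count in view.values())

def apply_local_increment(view, node_id, delta):
    """Add a delta under node_id; the version grows with each delta (assumption)."""
    version, count = view.get(node_id, (0, 0))
    updated = dict(view)
    updated[node_id] = (version + 1, count + delta)
    return updated

# Before scrub the true value is 9; B still holds a full copy.
replica_b = {"A": (3, 5), "B": (2, 4)}

# Scrub skipped the row on A, so A starts from nothing for this counter.
same_id = {}
renewed_id = {}
for _ in range(4):  # four new increments of 1, true value becomes 13
    same_id = apply_local_increment(same_id, "A", 1)
    renewed_id = apply_local_increment(renewed_id, "A2", 1)  # "A2" stands for A'

# Still writing as A: the incomplete sub-count (4, 4) outranks B's (3, 5).
same_id_total = total(merge(same_id, replica_b))

# Writing as A': the old sub-count for A kept by B is left untouched.
renewed_total = total(merge(renewed_id, replica_b))
```

In this toy scenario same_id_total is 8 (an undercount of 5), while renewed_total is 13.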

Anyway, now that I've plugged the brain in, this patch doesn't really work, because A will never
be repaired by the other nodes for its now inconsistent value.

So I have no clue how to actually fix that.

> Scrub could lose increments and replicate that loss
> ---------------------------------------------------
>                 Key: CASSANDRA-2759
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.1
>         Attachments: 0001-Renew-nodeId-in-scrub-when-skipping-rows.patch
> If scrub cannot 'repair' a corrupted row, it will skip it. On node A, if the row contains
some sub-counts for A's id, those will be lost forever, since A is the source of truth for its
current id. We should thus renew node A's id when that happens to avoid this (not unlike what we
do in cleanup).

This message is automatically generated by JIRA.
For more information on JIRA, see:
