incubator-cassandra-user mailing list archives

From Daniel Doubleday <>
Subject Re: repair question
Date Mon, 23 May 2011 18:18:58 GMT
Thanks Sylvain

Well, no, I don't really understand it at all. We have everything from wide rows with small values to a single larger column in one row.

The problem hits every CF. RF = 3, reads and writes with quorum.
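For reference, the quorum overlap arithmetic behind that setup can be sketched in a few lines (a standalone illustration, not Cassandra code):

```python
def quorum(rf: int) -> int:
    # Quorum size: floor(RF / 2) + 1.
    return rf // 2 + 1

rf = 3
r = w = quorum(rf)
# With RF = 3, reads and writes each contact 2 replicas;
# 2 + 2 > 3, so every read overlaps every write on at least one replica.
print(r, w, r + w > rf)  # 2 2 True
```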

The CF that is killing me right now has one column that's never updated (it's WORM - updates are reinserts under a new key plus a delete of the old one, to avoid in-place updates of a large CF). 250GB per node.
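For clarity, the reinsert-instead-of-update pattern I mean is roughly this (a toy sketch with a hypothetical insert/delete interface, not our actual client code):

```python
# Hypothetical minimal key/value interface; a real client would talk to
# Cassandra, but the shape of the pattern is the same.
store = {}

def insert(key, value):
    store[key] = value

def delete(key):
    store.pop(key, None)

def worm_update(old_key, new_key, new_value):
    # An "update" is a write under a fresh key plus a delete of the old key,
    # so the large column is never rewritten in place.
    insert(new_key, new_value)
    delete(old_key)

insert("doc:v1", "big blob v1")
worm_update("doc:v1", "doc:v2", "big blob v2")
print(sorted(store))  # ['doc:v2']
```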
Unfortunately restarting the node doesn't stop repair, so the repair started again. I deleted all tmp files before restarting, but it's out of space again. du -hcs shows 780GB for that CF.

Guess I have to restart all nodes to stop repair?

To answer the question: yes, the cluster might be a little out of sync, but not that much.

What I don't understand: I saw that the repairing node was still doing a validation compaction on that major sstable file (200GB), yet it had already received loads of data for that CF from the other nodes.


On May 23, 2011, at 7:48 PM, Sylvain Lebresne wrote:

> On Mon, May 23, 2011 at 7:17 PM, Daniel Doubleday
> <> wrote:
>> Hi all
>> I'm a bit lost: I tried a repair yesterday with only one CF and that didn't really
>> work the way I expected, but I thought that might be a bug which only affects that special CF.
>> So I tried again for all CFs.
>> I started with a nicely compacted machine with around 320GB of load. Total disc space
>> on this node was 1.1TB.
>> After it ran out of disc space (meaning it received around 700GB of data) I had a
>> very brief look at the repair code again, and it seems to me that the repairing node will get
>> all data for its range from all its neighbors.
> The repaired node is supposed to get only the data from its
> neighbors for the rows it is not in sync with. That all depends
> on how far the node is out of sync compared to
> the other nodes.
> Now there are a number of things that could make it repair more
> than what you would hope. For instance:
>  1) Even if only one column differs in a row, the full row is
>      repaired. If you have a small number of huge rows, that
>      can amount to quite some data uselessly transferred.
>  2) The merkle tree (which allows deciding whether two rows
>      are in sync) doesn't necessarily have one hash per row,
>      so in theory one column being out of sync may imply the
>      repair of more than one row.
>  3) (which is fixed in 0.8)
> Fortunately, the chance of getting hit by 1) is inversely
> proportional to the chance of getting hit by 2), and vice versa.
> Anyway, the kind of excess data you're seeing is not something
> I would expect unless the node is really completely out of sync
> with all the other nodes.
> So in light of this, do you have more info on your own case?
> (Do you have lots of small rows, or a few large ones? Did you expect
> the node to be widely out of sync with the other nodes? Etc.)
> --
> Sylvain
>> Is that true, and if so, is it the intended behavior? If so, one would rather need 5-6
>> times the disc space, given that the compactions that need to run after the sstable rebuild
>> also need temp disc space.
>> Cheers,
>> Daniel
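
To illustrate Sylvain's point 2) above - that repair granularity is per merkle-tree range, not per row - here is a standalone sketch (the range layout and hashing are simplified and hypothetical, not Cassandra's actual implementation):

```python
import hashlib

def range_hash(rows):
    """Hash all rows that fall into one merkle-tree leaf range."""
    h = hashlib.sha256()
    for key, value in sorted(rows.items()):
        h.update(key.encode())
        h.update(value.encode())
    return h.hexdigest()

def rows_to_stream(local, remote, ranges):
    """Compare per-range hashes; every row in a mismatching range is
    streamed, even if only one row (or one column) actually differs."""
    out = []
    for rng in ranges:
        lrows = {k: v for k, v in local.items() if k in rng}
        rrows = {k: v for k, v in remote.items() if k in rng}
        if range_hash(lrows) != range_hash(rrows):
            out.extend(sorted(set(lrows) | set(rrows)))
    return out

# The two nodes differ in a single row ("b"), but the leaf range
# covering it spans three rows, so all three get repaired.
ranges = [{"a", "b", "c"}, {"d", "e"}]
local  = {"a": "1", "b": "2",   "c": "3", "d": "4", "e": "5"}
remote = {"a": "1", "b": "OLD", "c": "3", "d": "4", "e": "5"}
print(rows_to_stream(local, remote, ranges))  # ['a', 'b', 'c']
```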
