On Tue, Sep 13, 2011 at 3:57 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> I think it is a serious problem since I cannot run "repair". I am
> using Cassandra on production servers. Is there some way to fix it
> without upgrading? I have heard that 0.8.x is still not quite ready
> for production environments.

It is a serious issue if you really need to repair one CF at a time.

Why is it serious to need to repair one CF at a time? If I cannot
repair at the CF level, does that mean I cannot use more than 50% of
the disk space? Is this specific to this problem, or is it a general
statement? I ask because I am planning on repairing per CF so that I
can limit the maximum disk overhead to one CF's worth (plus some
factor). I am going to be testing this in the next couple of weeks or
so.

However, looking at your original post it seems this is not
necessarily your issue. Do you need to, or was your concern rather the
overall time repair took?
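
The disk-overhead question above can be put in rough numbers. This is
only a sketch: the CF names, sizes, and safety factor are made up, and
it assumes (as the question implies) that an operation may temporarily
need free space about equal to the data it rewrites.

```python
# Hypothetical column family sizes in GB; invented for illustration only.
cf_sizes_gb = {"users": 120, "events": 300, "index": 80}
safety_factor = 1.1  # the "+ some factor" margin; also an assumption


def headroom_needed(sizes, per_cf, factor=1.0):
    """Free space needed if an operation may temporarily duplicate
    the data it touches (an assumption, not a measured figure)."""
    touched = max(sizes.values()) if per_cf else sum(sizes.values())
    return touched * factor


all_at_once = headroom_needed(cf_sizes_gb, per_cf=False, factor=safety_factor)
one_at_a_time = headroom_needed(cf_sizes_gb, per_cf=True, factor=safety_factor)

print(f"all CFs together: {all_at_once:.0f} GB free needed")
print(f"one CF at a time: {one_at_a_time:.0f} GB free needed")
```

Under those assumptions, working per CF bounds the headroom by the
largest CF rather than by the whole data set.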

There are other problems affecting 0.7 that are improved in 0.8. In
particular, (1) in 0.7 compaction, including the validating compactions
that are part of repair, is non-concurrent, so if your repair starts
while a long-running compaction is going it will have to wait,
and (2) semi-related, the merkle tree calculation that is part
of repair/anti-entropy may happen "out of sync" if one of the
participating nodes happens to be busy with compaction. This in turn
causes additional data to be sent as part of repair.
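
To illustrate point (2), here is a toy sketch of why trees computed at
different times disagree. It is not Cassandra's implementation: it only
hashes the leaves (one digest per range), and the bucketing scheme,
function names, and data are all hypothetical.

```python
import hashlib


def leaf_hashes(rows, num_ranges=4):
    """One digest per range; stands in for the leaves of a merkle tree."""
    buckets = [hashlib.md5() for _ in range(num_ranges)]
    for key in sorted(rows):
        # Hypothetical partitioning: bucket by the first byte of the key hash.
        idx = hashlib.md5(key.encode()).digest()[0] % num_ranges
        buckets[idx].update(f"{key}={rows[key]}".encode())
    return [b.hexdigest() for b in buckets]


def mismatched_ranges(a, b):
    """Indexes of ranges whose digests differ and would be exchanged."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Both replicas hold identical data when the repair is requested.
replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = dict(replica_a)

# Replica A computes its hashes right away.
tree_a = leaf_hashes(replica_a)

# Replica B is busy compacting; a write reaches both replicas meanwhile.
for replica in (replica_a, replica_b):
    replica["k4"] = "v4"

# Replica B computes its hashes late, over newer data.
tree_b = leaf_hashes(replica_b)

print("replicas identical now:", replica_a == replica_b)
print("ranges flagged for transfer:", mismatched_ranges(tree_a, tree_b))
```

The replicas end up identical, yet one range is still flagged because
the two sets of hashes describe different points in time.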

That might be why your immediately following repair took a long time,
but it's difficult to tell.

If you're having issues with repair and large data sets, I would
generally say that upgrading to 0.8 is recommended. However, if you're
on 0.7.4, beware of

/ Peter Schuller (@scode on twitter)