incubator-cassandra-user mailing list archives

From Daniel Doubleday <>
Subject Alternative to repair
Date Mon, 07 Mar 2011 17:18:18 GMT
Hi all

we're still on 0.6 and are facing problems with repairs. 

A repair of one CF takes around 60h, and we have to run it twice (RF=3, 5 nodes). During
that time the cluster is under pretty heavy IO load. It kinda works, but during peak times
we see lots of dropped messages (including writes). So we are actually creating the very
inconsistencies that we are trying to fix with the repair.

Since we already have a very simple hadoopish framework in place, which allows us to do token
range walks with multiple workers and to restart at a given position in case of failure, I created
a simple worker that reads everything with CL_ALL. With only one worker and almost no
performance impact, one scan took 7h.
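
To make the idea concrete, here is a minimal sketch of such a token range walk with restart support. It is not our actual framework code: `split_ring`, `walk`, `read_slice`, `resume_after` and `save_checkpoint` are made-up names, the ring size of 2**127 assumes RandomPartitioner, and `read_slice` stands in for whatever client call issues the range read at consistency level ALL.

```python
from typing import Callable, Iterator, Optional, Tuple

# Assumption: RandomPartitioner, whose tokens fall in 0 .. 2**127.
TOKEN_SPACE = 2 ** 127


def split_ring(num_slices: int, ring_size: int = TOKEN_SPACE) -> Iterator[Tuple[int, int]]:
    """Yield (start, end] token slices that together cover the whole ring."""
    step = ring_size // num_slices
    for i in range(num_slices):
        start = i * step
        # The last slice absorbs the rounding remainder.
        end = ring_size if i == num_slices - 1 else (i + 1) * step
        yield start, end


def walk(
    num_slices: int,
    read_slice: Callable[[int, int], None],
    resume_after: int = -1,
    save_checkpoint: Optional[Callable[[int], None]] = None,
    ring_size: int = TOKEN_SPACE,
) -> int:
    """Read every slice after index `resume_after`; return the number of slices read.

    `read_slice` is a hypothetical callback that reads all rows in the
    given token slice at consistency level ALL.
    """
    done = 0
    for i, (start, end) in enumerate(split_ring(num_slices, ring_size)):
        if i <= resume_after:
            continue  # already finished in an earlier run
        read_slice(start, end)
        if save_checkpoint is not None:
            save_checkpoint(i)  # remember the position for a restart after failure
        done += 1
    return done
```

With several workers, each one would get its own subset of slice indexes. After a failure, a worker is started again with `resume_after` set to the last index it saved.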

My understanding is that at that point, due to read repair, I got the same result as I would
have achieved with repair runs.
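
The mechanism I am assuming can be shown with a toy model. This is only an illustration and not Cassandra code: replicas are plain in-memory dicts, the newest timestamp wins, and `read_at_all` is a made-up name. It ignores tombstones and anything else that happens inside a real node.

```python
from typing import Dict, List, Optional, Tuple

# One replica maps a key to a (timestamp, value) pair.
Replica = Dict[str, Tuple[int, str]]


def read_at_all(replicas: List[Replica], key: str) -> Optional[str]:
    """Toy read at CL ALL: ask every replica, return the newest value,
    and write that version back to any replica that is stale or missing it."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v[0])  # highest timestamp wins
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest  # the "read repair" step
    return newest[1]


replicas = [{"k": (2, "new")}, {"k": (1, "old")}, {}]
value = read_at_all(replicas, "k")
```

In this model, once every key has been read once, all three replicas hold the same version of every key that was read.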

Is that true or am I missing something?

