There is an issue where repair can stream too much data, and this can lead to excessive disk use.
My non-scientific approach to the never-run-repair-before problem is to repair a single CF at a time, starting with the small ones. They are less likely to have differences, so they will stream the smallest amount of data.
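A minimal sketch of that one-CF-at-a-time ordering, in Python. The keyspace name, column family names and sizes are made up for illustration, and the command shape `nodetool -h <host> repair <keyspace> <cf>` is an assumption about your version, so check the nodetool help output before running anything.

```python
def repair_plan(cf_sizes, keyspace, host="localhost"):
    """Build one repair command per column family, smallest CF first.

    cf_sizes maps column family name -> on-disk size in bytes. The numbers
    used below are hypothetical; read the real ones from your own node.
    """
    # Sort by size, then by name so that ties come out in a stable order.
    ordered = sorted(cf_sizes, key=lambda cf: (cf_sizes[cf], cf))
    return [["nodetool", "-h", host, "repair", keyspace, cf] for cf in ordered]


if __name__ == "__main__":
    # Hypothetical keyspace and column families, for illustration only.
    sizes = {
        "Comments": 12 * 1024**3,
        "Users": 300 * 1024**2,
        "Votes": 6 * 1024**3,
    }
    for cmd in repair_plan(sizes, "MyKeyspace"):
        # Printed rather than executed, so each repair can be started and
        # watched by hand before moving on to the next, larger CF.
        print(" ".join(cmd))
```

The commands are only printed on purpose: on a cluster that is already struggling it seems safer to start each repair manually and watch the load before moving to the next CF.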
If you really want to conserve disk IO during the repair, consider disabling minor compaction by setting the min and max compaction thresholds to 0 via nodetool.
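A rough sketch of the threshold change, again in Python and again only building the command lines. The argument order `setcompactionthreshold <keyspace> <cf> <min> <max>` is an assumption about your nodetool version, the keyspace and CF names are hypothetical, and the restore step assumes the CF was on the default thresholds of 4 and 32. If you had changed them, restore your own values instead.

```python
DEFAULT_MIN, DEFAULT_MAX = 4, 32  # default min/max compaction thresholds


def threshold_cmd(keyspace, cf, min_threshold, max_threshold, host="localhost"):
    """Build a nodetool command that sets the compaction thresholds for one CF."""
    if not 0 <= min_threshold <= max_threshold:
        raise ValueError("expected 0 <= min_threshold <= max_threshold")
    # Assumed argument order: keyspace, column family, min, max.
    return [
        "nodetool", "-h", host, "setcompactionthreshold",
        keyspace, cf, str(min_threshold), str(max_threshold),
    ]


def disable_and_restore(keyspace, cf, host="localhost"):
    """Return (disable, restore) commands to run before and after the repair."""
    disable = threshold_cmd(keyspace, cf, 0, 0, host)  # 0/0 turns minor compaction off
    restore = threshold_cmd(keyspace, cf, DEFAULT_MIN, DEFAULT_MAX, host)
    return disable, restore


if __name__ == "__main__":
    # Hypothetical keyspace and column family names.
    before, after = disable_and_restore("MyKeyspace", "Users")
    print(" ".join(before))
    print(" ".join(after))
```

Remember to run the restore command once the repair has finished, otherwise SSTables will keep piling up without being compacted.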
Hope that helps.
Freelance Cassandra Developer
just found this:
But it seems to be available only for 0.8, and people submitted a patch for 0.6. I am using 0.7.4; do I need to dig into the code and make my own patch?
Does adding compaction throttling solve the IO problem? Thanks!
On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu <email@example.com> wrote:
When I started using Cassandra, I had no idea that I should run "nodetool repair" regularly. So I have 3 nodes with RF=3 and have not run repair for months; the data size is 20G.
The problem is that when I run repair now, it eats up all the disk IO, and the server load reaches 20+ and keeps increasing. Worst of all, the entire cluster slows down and cannot handle requests, so I have to stop the repair immediately because it makes my web service unavailable.
The server has an Intel Xeon-Lynnfield 3470-Quadcore [2.93GHz] and 8G of memory, with a Western Digital WD RE3 WD1002FBYS SATA disk.
I really have no idea what to do now, and I have already found some data loss. Any suggestions would be appreciated.