We have a 4-node Cassandra 0.7.6 cluster with RF=2 and 3 TB of data per node.
A read repair was kicked off on node 4 last week and is still in progress.
A few days ago I also kicked off a read repair on node 2.
We were still using the cluster (reads/writes/updates, NO deletes) while the repair was in progress, but no data has been written for the past 3-4 days.
I was hoping the repair would finish in that time frame before we proceed with further writes/deletes.
Would it be safe to stop it and kick it off per column family, or to do a full scan of all keys as suggested in an earlier discussion? Any other suggestions for speeding up this repair?
On both nodes the repair thread has been waiting at this stage for a long time (~60+ hours):
at java.lang.Object.wait(Native Method)
- waiting on <580857f3> (a org.apache.cassandra.utils.SimpleCondition)
Locked ownable synchronizers:
CPU sampling for a few minutes shows these methods as hot spots (mostly the top two)
netstats does not show anything streaming to/from any of the nodes.
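
For anyone reading the thread dump above: a thread parked in Object.wait on a condition object stays in that state until some other thread signals it, so the trace by itself does not say whether the repair is progressing or stuck. Below is a minimal sketch of that pattern. OneShotCondition is a hypothetical stand-in written for illustration, on the assumption that SimpleCondition behaves like a one-shot flag guarded by wait/notify; it is not Cassandra's code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a one-shot condition (NOT Cassandra's SimpleCondition).
public class OneShotCondition {
    private boolean set = false;

    // Blocks until signal() is called or the timeout elapses.
    // Returns true if signalled, false on timeout.
    public synchronized boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!set) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // nobody signalled in time
            }
            // A thread dump taken here shows the thread inside Object.wait.
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        return true;
    }

    // Marks the condition as satisfied and wakes every waiter.
    public synchronized void signal() {
        set = true;
        notifyAll();
    }

    public static void main(String[] args) throws Exception {
        OneShotCondition done = new OneShotCondition();
        // No other thread ever signals, so this wait times out.
        System.out.println("signalled before timeout: " + done.await(200, TimeUnit.MILLISECONDS));
        done.signal();
        // Once signalled, await returns immediately.
        System.out.println("after signal: " + done.await(200, TimeUnit.MILLISECONDS));
    }
}
```

The sketch uses a timed wait only so that it terminates. An untimed wait, as the trace above suggests, would block until the signal arrives, however long that takes.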