We simulated a node failure by deleting the entire Cassandra installation directory on one of our nodes and configuring a fresh instance with the same initial token. When we issued a 'repair', it started streaming data back onto the node, as expected.
However, after the repair completed, the node held over 2.5 times its original load. Issuing a 'cleanup' reduced this to about 1.5 times the original load. 'cfstats' showed an increase in the number of keys, which evidently accounts for the extra load.
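For reference, the command sequence we ran on the replaced node was roughly the following (host and keyspace names are placeholders, and we restored the node's original initial_token in cassandra.yaml before starting it):

    nodetool -h <host> repair <keyspace>    # stream data back from replicas
    nodetool -h <host> cleanup <keyspace>   # drop data outside the node's ranges
    nodetool -h <host> cfstats              # check per-CF key counts and load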
Would anybody know why the repair pulled in more keys than the node originally held, given that it was assigned the same token? And how can we stop this from recurring?
If we didn't have enough disk headroom to absorb, say, 3 times the load, we could find ourselves in a difficult situation should we experience a genuine failure.
(We're using Cassandra 1.0.5; 12 nodes split across 2 data centres; total cluster load during testing was about 150 GB.)