From Daniel Doubleday <>
Subject Stuck with adding nodes
Date Thu, 09 Dec 2010 19:01:07 GMT
Hi good people.

I underestimated the load during peak times, and now I'm stuck with our production cluster.
Right now it's 3 nodes with RF 3, so everything is everywhere. We have ~300GB data load, ~10MB/sec
incoming traffic, and ~50 (peak) reads/sec to the cluster.

The problem derives from our quorum reads / writes: at peak hours one of the machines (which
one is random) will fall behind because it's a little slower than the others, and then shortly
after that it will drop most read requests. So right now the only way to survive is to take one
machine down, making every read / write effectively an ALL operation (quorum for RF 3 is 2, and
only 2 nodes are left). It's necessary to take one machine down because otherwise users will
wait for timeouts from that overwhelmed machine when the client lib chooses it. Since we are a
real-time oriented thing, that's a killer.
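
To spell out the quorum arithmetic behind this, here is a minimal sketch. The function
names are mine and purely illustrative; they are not part of any client library.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must answer a QUORUM request: a strict majority."""
    return replication_factor // 2 + 1


def spare_replicas(replication_factor: int, live_nodes: int) -> int:
    """How many live replicas a QUORUM request can do without.

    Assumes every node holds every row (as with 3 nodes and RF 3).
    A negative result means QUORUM cannot be satisfied at all.
    """
    return live_nodes - quorum(replication_factor)


RF = 3
print(quorum(RF))             # replicas needed per QUORUM request
print(spare_replicas(RF, 3))  # all three nodes up
print(spare_replicas(RF, 2))  # one node taken down
```

With all three nodes up, one slow replica can be ignored; with one node down there is no
spare left, so both remaining nodes have to answer every request.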

So now we tried to add 2 more nodes. The problem is that anticompaction takes too long, meaning
it is not done when peak hour arrives, and the machine that would stream the data to the new
node must be taken down. We tried to block ports 7000 and 9160 to that machine because
we hoped that would stop traffic and let the machine finish anticompaction. But that did not
work because we could not cut the already existing connections to the other nodes.

Currently I am copying all data files (that's all existing data) from one node to the new nodes
in the hope that I could then manually assign them their new token range (nodetool move) and do
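
For the tokens themselves, a rough sketch of how evenly spaced values for a 5 node ring
could be computed. This assumes RandomPartitioner with its 0 to 2**127 token space, which
I have not stated above; the function name is made up for illustration.

```python
# Assumption: RandomPartitioner, whose tokens range over 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count: int) -> list[int]:
    """Return one evenly spaced token per node, starting at 0."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


# Print the candidate token for each of the 5 nodes in ring order.
for index, token in enumerate(balanced_tokens(5)):
    print(f"node {index}: {token}")
```

Each printed value would be a candidate argument for nodetool move on the corresponding node.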

Obviously I will try this tomorrow (it's been a long day) on a test system but any advice
would be highly appreciated.

Sighs and thanks.
