"routing more traffic to it?"

So shouldn't I see more "network in" on that node in the AWS console?

It seems that each node is receiving and sending an equal amount of data.

What value should I use for dynamic-snitch-badness-threshold to give it a try?

Oh, you're on EC2. Maybe the dynamic snitch is detecting that one node is performing better than the others and so is routing more traffic to it?



"Is there a sustained difference or did it settle back ? "

Sustained, clearly. During the day all nodes read at about 6 MB/s while this one reads at 30-40 MB/s. At night, while the others read at 2 MB/s, the "broken" node reads at 8-10 MB/s.

"Could this have been compaction or repair or upgrade tables working ? "

That was my first thought, but definitely not. This occurs continuously.

"Do the read / write counts available in nodetool cfstats show anything different ? "

The cfstats output shows different counts (far fewer reads/writes for the "bad" node), but the nodes didn't join the ring at the same time, so the counts may not be directly comparable. I'm attaching the cfstats output just in case it helps somehow.

Node  38: http://pastebin.com/ViS1MR8d (bad one)


"clients always connect to that server"

I didn't include it in the screenshot from the AWS console, but AWS reports (almost) equal network traffic across the nodes (same for output and CPU). The CPU load is a lot higher on the broken node, as shown by OpsCenter, but that's due to the high iowait.

