Oh, you're on EC2. Maybe the dynamic snitch is detecting that one node is performing better than the others and is routing more traffic to it?



On Wed, Dec 19, 2012 at 2:30 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
"Is there a sustained difference or did it settle back?"

Sustained, clearly. During the day the other nodes read at about 6 MB/s while this one reads at 30-40 MB/s. At night, while the others read at 2 MB/s, the "broken" node reads at 8-10 MB/s.

"Could this have been compaction or repair or upgrade tables working?"

That was my first thought, but definitely not. This occurs continuously.

"Do the read / write counts available in nodetool cfstats show anything different?"

The cfstats output shows different counts (far fewer reads/writes for the "bad" node), but the nodes didn't join the ring at the same time. I'm attaching the cfstats just in case they help somehow.

Node 38: http://pastebin.com/ViS1MR8d (bad one)


"clients always connect to that server"

I didn't include it in the screenshot from the AWS console, but AWS reports (almost) equal network traffic across the nodes (same for output and CPU). The CPU load is a lot higher on the broken node, as shown by OpsCenter, but that's due to the high iowait.

Bryan Talbot
Architect / Platform team lead, Aeria Games and Entertainment
Silicon Valley | Berlin | Tokyo | Sao Paulo