We are running a pretty consistent load on our cluster and added a new node to a 6-node cluster on Friday (QA worked great, but production not so much). One mistake we made was starting up the new node and only then disabling the firewall :( which allowed the other nodes to discover it BEFORE it had bootstrapped itself. We shut the node down and booted it back up, and it bootstrapped itself, streaming all the data in.
After that, though, all the nodes have really high load numbers. We are still trying to figure out what is going on.
Is there any way to get the number of reads/second and writes/second through JMX or something? The only way I can see of doing this is calculating it manually: sample the read count twice and divide the difference by the time elapsed between the two samples (my stopwatch start/stop times).
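To make the manual approach concrete, here is a minimal Python sketch of the calculation I mean: take two samples of a cumulative read (or write) counter and divide the difference by the elapsed time. The function name and the sample numbers are made up for illustration; they are not from our cluster or from any Cassandra API.

```python
def ops_per_second(count_start: int, time_start_s: float,
                   count_end: int, time_end_s: float) -> float:
    """Average operations per second between two samples of a cumulative counter."""
    elapsed_s = time_end_s - time_start_s
    if elapsed_s <= 0:
        raise ValueError("the end sample must be taken after the start sample")
    return (count_end - count_start) / elapsed_s


# Hypothetical samples: the counter read 1000 at t=0 s and 1600 at t=60 s.
reads_per_second = ops_per_second(1000, 0.0, 1600, 60.0)
print(reads_per_second)
```

This only gives an average over the sampling window, so short spikes between the two samples are invisible.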
Also, my load average is 20.31, 19.10, 19.72. What does a normal iostat look like? My iostat await time on sda is 13.66 ms, which I think is kind of bad, but surely not bad enough to cause a load of 20.31?
Device:  rrqm/s  wrqm/s    r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.02    0.07  11.70  1.96  1353.67  702.88    150.58      0.19  13.66   3.61   4.93
sdb        0.00    0.02   0.11  0.46    20.72   97.54    206.70      0.00   1.33   0.67   0.04
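As a rough sanity check on those numbers, %util should be approximately (r/s + w/s) multiplied by svctm, and that is easy to recompute. The sketch below assumes that relationship holds for this version of iostat; the function name is just for illustration.

```python
def estimated_util_percent(reads_per_s: float, writes_per_s: float,
                           svctm_ms: float) -> float:
    """Approximate device utilization: I/Os per second times service time per I/O."""
    busy_seconds_per_second = (reads_per_s + writes_per_s) * svctm_ms / 1000.0
    return busy_seconds_per_second * 100.0


# Values copied from the iostat output above.
sda_util = estimated_util_percent(11.70, 1.96, 3.61)
sdb_util = estimated_util_percent(0.11, 0.46, 0.67)
print(round(sda_util, 2), round(sdb_util, 2))
```

If that is right, sda is busy only about 5% of the time, which is why I doubt the disks alone explain a load average of 20.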