The sawtooth wave in memory utilization could be memtable dumps. I/O wait on the TCP side happens when you are overwhelming the server with requests. Could you run sar and find out how many bytes/sec you are receiving and transmitting?
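
If sysstat is not installed, a rough equivalent of what `sar -n DEV` reports can be sketched by sampling the kernel's per-interface counters twice. This is only a minimal sketch: it assumes a Linux box with `/proc/net/dev`, and the function names and the 5-second interval are illustrative choices, not anything from Cassandra or sysstat.

```python
import time

def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def rates(before, after, seconds):
    """Bytes/sec (received, transmitted) per interface between two samples."""
    return {
        iface: ((after[iface][0] - rx) / seconds, (after[iface][1] - tx) / seconds)
        for iface, (rx, tx) in before.items()
        if iface in after
    }

def sample(interval=5.0, path="/proc/net/dev"):
    """Read the counters twice, `interval` seconds apart, and return the rates."""
    with open(path) as f:
        first = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_net_dev(f.read())
    return rates(first, second, interval)
```

Calling `sample()` on the affected node returns a dict keyed by interface name, e.g. the entry for `eth0` holds the receive and transmit rates in bytes/sec over the sampling window.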

Cheers
Avinash

On Thu, Apr 8, 2010 at 7:45 AM, Mark Jones <MJones@imagehawk.com> wrote:
I don't see any way to increase the number of active deserializers in storage-conf.xml.

tpstats more than 8 hours after inserts and reads stopped:

Pool Name                    Active   Pending      Completed
FILEUTILS-DELETE-POOL             0         0            227
STREAM-STAGE                      0         0              1
RESPONSE-STAGE                    0         0       76724280
ROW-READ-STAGE                    8      4091        1138277
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         1   1849826       78135012
GMFD                              0         0         136886
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0           1803
ROW-MUTATION-STAGE                0         0       68669717
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0            438
FLUSH-WRITER-POOL                 0         0            438
AE-SERVICE-STAGE                  0         0              3
HINTED-HANDOFF-POOL               0         0              3

More than 30 minutes later (with no reads or writes to the cluster):

Pool Name                    Active   Pending      Completed
FILEUTILS-DELETE-POOL             0         0            227
STREAM-STAGE                      0         0              1
RESPONSE-STAGE                    0         0       76724280
ROW-READ-STAGE                    8      4098        1314304
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         1   1663578       78336771
GMFD                              0         0         142651
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0           1803
ROW-MUTATION-STAGE                0         0       68669717
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0            438
FLUSH-WRITER-POOL                 0         0            438
AE-SERVICE-STAGE                  0         0              3
HINTED-HANDOFF-POOL               0         0              3
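
To see how fast the backlog is actually draining, the two snapshots above can be diffed. The following is a minimal sketch using only the two busy pools copied from the output; the helper names are hypothetical, and the 30-minute gap is an assumption (the second snapshot was taken "more than 30 minutes" later, so the real rates are somewhat lower).

```python
def parse_tpstats(text):
    """Return {pool: (active, pending, completed)} from tpstats output."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows are a pool name followed by three integers;
        # the "Pool Name ..." header has five tokens and is skipped.
        if len(parts) == 4 and all(p.isdigit() for p in parts[1:]):
            pools[parts[0]] = tuple(int(p) for p in parts[1:])
    return pools

def delta(before, after):
    """Change in (pending, completed) for each pool present in both snapshots."""
    return {
        name: (after[name][1] - b[1], after[name][2] - b[2])
        for name, b in before.items()
        if name in after
    }

FIRST = """\
Pool Name                    Active   Pending      Completed
ROW-READ-STAGE                    8      4091        1138277
MESSAGE-DESERIALIZER-POOL         1   1849826       78135012
"""
SECOND = """\
Pool Name                    Active   Pending      Completed
ROW-READ-STAGE                    8      4098        1314304
MESSAGE-DESERIALIZER-POOL         1   1663578       78336771
"""

ELAPSED = 30 * 60  # assumption: seconds between the two snapshots

changes = delta(parse_tpstats(FIRST), parse_tpstats(SECOND))
for name, (d_pending, d_completed) in changes.items():
    print(f"{name}: pending {d_pending:+d}, completed {d_completed:+d} "
          f"(~{d_completed / ELAPSED:.0f}/s)")
```

Between the snapshots ROW-READ-STAGE completed 176,027 tasks and MESSAGE-DESERIALIZER-POOL worked its pending count down by 186,248. If that pace held, the remaining 1,663,578 pending messages would take at least roughly four and a half hours to clear.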

The other 2 nodes in the cluster have pending counts of 0, but this node seems hung
indefinitely, still processing requests that should have timed out for the client long ago.

top is showing a huge amount of I/O wait, but I'm not sure how to dig any deeper to find where the wait is happening.  I now have jconsole up and running on this machine, and the memory usage appears to be a sawtooth wave, going from 1GB up to 4GB over 3 hours, then plunging back to 1GB and resuming its climb.

top - 08:33:40 up 1 day, 19:25,  4 users,  load average: 7.75, 7.96, 8.16
Tasks: 177 total,   2 running, 175 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.6%us,  7.2%sy,  0.0%ni, 34.5%id, 41.1%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:   8123068k total,  8062240k used,    60828k free,     2624k buffers
Swap: 12699340k total,  1951504k used, 10747836k free,  3757300k cached
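
For logging these figures next to the tpstats counts over time, the header lines can be pulled apart with a short script. This is a sketch only: it assumes the procps-style header layout pasted above, and `parse_top_header` is a hypothetical helper name.

```python
import re

TOP_HEADER = """\
Cpu(s): 16.6%us,  7.2%sy,  0.0%ni, 34.5%id, 41.1%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:   8123068k total,  8062240k used,    60828k free,     2624k buffers
Swap: 12699340k total,  1951504k used, 10747836k free,  3757300k cached
"""

def parse_top_header(text):
    """Return (cpu_percentages, kilobyte_figures) from top's summary lines."""
    # CPU fields look like "41.1%wa"; only the Cpu(s) line contains '%'.
    cpu = {key: float(value) for value, key in re.findall(r"([\d.]+)%(\w+)", text)}
    kb = {}
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        if label in ("Mem", "Swap"):
            for value, name in re.findall(r"(\d+)k (\w+)", rest):
                # Note: "cached" is printed on the Swap line in this layout,
                # but it refers to the page cache, not to swap.
                kb[(label, name)] = int(value)
    return cpu, kb

cpu, kb = parse_top_header(TOP_HEADER)
print(f"iowait: {cpu['wa']}%")
print(f"swap used: {kb[('Swap', 'used')] / 1024:.0f} MiB")
print(f"free memory: {kb[('Mem', 'free')] / 1024:.0f} MiB")
```

The swap figure may be worth watching alongside the I/O wait, since this snapshot shows about 1.9 GB of swap in use on an 8 GB machine.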