incubator-cassandra-user mailing list archives

From Keith Wright <kwri...@nanigans.com>
Subject Nodes get stuck
Date Wed, 21 Aug 2013 00:32:50 GMT
Hi all,

    We are using C* 1.2.4 with vnodes and SSDs.  We have recently seen behavior where 3 of
our nodes lock up under high load in what appears to be a GC spiral, while the rest of the
cluster (7 total nodes) appears fine.  When I run nodetool tpstats, I see the following (assuming
tpstats returns at all), and top shows cassandra pegged at 2000% CPU.  Obviously we have a large
number of reads backed up.  In the past I could attribute this to unexpectedly wide rows, but
we have since handled that.  When the cluster starts to melt down like this, it's hard to get
visibility into what's going on and what triggered the issue, as everything starts to pile on.
OpsCenter becomes unusable, and because the affected nodes are under GC pressure, getting any data
via nodetool or JMX is also difficult.  What do people do to handle these situations?  We are
going to start graphing reads/sec and writes/sec per CF to Ganglia in the hope that it helps.

Thanks

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                       256       381     1245117434         0                 0
RequestResponseStage              0         0     1161495947         0                 0
MutationStage                     8         8      481721887         0                 0
ReadRepairStage                   0         0       85770600         0                 0
ReplicateOnWriteStage             0         0       21896804         0                 0
GossipStage                       0         0        1546196         0                 0
AntiEntropyStage                  0         0           5009         0                 0
MigrationStage                    0         0           1082         0                 0
MemtablePostFlusher               0         0          10178         0                 0
FlushWriter                       0         0           6081         0              2075
MiscStage                         0         0             57         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               0         0              0         0                 0
InternalResponseStage             0         0              6         0                 0
HintedHandoff                     1         1            246         0                 0

Message type           Dropped
RANGE_SLICE                482
READ_REPAIR                  0
BINARY                       0
READ                    515762
MUTATION                    39
_TRACE                       0
REQUEST_RESPONSE            29
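
For anyone who wants to watch for this pattern automatically, here is a minimal sketch that parses tpstats text in the layout shown above and flags pools with a pending backlog. The function names (parse_tpstats, backlogged_pools) and the threshold of 100 are illustrative assumptions, not part of nodetool, and the sample is a subset of the rows above.

```python
# Sample text copied from the tpstats output above (a subset of rows).
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                       256       381     1245117434         0                 0
MutationStage                     8         8      481721887         0                 0
FlushWriter                       0         0           6081         0              2075

Message type           Dropped
RANGE_SLICE                482
READ                    515762
MUTATION                    39
"""

POOL_FIELDS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(text):
    """Split tpstats text into per-pool counters and dropped-message counts.

    Assumes the column layout shown above: pool rows have a name plus five
    integers, dropped rows have a name plus one integer. Header lines do not
    match either shape and are skipped.
    """
    pools, dropped = {}, {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            pools[parts[0]] = dict(zip(POOL_FIELDS, map(int, parts[1:])))
        elif len(parts) == 2 and parts[1].isdigit():
            dropped[parts[0]] = int(parts[1])
    return pools, dropped


def backlogged_pools(pools, pending_threshold=100):
    """Return names of pools whose pending count meets the (assumed) threshold."""
    return sorted(
        name for name, stats in pools.items()
        if stats["pending"] >= pending_threshold
    )


if __name__ == "__main__":
    pools, dropped = parse_tpstats(SAMPLE)
    print("backlogged:", backlogged_pools(pools))
    print("dropped READ:", dropped.get("READ", 0))
```

In practice the text would come from the captured output of nodetool tpstats on each node rather than from the inline sample, and the results could be forwarded to whatever graphing system is in use.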

