incubator-cassandra-user mailing list archives

From Andrew Cooper <Andrew.Coo...@nisc.coop>
Subject Stalled streams during repairs
Date Wed, 16 Apr 2014 22:17:55 GMT
We are running into a reproducible issue in one of our Cassandra clusters.  During an
anti-entropy repair, if a particular sstable is streaming to multiple endpoints and two of
the streams happen to hit the same section of the sstable, all streams on the source node
stall indefinitely.  The only ways we have found to clear this are to restart Cassandra on
the node, or to force the sockets to time out by dropping the switch port, dropping
networking, etc.  The underlying TCP connections show as ESTABLISHED on both the source and
target nodes, so Cassandra's socket timeouts are not triggering.  It looks like some sort of
deadlock is happening inside the source node's streaming manager.
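
One way to look into the suspected deadlock is to take a thread dump of the Cassandra process on the source node (for example with jstack) and see what the streaming threads are doing. The following is only a minimal sketch: threads_matching is a hypothetical helper, and the substring used to pick out streaming threads is an assumption that should be adjusted to whatever thread names the dump actually shows.

```python
import re

# A thread's block in a HotSpot jstack dump starts with its quoted name and
# runs until the next blank line.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
THREAD_STATE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def threads_matching(dump_text, name_part):
    """Return [(thread_name, state, top_frame)] for threads whose name contains name_part."""
    results = []
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        if not lines:
            continue
        header = THREAD_HEADER.match(lines[0])
        if not header or name_part not in header.group("name"):
            continue
        state = THREAD_STATE.search(block)
        # Stack frames are the lines that start with "at " once indentation is stripped.
        frames = [line.strip() for line in lines if line.strip().startswith("at ")]
        results.append((
            header.group("name"),
            state.group(1) if state else "UNKNOWN",
            frames[0] if frames else "",
        ))
    return results
```

Calling threads_matching(dump_text, "Streaming") on two dumps taken a few minutes apart and comparing the top frames would show whether the same threads are sitting in the same place, which is what a hang in the streaming code would look like. The name "Streaming" is a guess, not something confirmed above.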

We are running Cassandra 1.2.5.  I have checked the change logs up to 1.2.16 and do not
see any indication that this is a known (and fixed) issue.
I think the perfect storm that allows this to happen is that none of the target nodes have
the sstable, and streamthroughput is such that the streams are running at similar speeds.

Example output from nodetool netstats is below.  Progress does not change, and no additional
data can be streamed to these endpoints because the first file never completes, which
effectively stalls repairs.

Mode: NORMAL
Streaming to: /172.24.58.23
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-699-Data.db sections=1445 progress=41943040/66686679 - 62%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-702-Data.db sections=1409 progress=0/675554186 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-781-Data.db sections=1448 progress=0/5578074 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-704-Data.db sections=1457 progress=0/263084543 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-705-Data.db sections=1419 progress=0/267463691 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-771-Data.db sections=1449 progress=0/69152270 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-700-Data.db sections=1394 progress=0/185688159 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-698-Data.db sections=1421 progress=0/748217766 - 0%
Streaming to: /172.24.58.33
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-699-Data.db sections=1445 progress=20971520/66686679 - 31%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-702-Data.db sections=1409 progress=0/675554186 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-781-Data.db sections=1448 progress=0/5578074 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-704-Data.db sections=1457 progress=0/263084543 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-705-Data.db sections=1419 progress=0/267463691 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-771-Data.db sections=1449 progress=0/69152270 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-700-Data.db sections=1394 progress=0/185688159 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-698-Data.db sections=1421 progress=0/748217766 - 0%
Streaming to: /172.24.58.24
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-699-Data.db sections=1445 progress=20971520/66686679 - 31%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-783-Data.db sections=1447 progress=0/2596067 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-702-Data.db sections=1409 progress=0/675554186 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-781-Data.db sections=1448 progress=0/5578074 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-704-Data.db sections=1457 progress=0/263084543 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-705-Data.db sections=1419 progress=0/267463691 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-771-Data.db sections=1449 progress=0/69152270 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-700-Data.db sections=1394 progress=0/185688159 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-784-Data.db sections=1448 progress=0/8519551 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-698-Data.db sections=1421 progress=0/748217766 - 0%
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0       39393765
Responses                       n/a         0       21929307
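
To confirm from the outside that the progress counters really are frozen, two netstats snapshots taken some minutes apart can be compared mechanically. The sketch below is only an illustration: parse_netstats and stalled_streams are hypothetical helper names, the regular expressions are written against the output format shown above, and how the two snapshots get captured is left open.

```python
import re

# One file entry in the netstats output. \s+ also spans a line break, so
# entries wrapped before "progress=" are matched too.
ENTRY = re.compile(r"(\S+-Data\.db)\s+sections=(\d+)\s+progress=(\d+)/(\d+)")
HEADER = re.compile(r"^Streaming (to|from): (\S+)", re.MULTILINE)


def parse_netstats(text):
    """Return {(peer, sstable_path): (bytes_sent, bytes_total)} for outbound streams."""
    progress = {}
    headers = list(HEADER.finditer(text))
    for i, header in enumerate(headers):
        direction, peer = header.group(1), header.group(2)
        if direction != "to":
            continue  # only outbound streams are of interest here
        # The chunk for this peer ends where the next header starts.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        chunk = text[header.end():end]
        for path, _sections, sent, total in ENTRY.findall(chunk):
            progress[(peer, path)] = (int(sent), int(total))
    return progress


def stalled_streams(before, after):
    """Entries that are partially sent and did not advance between two snapshots."""
    stalled = []
    for (peer, path), (sent, total) in after.items():
        previous_sent = before.get((peer, path), (None, None))[0]
        if 0 < sent < total and previous_sent == sent:
            stalled.append((peer, path, sent, total))
    return sorted(stalled)
```

Given two identical copies of the output above, stalled_streams(parse_netstats(first), parse_netstats(second)) would list the ic-699 file once for each of the three endpoints, since those are the only entries that have started but not finished. Files still at 0 bytes are deliberately not reported, because they are queued behind the first file rather than stuck themselves.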

I would appreciate any feedback or advice on this. Thanks,
-Andrew
andrew.cooper@nisc.coop
