Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 68A5410951 for ; Wed, 16 Apr 2014 22:18:40 +0000 (UTC) Received: (qmail 46263 invoked by uid 500); 16 Apr 2014 22:18:28 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 46221 invoked by uid 500); 16 Apr 2014 22:18:27 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 46143 invoked by uid 99); 16 Apr 2014 22:18:25 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 16 Apr 2014 22:18:25 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of prvs=176d79010=Andrew.Cooper@nisc.coop designates 12.10.126.224 as permitted sender) Received: from [12.10.126.224] (HELO mail.nisc.coop) (12.10.126.224) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 16 Apr 2014 22:18:20 +0000 X-IronPort-AV: E=Sophos;i="4.97,875,1389765600"; d="scan'208,217";a="1464423" Received: from croc.nisc.coop (HELO ndmail2.nisc.lan) ([207.243.68.200]) by mail.nisc.coop with ESMTP; 16 Apr 2014 17:17:56 -0500 From: Andrew Cooper To: "user@cassandra.apache.org" Subject: Stalled streams during repairs Thread-Topic: Stalled streams during repairs Thread-Index: AQHPWcG3JIbhHBNO8kiwYnst4FCTzQ== Date: Wed, 16 Apr 2014 22:17:55 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Content-Type: multipart/alternative; boundary="_000_AF538F617B6E4ACEA206A13C28012496nisccoop_" MIME-Version: 1.0 X-Virus-Checked: Checked by ClamAV on apache.org --_000_AF538F617B6E4ACEA206A13C28012496nisccoop_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable We are running into a reproducible issue in one of our cassandra clusters. = We are seeing that during an anti-entropy repair, if a particular sstable = is streaming to multiple endpoints and the two streams happen to hit the sa= me section of the sstable, it stalls all streams indefinitely on the source= node. The only way we can clear this is to restart cassandra on the node,= or cause the sockets to timeout by dropping the switch port, drop networki= ng, etc. The underlying tcp connection shows established on both source an= d target nodes, so cassandra's socket timeouts are not triggering. It seem= s that some sort of deadlock is happening inside the source node's streamin= g manager? We are running cassandra 1.2.5. I have checked through the change logs up = to 1.2.16 and do not see any indications of this being a known (and fixed) = issue. I think the perfect storm that allows this to happen is none of the target = nodes have the sstable, and streamthroughput is such that the streams are r= unning at similar speed. Example output from nodetool netstats is below (progress does not change, n= o additional data can be streamed to these endpoints because the first file= is not completed, which effectively stalls repairs) Mode: NORMAL Streaming to: /172.24.58.23 /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-69= 9-Data.db sections=3D1445 progress=3D41943040/66686679 - 62% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 2-Data.db sections=3D1409 progress=3D0/675554186 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-78= 1-Data.db sections=3D1448 progress=3D0/5578074 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 4-Data.db sections=3D1457 progress=3D0/263084543 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 5-Data.db sections=3D1419 progress=3D0/267463691 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-77= 1-Data.db sections=3D1449 progress=3D0/69152270 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 0-Data.db sections=3D1394 progress=3D0/185688159 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-69= 8-Data.db sections=3D1421 progress=3D0/748217766 - 0% Streaming to: /172.24.58.33 /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-69= 9-Data.db sections=3D1445 progress=3D20971520/66686679 - 31% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 2-Data.db sections=3D1409 progress=3D0/675554186 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-78= 1-Data.db sections=3D1448 progress=3D0/5578074 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 4-Data.db sections=3D1457 progress=3D0/263084543 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 5-Data.db sections=3D1419 progress=3D0/267463691 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-77= 1-Data.db sections=3D1449 progress=3D0/69152270 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 0-Data.db sections=3D1394 progress=3D0/185688159 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-69= 8-Data.db sections=3D1421 progress=3D0/748217766 - 0% Streaming to: /172.24.58.24 /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-69= 9-Data.db sections=3D1445 progress=3D20971520/66686679 - 31% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-78= 3-Data.db sections=3D1447 progress=3D0/2596067 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 2-Data.db sections=3D1409 progress=3D0/675554186 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-78= 1-Data.db sections=3D1448 progress=3D0/5578074 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 4-Data.db sections=3D1457 progress=3D0/263084543 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 5-Data.db sections=3D1419 progress=3D0/267463691 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-77= 1-Data.db sections=3D1449 progress=3D0/69152270 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-70= 0-Data.db sections=3D1394 progress=3D0/185688159 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-78= 4-Data.db sections=3D1448 progress=3D0/8519551 - 0% /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_intervals-ic-69= 8-Data.db sections=3D1421 progress=3D0/748217766 - 0% Not receiving any streams. Pool Name Active Pending Completed Commands n/a 0 39393765 Responses n/a 0 21929307 I would appreciate any feedback or advice on this. thanks, -Andrew andrew.cooper@nisc.coop --_000_AF538F617B6E4ACEA206A13C28012496nisccoop_ Content-Type: text/html; charset="us-ascii" Content-ID: <66FC83224779FE4986CC3E9FE78C9C3E@nisc.coop> Content-Transfer-Encoding: quoted-printable We are running into a reproducible issue in one of our cassandra clusters. =  We are seeing that during an anti-entropy repair, if a particular sst= able is streaming to multiple endpoints and the two streams happen to hit t= he same section of the sstable, it stalls all streams indefinitely on the source node.  The only way we can cle= ar this is to restart cassandra on the node, or cause the sockets to timeou= t by dropping the switch port, drop networking, etc.  The underlying t= cp connection shows established on both source and target nodes, so cassandra's socket timeouts are not triggering.  = ;It seems that some sort of deadlock is happening inside the source node's = streaming manager?

We are running cassandra 1.2.5.  I have checked through the chang= e logs up to 1.2.16 and do not see any indications of this being a known (a= nd fixed) issue.
I think the perfect storm that allows this to happen is none of the ta= rget nodes have the sstable, and streamthroughput is such that the streams = are running at similar speed.

Example output from nodetool netstats is below (progress does not chan= ge, no additional data can be streamed to these endpoints because the first= file is not completed, which effectively stalls repairs)

Mode: NORMAL
Streaming to: /172.24.58.23
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-699-Data.db sections=3D1445 progress=3D41943040/66686679 - 62%=
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-702-Data.db sections=3D1409 progress=3D0/675554186 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-781-Data.db sections=3D1448 progress=3D0/5578074 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-704-Data.db sections=3D1457 progress=3D0/263084543 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-705-Data.db sections=3D1419 progress=3D0/267463691 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-771-Data.db sections=3D1449 progress=3D0/69152270 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-700-Data.db sections=3D1394 progress=3D0/185688159 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-698-Data.db sections=3D1421 progress=3D0/748217766 - 0%
Streaming to: /172.24.58.33
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mv= ec_intervals-ic-699-Data.db sections=3D1445 progress=3D20971520/66686679 - = 31%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-702-Data.db sections=3D1409 progress=3D0/675554186 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-781-Data.db sections=3D1448 progress=3D0/5578074 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-704-Data.db sections=3D1457 progress=3D0/263084543 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-705-Data.db sections=3D1419 progress=3D0/267463691 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-771-Data.db sections=3D1449 progress=3D0/69152270 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-700-Data.db sections=3D1394 progress=3D0/185688159 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-698-Data.db sections=3D1421 progress=3D0/748217766 - 0%
Streaming to: /172.24.58.24
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mv= ec_intervals-ic-699-Data.db sections=3D1445 progress=3D20971520/66686679 - = 31%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-783-Data.db sections=3D1447 progress=3D0/2596067 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-702-Data.db sections=3D1409 progress=3D0/675554186 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-781-Data.db sections=3D1448 progress=3D0/5578074 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-704-Data.db sections=3D1457 progress=3D0/263084543 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-705-Data.db sections=3D1419 progress=3D0/267463691 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-771-Data.db sections=3D1449 progress=3D0/69152270 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-700-Data.db sections=3D1394 progress=3D0/185688159 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-784-Data.db sections=3D1448 progress=3D0/8519551 - 0%
   /usr/lib/cassandra/data/data/mdm/mvec_intervals/mdm-mvec_= intervals-ic-698-Data.db sections=3D1421 progress=3D0/748217766 - 0%
Not receiving any streams.
Pool Name                 &nbs= p;  Active   Pending      Completed
Commands                  = ;      n/a         0      = ; 39393765
Responses                 &nbs= p;     n/a         0       219= 29307

I would appreciate any feedback or advice on this. thanks,
-Andrew
--_000_AF538F617B6E4ACEA206A13C28012496nisccoop_--