From: Michael Fong
To: user@cassandra.apache.org
Subject: C* 1.2.x vs Gossip marking DOWN/UP
Date: Wed, 13 Apr 2016 08:58:40 +0000

Hi all,

We have a 4-node Cassandra cluster (C* 1.2.x) in which one node marked the other 3 nodes DOWN, and they came back UP a few seconds later. A compaction of roughly 10 MB had kicked in about a minute earlier, and the other nodes were marked DOWN after it. In other words, in system.log we see:

00:00:00 Compacting ....
00:00:03 Compacted 8 sstables ... ~10 megabytes
00:01:06 InetAddress /x.x.x.4 is now DOWN
00:01:06 InetAddress /x.x.x.3 is now DOWN
00:01:06 InetAddress /x.x.x.1 is now DOWN
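
For reference, the gap between the end of the compaction and the first DOWN event can be read off the excerpt. Below is a minimal Python sketch, assuming the simplified HH:MM:SS prefix shown above (real system.log lines also carry a log level, thread name and full date, which this ignores). The helper names `parse_time` and `gap_after_compaction` are hypothetical and only for illustration.

```python
from datetime import datetime

# Simplified excerpt from this message; not the real system.log line format.
EXCERPT = """\
00:00:00 Compacting ....
00:00:03 Compacted 8 sstables ... ~10 megabytes
00:01:06 InetAddress /x.x.x.4 is now DOWN
00:01:06 InetAddress /x.x.x.3 is now DOWN
00:01:06 InetAddress /x.x.x.1 is now DOWN
"""


def parse_time(line):
    """Parse the leading HH:MM:SS token of a line (assumed format)."""
    return datetime.strptime(line.split()[0], "%H:%M:%S")


def gap_after_compaction(text):
    """Seconds between the latest 'Compacted' line and the first DOWN line after it.

    Returns None if no such pair of lines is found.
    """
    compacted = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if " Compacted " in line:
            compacted = parse_time(line)
        elif line.endswith("is now DOWN") and compacted is not None:
            return (parse_time(line) - compacted).total_seconds()
    return None


gap_seconds = gap_after_compaction(EXCERPT)
print(gap_seconds)
```

For the excerpt above this gives 63 seconds, so the other nodes were marked DOWN about a minute after the compaction had already finished, not while it was running.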

 

There was no significant GC activity in gc.log. We have heard that heavy compaction activity can cause this behavior, but we cannot work out logically why that would happen. How could a compaction operation stop the Gossip thread from performing its heartbeat check? Has anyone experienced this kind of behavior before?

 

Thanks in advance!

Sincerely,
Michael Fong
