From user-return-27348-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Mon Jul 2 15:13:43 2012 Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5438FD0F5 for ; Mon, 2 Jul 2012 15:13:43 +0000 (UTC) Received: (qmail 12410 invoked by uid 500); 2 Jul 2012 15:13:40 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 12382 invoked by uid 500); 2 Jul 2012 15:13:40 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 12368 invoked by uid 99); 2 Jul 2012 15:13:40 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 Jul 2012 15:13:40 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FSL_RCVD_USER,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of pieter.callewaert@be-mobile.be designates 213.199.154.207 as permitted sender) Received: from [213.199.154.207] (HELO am1outboundpool.messaging.microsoft.com) (213.199.154.207) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 Jul 2012 15:13:30 +0000 Received: from mail73-am1-R.bigfish.com (10.3.201.225) by AM1EHSOBE003.bigfish.com (10.3.204.23) with Microsoft SMTP Server id 14.1.225.23; Mon, 2 Jul 2012 15:11:12 +0000 Received: from mail73-am1 (localhost [127.0.0.1]) by mail73-am1-R.bigfish.com (Postfix) with ESMTP id 17C262E023B for ; Mon, 2 Jul 2012 15:11:12 +0000 (UTC) X-Forefront-Antispam-Report: CIP:157.56.248.213;KIP:(null);UIP:(null);IPV:NLI;H:AMXPRD0610HT001.eurprd06.prod.outlook.com;RD:none;EFVD:NLI X-SpamScore: 2 X-BigFish: 
PS2(zzc85fh1be0I853kzz1202hzz8275bh8275dhz2fh2a8h668h839hd25hf0ah) Received-SPF: pass (mail73-am1: domain of be-mobile.be designates 157.56.248.213 as permitted sender) client-ip=157.56.248.213; envelope-from=pieter.callewaert@be-mobile.be; helo=AMXPRD0610HT001.eurprd06.prod.outlook.com ;.outlook.com ; Received: from mail73-am1 (localhost.localdomain [127.0.0.1]) by mail73-am1 (MessageSwitch) id 1341241869858558_17215; Mon, 2 Jul 2012 15:11:09 +0000 (UTC) Received: from AM1EHSMHS001.bigfish.com (unknown [10.3.201.233]) by mail73-am1.bigfish.com (Postfix) with ESMTP id C66BF460049 for ; Mon, 2 Jul 2012 15:11:09 +0000 (UTC) Received: from AMXPRD0610HT001.eurprd06.prod.outlook.com (157.56.248.213) by AM1EHSMHS001.bigfish.com (10.3.207.101) with Microsoft SMTP Server (TLS) id 14.1.225.23; Mon, 2 Jul 2012 15:11:08 +0000 Received: from AMXPRD0610MB353.eurprd06.prod.outlook.com ([169.254.4.22]) by AMXPRD0610HT001.eurprd06.prod.outlook.com ([10.255.58.36]) with mapi id 14.16.0164.004; Mon, 2 Jul 2012 15:13:05 +0000 From: Pieter Callewaert To: "user@cassandra.apache.org" Subject: RE: frequent node up/downs Thread-Topic: frequent node up/downs Thread-Index: AQHNWGSztb21tZYRIUCA/MfkAEfjVZcWGSlw Date: Mon, 2 Jul 2012 15:13:03 +0000 Message-ID: <0B2BF1E8E35731438C02772C683FB67B1674CA1D@AMXPRD0610MB353.eurprd06.prod.outlook.com> References: In-Reply-To: Accept-Language: nl-BE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [81.82.202.11] Content-Type: multipart/alternative; boundary="_000_0B2BF1E8E35731438C02772C683FB67B1674CA1DAMXPRD0610MB353_" MIME-Version: 1.0 X-OriginatorOrg: be-mobile.be --_000_0B2BF1E8E35731438C02772C683FB67B1674CA1DAMXPRD0610MB353_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

Hi,

I had the same problem this morning; it seems to be related to the leap second bug.
Rebooting the nodes fixed it for me, but there also seems to be a fix that does not require rebooting the server.
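
The fix without a reboot is not spelled out in this message. The workaround widely reported for the 2012 leap second bug was to set the system clock to its own current value as root (for example `date -s "$(date)"`), which clears the stuck kernel timer state. A minimal Python sketch of the same idea follows; the function name `reset_wall_clock` and its injectable parameters are hypothetical and exist only for illustration, and whether this helps on a given kernel is an assumption rather than something confirmed in this thread.

```python
import time


def reset_wall_clock(set_clock=time.clock_settime, now=time.time):
    """Re-set CLOCK_REALTIME to the value it already has.

    Rough equivalent of running `date -s "$(date)"`. Needs root on
    Linux; without it the default setter raises PermissionError.
    `set_clock` and `now` are injectable so the logic can be exercised
    without touching the real clock.
    """
    current = now()                          # read the wall clock
    set_clock(time.CLOCK_REALTIME, current)  # write the same value back
    return current


# Usage on an affected node, as root:
#     reset_wall_clock()
```

Writing the same value back does not move the clock noticeably; the point is only that the set operation itself is performed.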
Kind regards,
Pieter

From: feedly team [mailto:feedlydev@gmail.com]
Sent: Monday 2 July 2012 17:09
To: user@cassandra.apache.org
Subject: frequent node up/downs

Hello,

I recently set up a 2-node Cassandra cluster on dedicated hardware. In the logs there have been a lot of 'InetAddress xxx is now dead' or UP messages. Comparing the log messages between the 2 nodes, they seem to coincide with extremely long ParNew collections; I have seen some of up to 50 seconds. The installation is pretty vanilla: I didn't change any settings, and the machines don't seem particularly busy. Cassandra is the only thing running on each machine, with an 8GB heap. The machine has 64GB of RAM, and CPU/IO usage looks pretty light. I do see a lot of 'Heap is xxx full. You may need to reduce memtable and/or cache sizes' messages. Would reducing those sizes help with the long ParNew collections? That message seems to be triggered on a full collection.

--_000_0B2BF1E8E35731438C02772C683FB67B1674CA1DAMXPRD0610MB353_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
--_000_0B2BF1E8E35731438C02772C683FB67B1674CA1DAMXPRD0610MB353_--