Subject: Re: frequent node up/downs
From: feedly team <feedlydev@gmail.com>
To: user@cassandra.apache.org
Date: Mon, 2 Jul 2012 23:41:25 -0400

A couple more details. I confirmed that swap space is not being used (free -m shows 0 swap), and cassandra.log has a message like "JNA mlockall successful". top shows the process with 9g of resident memory but 21.6g of virtual memory. What accounts for the much larger virtual number? Some kind of off-heap memory?

I'm a little puzzled as to why I would get such long pauses without swapping. I uncommented all the GC logging options in cassandra-env.sh to try to see what is going on when the node freezes.

Thanks
Kireet

On Mon, Jul 2, 2012 at 9:51 PM, feedly team <feedlydev@gmail.com> wrote:

> Yeah, I noticed the leap second problem and ran the suggested fix, but I
> was facing these problems before Saturday and still see occasional
> failures after running the fix.
>
> Thanks.
>
> On Mon, Jul 2, 2012 at 11:17 AM, Marcus Both <mboth@terra.com.br> wrote:
>
>> Yeah! Look at that.
>>
>> http://arstechnica.com/business/2012/07/one-day-later-the-leap-second-v-the-internet-scorecard/
>>
>> I had the same problem. The solution was rebooting.
>>
>> On Mon, 2 Jul 2012 11:08:57 -0400
>> feedly team <feedlydev@gmail.com> wrote:
>>
>> > Hello,
>> > I recently set up a 2 node cassandra cluster on dedicated hardware. In
>> > the logs there have been a lot of "InetAddress xxx is now dead" or "UP"
>> > messages. Comparing the log messages between the 2 nodes, they seem to
>> > coincide with extremely long ParNew collections. I have seen some of up
>> > to 50 seconds. The installation is pretty vanilla, I didn't change any
>> > settings, and the machines don't seem particularly busy - cassandra is
>> > the only thing running on the machine, with an 8GB heap. The machine has
>> > 64GB of RAM, and CPU/IO usage looks pretty light. I do see a lot of
>> > "Heap is xxx full. You may need to reduce memtable and/or cache sizes"
>> > messages. Would this help with the long ParNew collections? That message
>> > seems to be triggered on a full collection.
>>
>> --
>> Marcus Both