From: Sasha Dolgy
Date: Wed, 1 Jun 2011 21:37:33 +0200
Subject: Re: cascading failures due to memory
To: user@cassandra.apache.org

and is there anything specific that could be causing the issue between
Java SE 1.6.0_24 and 1.6.0_25 ?  All nodes are _24

up to 64% memory usage today

-sd

On Wed, Jun 1, 2011 at 9:30 PM, Sasha Dolgy wrote:
> is there a specific string I should be looking for in the logs that
> isn't super obvious to me at the moment...
>
> On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis wrote:
>> The place to start is with the statistics Cassandra logs after each GC.
>>
>> On Tue, May 31, 2011 at 5:01 AM, Sasha Dolgy wrote:
>>> hi everyone,
>>>
>>> the current nodes i have deployed (4) have all been working fine, with
>>> not a lot of data ... more reads than writes at the moment.  as i had
>>> monitoring disabled, when one node's OS killed the cassandra process
>>> due to out of memory problems ... that was fine.  24 hours later,
>>> another node, 24 hours later, another node ...until finally, all 4
>>> nodes no longer had cassandra running.
>>>
>>> When all nodes are started fresh, CPU utilization is at about 21% on
>>> each box.  after 24 hours, this goes up to 32% and then 51% 24 hours
>>> later.
>>>
>>> originally I had thought that this may be a result of 'nodetool
>>> repair' not being run consistently ... after adding a cronjob to run
>>> every 24 hours (staggered between nodes) the problem of the increasing
>>> memory utilization does not resolve.
>>>
>>> i've read the operations page and also the
>>> http://wiki.apache.org/cassandra/MemtableThresholds page.  i am
>>> running defaults and 0.7.6-02 ...
>>>
>>> what are the best places to start in terms of finding why this is
>>> happening?  CF design / usage?  'nodetool cfstats' gives me some good
>>> info ... and i've already implemented some changes to one CF based on
>>> how it had ballooned (too many rows versus not enough columns)
>>>
>>> suggestions appreciated
>>>
>>> --
>>> Sasha Dolgy
>>> sasha.dolgy@gmail.com
>>>
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
>
> --
> Sasha Dolgy
> sasha.dolgy@gmail.com
>

-- 
Sasha Dolgy
sasha.dolgy@gmail.com