From: Sasha Dolgy
Date: Tue, 31 May 2011 12:01:53 +0200
Subject: cascading failures due to memory
To: user@cassandra.apache.org

Hi everyone,

The four nodes I currently have deployed have all been working fine, with not a lot of data and more reads than writes at the moment. As I had monitoring disabled, when one node's OS killed the Cassandra process due to out-of-memory problems ... that was fine. 24 hours later the same happened on another node, 24 hours later on another, until finally none of the 4 nodes had Cassandra running.

When all nodes are started fresh, CPU utilization is at about 21% on each box. After 24 hours this goes up to 32%, and then to 51% 24 hours after that.

Originally I thought this might be a result of 'nodetool repair' not being run consistently. After adding a cronjob to run it every 24 hours (staggered between nodes), the problem of increasing memory utilization is still not resolved.

I've read the operations page and also the http://wiki.apache.org/cassandra/MemtableThresholds page. I am running defaults and 0.7.6-02. What are the best places to start in terms of finding out why this is happening? CF design / usage? 'nodetool cfstats' gives me some good info, and I've already implemented some changes to one CF based on how it had ballooned (too many rows versus not enough columns).

Suggestions appreciated.

-- 
Sasha Dolgy
sasha.dolgy@gmail.com
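
For reference, a minimal sketch of how the staggered daily 'nodetool repair' schedule mentioned above could be generated. The node names (node1..node4), the helper name and the exact 'nodetool -h localhost repair' invocation are assumptions for illustration only, not the actual cron entries in use on these boxes.

```python
HOURS_PER_DAY = 24


def staggered_repair_crontab(nodes):
    """Map each node name to a crontab line whose start hour is offset
    evenly across the day, so no two nodes begin repair in the same hour."""
    if not nodes:
        return {}
    if len(nodes) > HOURS_PER_DAY:
        raise ValueError("more nodes than whole-hour slots in a day")
    step = HOURS_PER_DAY // len(nodes)  # e.g. 4 nodes -> one start every 6 hours
    schedule = {}
    for index, node in enumerate(nodes):
        hour = index * step
        # Assumed command line; adjust the nodetool path and host as needed.
        schedule[node] = f"0 {hour} * * * nodetool -h localhost repair"
    return schedule


if __name__ == "__main__":
    # Hypothetical node names; each line goes into that node's own crontab.
    for node, line in staggered_repair_crontab(
        ["node1", "node2", "node3", "node4"]
    ).items():
        print(f"# {node}")
        print(line)
```

Staggering only spreads out when each repair starts; it does not guarantee that one node's repair has finished before the next begins.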