From: Benjamin Black
Date: Mon, 9 Aug 2010 17:11:21 -0700
Subject: Re: Growing commit log directory.
To: user@cassandra.apache.org

What does the I/O load look like on those nodes?

On Mon, Aug 9, 2010 at 1:50 PM, Edward Capriolo wrote:
> I have a 16 node 6.3 cluster and two nodes from my cluster are giving
> me major headaches.
>
> 10.71.71.56   Up         58.19 GB   108271662202116783829255556910108067277    |   ^
> 10.71.71.61   Down       67.77 GB   123739042516704895804863493611552076888    v   |
> 10.71.71.66   Up         43.51 GB   127605887595351923798765477786913079296    |   ^
> 10.71.71.59   Down       90.22 GB   139206422831293007780471430312996086499    v   |
> 10.71.71.65   Up         22.97 GB   148873535527910577765226390751398592512    |   ^
>
> The symptoms I am seeing are nodes 61 and nodes 59 have huge 6 GB +
> commit log directories. They keep growing, along with memory usage,
> eventually the logs start showing GCInspection errors and then the
> nodes will go OOM
>
>  INFO 14:20:01,296 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
>  INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving 7955651792 used; max is 9773776896
>  INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving 8137412920 used; max is 9773776896
>  INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving 8310139720 used; max is 9773776896
>  INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving 8480136592 used; max is 9773776896
>  INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving 8648872520 used; max is 9773776896
>  INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving 8816581312 used; max is 9773776896
>  INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving 8986063136 used; max is 9773776896
>  INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving 9153134392 used; max is 9773776896
>  INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving 9318140296 used; max is 9773776896
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid10913.hprof ...
>  INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
>  INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
>  INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200 reclaimed leaving 9334753480 used; max is 9773776896
>  INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.
>
> Heap dump file created [12730501093 bytes in 253.445 secs]
> ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
> ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
>  INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880 reclaimed leaving 9335215296 used; max is 9773776896
>
> Does anyone have any ideas what is going on?
>
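
The GC lines quoted above show the heap filling steadily while each ParNew pass reclaims only a few tens of MB. A minimal sketch for pulling those numbers out of a log and tracking heap occupancy might look like the following. The regular expression is derived only from the log lines quoted in this message, and the function names (parse_gc_lines, heap_fill_percent) are hypothetical helpers, not part of Cassandra.

```python
import re

# Pattern assumed from the log lines quoted in this thread, e.g.
# "GC for ParNew: 327 ms, 57545496 reclaimed leaving 7955651792 used; max is 9773776896"
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, (?P<reclaimed>\d+) reclaimed "
    r"leaving (?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_lines(lines):
    """Return one dict per GC log line; lines that do not match are skipped."""
    records = []
    for line in lines:
        m = GC_LINE.search(line)
        if m:
            records.append({
                "collector": m["collector"],
                "pause_ms": int(m["ms"]),
                "reclaimed": int(m["reclaimed"]),
                "used": int(m["used"]),
                "max": int(m["max"]),
            })
    return records


def heap_fill_percent(record):
    """Heap occupancy after the collection, as a percentage of the max heap."""
    return 100.0 * record["used"] / record["max"]


# First and last ParNew lines from the quoted log.
sample = [
    " INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving 7955651792 used; max is 9773776896",
    " INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving 9318140296 used; max is 9773776896",
]

records = parse_gc_lines(sample)
for r in records:
    print(f"{r['collector']}: {r['pause_ms']} ms pause, "
          f"heap {heap_fill_percent(r):.1f}% full after GC")

# Bytes by which post-GC heap usage grew between the first and last sample.
growth = records[-1]["used"] - records[0]["used"]
print(f"post-GC heap usage grew by {growth} bytes")
```

Running this over a whole system log, rather than two sample lines, would show whether usage after each collection keeps climbing, as it does in the excerpt above.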