From: Adi
To: user@cassandra.apache.org
Subject: Re: cassandra unexpected shutdown
Date: Tue, 23 Aug 2011 20:28:42 -0400
In-Reply-To: <4E5432F9.9040002@peoplebrowsr.com>

2011/8/23 Ernst D Schoen-René:
> Hi,
> I'm running a 16-node cassandra cluster, with a reasonably large amount of
> data per node (~1TB). Nodes have 16G ram, but heap is set to 8G.
>
> The nodes keep stopping with this output in the log. Any ideas?
>
> ERROR [Thread-85] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-85,5,main]
> java.lang.OutOfMemoryError: Java heap space
> ERROR [ReadStage:568] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:568,5,main]
> java.lang.OutOfMemoryError: Java heap space
>  INFO [HintedHandoff:1] 2011-08-23 21:00:38,720 HintedHandOffManager.java (line 320) Started hinted handoff for endpoint /10.28.0.184
>  INFO [GossipStage:2] 2011-08-23 21:00:50,751 Gossiper.java (line 606) InetAddress /10.29.20.67 is now UP
> ERROR [Thread-34] 2011-08-23 21:00:50,525 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-34,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
> ERROR [Thread-36] 2011-08-23 21:00:50,518 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-36,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
>  INFO [GossipTasks:1] 2011-08-23 21:00:50,466 Gossiper.java (line 620) InetAddress /10.29.20.67 is now dead.
>  INFO [HintedHandoff:1] 2011-08-23 21:00:50,751 HintedHandOffManager.java (line 376) Finished hinted handoff of 0 rows to endpoint /10.28.0.184
> ERROR [Thread-33] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-33,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
> ERROR [Thread-128] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-128,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
> root@cass1:~#
>

You can try the cargo-cult solution of upping the heap to 12 GB and see if the nodes stabilize. We have a 4-node cluster with 2-3 TB of data per node, and that was the heap size at which the nodes managed to serve requests without running out of memory. Ultimately we ordered more memory and are now running with a 24 GB heap, and the cluster has been stable without complaints.

Other things you can do to reduce memory usage, if they are appropriate for your read/write profile:
a) reduce memtable throughput (largest reduction in memory footprint)
b) disable row caching
c) reduce/disable key caching (smallest reduction)

Ultimately you will have to tune based on your
1) row sizes
2) read/write load

-Adi
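
P.S. The entries in the quoted log are not in timestamp order, so it is easy to miss that the heap-space errors come first. Below is a minimal sketch in Python for ordering the ERROR entries of such a log by timestamp and naming the first exception. The function name fatal_errors and the regular expression are my own illustration, not part of Cassandra, and the entry format is assumed to match the excerpt above; the sample lines are abbreviated from that excerpt.

```python
import re
from datetime import datetime

# Header line of a log entry as it appears in the excerpt, e.g.
# "ERROR [Thread-85] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java (line 113) ..."
# (format assumed from the excerpt; other logging layouts will need a different pattern)
ENTRY = re.compile(
    r"^\s*(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
)

def fatal_errors(lines):
    """Return (timestamp, thread, exception class) per ERROR entry, oldest first."""
    found = []
    pending = None  # ERROR header whose exception line has not been seen yet
    for line in lines:
        m = ENTRY.match(line)
        if m:
            pending = m if m.group("level") == "ERROR" else None
        elif pending and line.strip() and not line.lstrip().startswith("at "):
            # First non-blank, non-stack-frame line after the header names the exception.
            ts = datetime.strptime(pending.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            exc = line.strip().split(":", 1)[0]
            found.append((ts, pending.group("thread"), exc))
            pending = None
    return sorted(found, key=lambda e: e[0])

# Abbreviated sample taken from the quoted log (stack traces trimmed).
sample = """\
ERROR [Thread-85] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-85,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [Thread-34] 2011-08-23 21:00:50,525 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-34,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
    at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
ERROR [Thread-36] 2011-08-23 21:00:50,518 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-36,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
""".splitlines()

errors = fatal_errors(sample)
first = errors[0]
print(first[1], first[2])  # prints "Thread-85 java.lang.OutOfMemoryError"
```

To run it against a real log, strip any "> " quote prefixes first and pass the file's lines to fatal_errors. In the excerpt the OutOfMemoryError entries carry the earliest timestamps and the RejectedExecutionException traces all come later, which is why the heap is the first thing to look at.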