From: Brian Burruss
To: cassandra-user@incubator.apache.org
Date: Fri, 18 Dec 2009 16:27:46 -0800
Subject: RE: another OOM

i am simulating load by using two virtual machines (on separate boxes from the servers), each running an app that spawns 12 threads: 6 threads doing reads and 6 threads doing writes. so i have a total of 12 read threads and 12 write threads. between operations, each thread waits 10ms. the write threads write a 2k block of data, and the read threads read what has been written, so every read should return data. right now i'm seeing about 800 ops/sec total throughput across all clients/servers. if i take the 10ms delay out it will of course go faster, but that seems to burden cassandra too much.

we are trying to prove that cassandra can run and sustain load. we are planning a 10TB system that needs to handle about 10k ops/sec.

for my tests i have two machines for servers, each with 16G RAM, a 600G 10k SCSI drive, and 2x 2-core CPUs (4 cores total per machine). i'm starting the JVM with -Xmx6G. the network is 100Mbit. (this is not how the cluster would look in prod, but it's all the hardware i have until the first of 2010.)

the cluster contains ~126,281,657 data elements, using about 298G on one node's disk.

i don't have the commitlog on a separate drive yet.
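in case it helps, this is roughly the shape of the load app (a minimal sketch, not the actual code; the Store interface is a hypothetical stand-in for our Thrift client wrapper, and in the real test the readers pick keys that writers on both VMs have already written):

import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class LoadSim {
    /** hypothetical stand-in for the Thrift client the real app uses */
    interface Store {
        void write(String key, byte[] value) throws Exception;
        byte[] read(String key) throws Exception;
    }

    static final int THREADS_EACH = 6;      // 6 readers + 6 writers per VM
    static final int DELAY_MS = 10;         // wait between each thread's ops
    static final int BLOCK_SIZE = 2 * 1024; // 2k block per write
    static final AtomicLong written = new AtomicLong();

    public static void main(String[] args) {
        final Store store = connect();
        for (int i = 0; i < THREADS_EACH; i++) {
            new Thread(new Runnable() { public void run() { writeLoop(store); } }).start();
            new Thread(new Runnable() { public void run() { readLoop(store); } }).start();
        }
    }

    static void writeLoop(Store store) {
        Random rnd = new Random();
        byte[] block = new byte[BLOCK_SIZE];
        try {
            while (true) {
                rnd.nextBytes(block);
                store.write("key-" + written.incrementAndGet(), block);
                Thread.sleep(DELAY_MS);
            }
        } catch (Exception e) { e.printStackTrace(); }
    }

    static void readLoop(Store store) {
        Random rnd = new Random();
        try {
            while (true) {
                long max = written.get();
                if (max > 0) {
                    // read only keys already written, so every read returns data
                    long k = 1 + (rnd.nextLong() & Long.MAX_VALUE) % max;
                    store.read("key-" + k);
                }
                Thread.sleep(DELAY_MS);
            }
        } catch (Exception e) { e.printStackTrace(); }
    }

    static Store connect() {
        // placeholder: open the Thrift connection and wrap it here
        throw new UnsupportedOperationException("wire up the real client");
    }
}

(sanity check on the 800 ops/sec: spread over the 24 client threads that's ~33 ops/sec per thread, i.e. ~30ms per cycle, so after subtracting the 10ms sleep each op is taking roughly 20ms as seen from the client.)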
during normal operation, i see the following:
- memory is staying fairly low for the size of the data, low enough that i didn't monitor it, but i believe it was less than 3G.
- "global" read latency creeps up slightly, as reported by StorageProxy.
- "round trip time on the wire" as reported by my client creeps up at a steeper slope than the "global" read latency, so there is a discrepancy somewhere in the stats. i have added another JMX data point to cassandra to measure the overall time spent in cassandra, but i've got to get the servers started again to see what it reports ;) (the JMX polling sketch at the bottom of this mail shows how i'm reading these numbers.)

using node 1 and node 2, simulating a crash of node 1 using kill -9:
- node 1 was OOM'ing when trying to restart after a crash, but this seems fixed. it is staying cool and quiet.
- node 2 is now OOM'ing during the restart of node 1. memory steadily grows, and the last thing i see in the log is "Starting up server gossip" until the OOM.

what bothers me the most is not that i'm getting an OOM, but that i can't predict when i'll get it. the fact that restarting a failed node requires more than double the "normal operating" RAM is a bit of a worry.

not sure what else to tell you at the moment. lemme know what i can provide so we can figure this out.

thx!
________________________________________
From: Jonathan Ellis [jbellis@gmail.com]
Sent: Friday, December 18, 2009 3:49 PM
To: cassandra-user@incubator.apache.org
Subject: Re: another OOM

It sounds like you're simply throwing too much load at Cassandra.
Adding more machines can help.

Look at http://wiki.apache.org/cassandra/Operations for how to track
metrics that will tell you how much is "too much."

Telling us more about your workload would be useful in sanity checking
that hypothesis. :)

-Jonathan

On Fri, Dec 18, 2009 at 4:34 PM, Brian Burruss wrote:
> this time i simulated node 1 crashing, waited a few minutes, then restarted it. after a while node 2 OOM'ed.
>
> same 2 node cluster with RF=2, W=1, R=1. i up'ed the RAM to 6G this time.
>
> cluster contains ~126,281,657 data elements containing about 298G on one node's disk
>
> thx!
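PS - for reference, this is roughly how i'm polling the StorageProxy latency numbers over JMX (a sketch only; the JMX port and the attribute names are assumptions from memory, so browse the bean in jconsole to confirm them against your build):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LatencyPoll {
    public static void main(String[] args) throws Exception {
        // 8080 as the JMX port is an assumption; adjust to match cassandra.in.sh
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
        MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
        ObjectName proxy = new ObjectName("org.apache.cassandra.service:type=StorageProxy");
        // attribute names below are assumptions -- check the bean in jconsole
        while (true) {
            System.out.println("read latency:  " + mbs.getAttribute(proxy, "RecentReadLatency"));
            System.out.println("write latency: " + mbs.getAttribute(proxy, "RecentWriteLatency"));
            Thread.sleep(5000); // poll every 5s
        }
    }
}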