From: Brian Burruss
To: cassandra-user@incubator.apache.org
Date: Fri, 18 Dec 2009 16:27:46 -0800
Subject: RE: another OOM

i am simulating load by using two virtual machines (on separate boxes from the servers), each running an app that spawns 12 threads: 6 threads doing reads and 6 threads doing writes. so i have a total of 12 read threads and 12 write threads. between operations, each thread waits 10ms. the write threads write a 2k block of data, and the read threads read what has been written, so every read should return data. right now i'm seeing about 800 ops/sec total throughput across all clients/servers. if i take the 10ms delay out it will of course go faster, but that seems to burden cassandra too much.

we are trying to prove that cassandra can run and sustain load. we are planning a 10TB system that needs to handle about 10k ops/sec.

for my tests i have two machines for servers, each with 16G RAM, a 600G 10k SCSI drive, and 2x 2-core CPUs (4 cores total per machine). i'm starting the JVM with -Xmx6G. the network is 100Mbit. (this is not how the cluster would look in prod, but it's all the hardware i have until the first of 2010.)

the cluster contains ~126,281,657 data elements, using about 298G on one node's disk.

i don't have the commitlog on a separate drive yet.
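in case it helps, this is roughly the shape of the load app (a minimal sketch, not the actual code; the Store interface is a hypothetical stand-in for our Thrift client wrapper, and in the real test the readers pick keys that writers on both VMs have already written):

import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class LoadSim {
    /** hypothetical stand-in for the Thrift client the real app uses */
    interface Store {
        void write(String key, byte[] value) throws Exception;
        byte[] read(String key) throws Exception;
    }

    static final int THREADS_EACH = 6;      // 6 readers + 6 writers per VM
    static final int DELAY_MS = 10;         // wait between each thread's ops
    static final int BLOCK_SIZE = 2 * 1024; // 2k block per write
    static final AtomicLong written = new AtomicLong();

    public static void main(String[] args) {
        final Store store = connect();
        for (int i = 0; i < THREADS_EACH; i++) {
            new Thread(new Runnable() { public void run() { writeLoop(store); } }).start();
            new Thread(new Runnable() { public void run() { readLoop(store); } }).start();
        }
    }

    static void writeLoop(Store store) {
        Random rnd = new Random();
        byte[] block = new byte[BLOCK_SIZE];
        try {
            while (true) {
                rnd.nextBytes(block);
                store.write("key-" + written.incrementAndGet(), block);
                Thread.sleep(DELAY_MS);
            }
        } catch (Exception e) { e.printStackTrace(); }
    }

    static void readLoop(Store store) {
        Random rnd = new Random();
        try {
            while (true) {
                long max = written.get();
                if (max > 0) {
                    // read only keys already written, so every read returns data
                    long k = 1 + (rnd.nextLong() & Long.MAX_VALUE) % max;
                    store.read("key-" + k);
                }
                Thread.sleep(DELAY_MS);
            }
        } catch (Exception e) { e.printStackTrace(); }
    }

    static Store connect() {
        // placeholder: open the Thrift connection and wrap it here
        throw new UnsupportedOperationException("wire up the real client");
    }
}

(sanity check on the 800 ops/sec: spread over the 24 client threads that's ~33 ops/sec per thread, i.e. ~30ms per cycle, so after subtracting the 10ms sleep each op is taking roughly 20ms as seen from the client.)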
during normal operation, i see the following:
- memory is staying fairly low for the size of the data, low enough that i didn't monitor it, but i believe it was less than 3G.
- "global" read latency creeps up slightly, as reported by StorageProxy.
- "round trip time on the wire" as reported by my client creeps up at a steeper slope than the "global" read latency, so there is a discrepancy somewhere in the stats. i have added another JMX data point to cassandra to measure the overall time spent in cassandra, but i've got to get the servers started again to see what it reports ;) (the JMX polling sketch at the bottom of this mail shows how i'm reading these numbers.)

using node 1 and node 2, simulating a crash of node 1 using kill -9:
- node 1 was OOM'ing when trying to restart after a crash, but this seems fixed. it is staying cool and quiet.
- node 2 is now OOM'ing during the restart of node 1. memory steadily grows, and the last thing i see in the log is "Starting up server gossip" until the OOM.

what bothers me the most is not that i'm getting an OOM, but that i can't predict when i'll get it. the fact that restarting a failed node requires more than double the "normal operating" RAM is a bit of a worry.

not sure what else to tell you at the moment. lemme know what i can provide so we can figure this out.

thx!
________________________________________
From: Jonathan Ellis [jbellis@gmail.com]
Sent: Friday, December 18, 2009 3:49 PM
To: cassandra-user@incubator.apache.org
Subject: Re: another OOM

It sounds like you're simply throwing too much load at Cassandra.
Adding more machines can help.

Look at http://wiki.apache.org/cassandra/Operations for how to track
metrics that will tell you how much is "too much."

Telling us more about your workload would be useful in sanity checking
that hypothesis. :)

-Jonathan

On Fri, Dec 18, 2009 at 4:34 PM, Brian Burruss wrote:
> this time i simulated node 1 crashing, waited a few minutes, then restarted it. after a while node 2 OOM'ed.
>
> same 2 node cluster with RF=2, W=1, R=1. i up'ed the RAM to 6G this time.
>
> cluster contains ~126,281,657 data elements containing about 298G on one node's disk
>
> thx!
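PS - for reference, this is roughly how i'm polling the StorageProxy latency numbers over JMX (a sketch only; the JMX port and the attribute names are assumptions from memory, so browse the bean in jconsole to confirm them against your build):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LatencyPoll {
    public static void main(String[] args) throws Exception {
        // 8080 as the JMX port is an assumption; adjust to match cassandra.in.sh
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
        MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
        ObjectName proxy = new ObjectName("org.apache.cassandra.service:type=StorageProxy");
        // attribute names below are assumptions -- check the bean in jconsole
        while (true) {
            System.out.println("read latency:  " + mbs.getAttribute(proxy, "RecentReadLatency"));
            System.out.println("write latency: " + mbs.getAttribute(proxy, "RecentWriteLatency"));
            Thread.sleep(5000); // poll every 5s
        }
    }
}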