From: Shu Zhang
To: user@cassandra.apache.org
Date: Mon, 25 Apr 2011 13:08:23 -0700
Subject: RE: OOM on heavy write load

The way I measure
actual memtable row sizes is this:

1. Write X rows into a Cassandra node.
2. Trigger GC and record heap usage.
3. Trigger compaction and GC, then record the heap savings.
4. Divide the savings by X to get the actual in-memory size of a Cassandra memtable row.

Use a similar process to measure per-key/per-row cache sizes for your data. To understand your memtable row overhead, you can do the above exercise with very different data sizes.

Also, you probably know this, but when setting your memory usage ceiling or heap size, make sure to leave a few hundred MB for GC.

________________________________________
From: Shu Zhang [szhang@mediosystems.com]
Sent: Monday, April 25, 2011 12:55 PM
To: user@cassandra.apache.org
Subject: RE: OOM on heavy write load

How large are your rows? binary_memtable_throughput_in_mb only tracks the size of the data, but there is an overhead associated with each row on the order of a few KB. If your row data sizes are really small, then the overhead dominates the memory usage and binary_memtable_throughput_in_mb ends up not limiting your memory usage the way you'd expect. It's a good idea to specify memtable_operations_in_millions in that case. If you're not sure how big your data is compared to the memtable overhead, it's a good idea to specify both parameters, to put in an effective memory ceiling no matter which one dominates your actual memory usage.

It could also be that your key cache is too big: you should measure your key sizes and make sure you have enough memory to cache 1M keys (along with your memtables). Finally, if you have multiple keyspaces (for multiple applications) on your cluster, they all share the total available heap, so you have to account for that.

There's no safeguard in Cassandra against OOM; you must configure nodes such that the max memory usage on each node, that is, the max size all your caches and memtables can potentially grow to, is less than your heap size.
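The measurement procedure above boils down to simple arithmetic once you have the two heap readings. A minimal sketch, where all heap numbers are hypothetical inputs you would collect yourself (e.g. from jconsole or jstat after forcing GC), not values from this thread:

```python
# Sketch: derive the in-memory size of a memtable row from two heap
# measurements, as described in the steps above.

def per_row_memtable_size(heap_before_flush, heap_after_flush, rows_written):
    """Heap freed by flushing the memtable, divided by the row count."""
    return (heap_before_flush - heap_after_flush) / rows_written

# Hypothetical example: 100k rows of 128-byte data held ~450 MB of heap,
# i.e. ~4.5 KB per row, so per-row overhead dwarfs the actual data.
row_size = per_row_memtable_size(1_200_000_000, 750_000_000, 100_000)
print(round(row_size))  # prints 4500
```

Repeating this with very different data sizes, as suggested, separates the fixed per-row overhead from the data itself.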
________________________________________
From: Nikolay Kovshov [nkovshov@yandex.ru]
Sent: Monday, April 25, 2011 5:21 AM
To: user@cassandra.apache.org
Subject: Re: OOM on heavy write load

I assume if I turn off swap it will just die earlier, no? What is the mechanism of dying?

From the link you provided:

# Row cache is too large, or is caching large rows

My row_cache is 0.

# The memtable sizes are too large for the amount of heap allocated to the JVM

Is my memtable size too large? I have made it smaller to surely fit the "magical formula".

Trying to analyze heap dumps gives me the following. In one case the diagram has 3 Memtables of about 64 MB each + 72 MB "Thread" + 700 MB "Unreachable objects". Suspected threats:

7 instances of "org.apache.cassandra.db.Memtable", loaded by "sun.misc.Launcher$AppClassLoader @ 0x7f29f4992d68" occupy 456,292,912 (48.36%) bytes.

25,211 instances of "org.apache.cassandra.io.sstable.SSTableReader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x7f29f4992d68" occupy 294,908,984 (31.26%) bytes.

72 instances of "java.lang.Thread", loaded by the system class loader, occupy 143,632,624 (15.22%) bytes.

In other cases the memory analyzer hangs trying to parse the 2 GB dump.

22.04.2011, 17:26, "Jonathan Ellis":
> (0) turn off swap
> (1) http://www.datastax.com/docs/0.7/troubleshooting/index#nodes-are-dying-with-oom-errors
>
> On Fri, Apr 22, 2011 at 8:00 AM, Nikolay Kovshov wrote:
>> I am using Cassandra 0.7.0 with the following settings:
>>
>> binary_memtable_throughput_in_mb: 64
>> in_memory_compaction_limit_in_mb: 64
>> keys_cached: 1 million
>> rows_cached: 0
>>
>> RAM for Cassandra: 2 GB
>>
>> I run a very simple test:
>>
>> 1 node with 4 HDDs (1 HDD for commitlog and caches, 3 HDDs for data)
>> 1 KS => 1 CF => 1 Column
>>
>> I insert data (random 64-byte key + 64-byte value) at the maximum possible speed, trying to hit disk I/O, calculate the speed, and make sure Cassandra stays alive. It doesn't, unfortunately.
>> After several hundred million inserts Cassandra always goes down with an OOM. Getting it up again doesn't help: after inserting some new data it goes down again. By this time Cassandra goes to swap and has a lot of tasks pending. I am not inserting anything now and the tasks sloooowly disappear, but it will take weeks to get through all of them.
>>
>> compaction type: Minor
>> column family: Standard1
>> bytes compacted: 3661003227
>> bytes total in progress: 4176296448
>> pending tasks: 630
>>
>> So, what am I (or Cassandra) doing wrong? I don't want Cassandra to crash with no means of repair under heavy write load.
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
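The "magical formula" Nikolay mentions is, as I recall from the 0.7-era troubleshooting page, roughly memtable_throughput_in_mb × 3 × (number of hot column families) + 1 GB + internal caches. A quick sanity check of the settings quoted above against it; the per-key-cache-entry overhead here is an assumed value, not something measured in this thread:

```python
# Rough check of Nikolay's settings against the heap heuristic
# (memtable throughput * 3 * hot CFs + 1 GB + caches).

MB = 1024 * 1024

memtable_mb = 64            # binary_memtable_throughput_in_mb
hot_cfs     = 1             # 1 KS => 1 CF
keys_cached = 1_000_000
key_entry_b = 64 + 100      # 64-byte key + assumed per-entry overhead

needed = memtable_mb * 3 * hot_cfs * MB + 1024 * MB + keys_cached * key_entry_b
print(needed // MB, "MB")   # prints: 1372 MB
```

On paper this fits in the 2 GB heap, which is consistent with the heap dump pointing elsewhere: 7 retained Memtables and 25k SSTableReader instances, i.e. flushes and compactions falling behind the write rate rather than the steady-state formula being violated.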