From: Shu Zhang
To: user@cassandra.apache.org
Date: Mon, 25 Apr 2011 13:08:23 -0700
Subject: RE: OOM on heavy write load

The way I measure
actual memtable row sizes is this:

1. Write X rows into a Cassandra node.
2. Trigger GC and record heap usage.
3. Trigger compaction and GC, then record the heap savings.
4. Divide the savings by X to get the actual in-memory size of a Cassandra memtable row.

Use a similar process to measure per-key/per-row cache sizes for your data. To understand your memtable row overhead, you can do the above exercise with very different data sizes.

Also, you probably know this, but when setting your memory usage ceiling or heap size, make sure to leave a few hundred MB for GC.

________________________________________
From: Shu Zhang [szhang@mediosystems.com]
Sent: Monday, April 25, 2011 12:55 PM
To: user@cassandra.apache.org
Subject: RE: OOM on heavy write load

How large are your rows? binary_memtable_throughput_in_mb only tracks the size of the data, but there is an overhead associated with each row on the order of a few KB. If your row data sizes are really small, then the overhead dominates the memory usage and binary_memtable_throughput_in_mb ends up not limiting your memory usage the way you'd expect. It's a good idea to specify memtable_operations_in_millions in that case. If you're not sure how big your data is compared to the memtable overhead, it's a good idea to specify both parameters, to put in an effective memory ceiling no matter which one dominates your actual memory usage.

It could also be that your key cache is too big: you should measure your key sizes and make sure you have enough memory to cache 1M keys (along with your memtables). Finally, if you have multiple keyspaces (for multiple applications) on your cluster, they all share the total available heap, so you have to account for that.

There's no safeguard in Cassandra against OOM; you must configure nodes such that the max memory usage on each node, that is, the max size all your caches and memtables can potentially grow to, is less than your heap size.
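The measurement procedure above boils down to simple arithmetic once you have the two heap readings. A minimal sketch, where all heap numbers are hypothetical inputs you would collect yourself (e.g. from jconsole or jstat after forcing GC), not values from this thread:

```python
# Sketch: derive the in-memory size of a memtable row from two heap
# measurements, as described in the steps above.

def per_row_memtable_size(heap_before_flush, heap_after_flush, rows_written):
    """Heap freed by flushing the memtable, divided by the row count."""
    return (heap_before_flush - heap_after_flush) / rows_written

# Hypothetical example: 100k rows of 128-byte data held ~450 MB of heap,
# i.e. ~4.5 KB per row, so per-row overhead dwarfs the actual data.
row_size = per_row_memtable_size(1_200_000_000, 750_000_000, 100_000)
print(round(row_size))  # prints 4500
```

Repeating this with very different data sizes, as suggested, separates the fixed per-row overhead from the data itself.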
________________________________________
From: Nikolay Kovshov [nkovshov@yandex.ru]
Sent: Monday, April 25, 2011 5:21 AM
To: user@cassandra.apache.org
Subject: Re: OOM on heavy write load

I assume if I turn off swap it will just die earlier, no? What is the mechanism of dying?

From the link you provided:

# Row cache is too large, or is caching large rows

My row_cache is 0.

# The memtable sizes are too large for the amount of heap allocated to the JVM

Is my memtable size too large? I have made it smaller to surely fit the "magical formula".

Trying to analyze heap dumps gives me the following. In one case the diagram has 3 Memtables of about 64 MB each + 72 MB "Thread" + 700 MB "Unreachable objects". Suspected threats:

7 instances of "org.apache.cassandra.db.Memtable", loaded by "sun.misc.Launcher$AppClassLoader @ 0x7f29f4992d68" occupy 456,292,912 (48.36%) bytes.

25,211 instances of "org.apache.cassandra.io.sstable.SSTableReader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x7f29f4992d68" occupy 294,908,984 (31.26%) bytes.

72 instances of "java.lang.Thread", loaded by the system class loader, occupy 143,632,624 (15.22%) bytes.

In other cases the memory analyzer hangs trying to parse the 2 GB dump.

22.04.2011, 17:26, "Jonathan Ellis":
> (0) turn off swap
> (1) http://www.datastax.com/docs/0.7/troubleshooting/index#nodes-are-dying-with-oom-errors
>
> On Fri, Apr 22, 2011 at 8:00 AM, Nikolay Kovshov wrote:
>> I am using Cassandra 0.7.0 with the following settings:
>>
>> binary_memtable_throughput_in_mb: 64
>> in_memory_compaction_limit_in_mb: 64
>> keys_cached: 1 million
>> rows_cached: 0
>>
>> RAM for Cassandra: 2 GB
>>
>> I run a very simple test:
>>
>> 1 node with 4 HDDs (1 HDD for commitlog and caches, 3 HDDs for data)
>> 1 KS => 1 CF => 1 Column
>>
>> I insert data (random 64-byte key + 64-byte value) at the maximum possible speed, trying to hit disk I/O, calculate the speed, and make sure Cassandra stays alive. It doesn't, unfortunately.
>> After several hundred million inserts Cassandra always goes down with an OOM. Getting it up again doesn't help: after inserting some new data it goes down again. By this time Cassandra goes to swap and has a lot of tasks pending. I am not inserting anything now and the tasks sloooowly disappear, but it will take weeks to get through all of them.
>>
>> compaction type: Minor
>> column family: Standard1
>> bytes compacted: 3661003227
>> bytes total in progress: 4176296448
>> pending tasks: 630
>>
>> So, what am I (or Cassandra) doing wrong? I don't want Cassandra to crash with no means of repair under heavy write load.
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
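The "magical formula" Nikolay mentions is, as I recall from the 0.7-era troubleshooting page, roughly memtable_throughput_in_mb × 3 × (number of hot column families) + 1 GB + internal caches. A quick sanity check of the settings quoted above against it; the per-key-cache-entry overhead here is an assumed value, not something measured in this thread:

```python
# Rough check of Nikolay's settings against the heap heuristic
# (memtable throughput * 3 * hot CFs + 1 GB + caches).

MB = 1024 * 1024

memtable_mb = 64            # binary_memtable_throughput_in_mb
hot_cfs     = 1             # 1 KS => 1 CF
keys_cached = 1_000_000
key_entry_b = 64 + 100      # 64-byte key + assumed per-entry overhead

needed = memtable_mb * 3 * hot_cfs * MB + 1024 * MB + keys_cached * key_entry_b
print(needed // MB, "MB")   # prints: 1372 MB
```

On paper this fits in the 2 GB heap, which is consistent with the heap dump pointing elsewhere: 7 retained Memtables and 25k SSTableReader instances, i.e. flushes and compactions falling behind the write rate rather than the steady-state formula being violated.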