Date: Tue, 07 Dec 2010 14:11:21 +0100
From: Max
To: user@cassandra.apache.org
Subject: Re: Re: Re: Cassandra 0.7 beta 3 outOfMemory (OOM)

As far as I can see, Lucandra already uses batch mutations:

https://github.com/tjake/Lucandra/blob/master/src/lucandra/IndexWriter.java#L263
https://github.com/tjake/Lucandra/blob/master/src/lucandra/CassandraUtils.java#L371

IndexWriter.addDocument() merges all fields into one mutation map. In
addition, instead of "autoCommit" (committing after each document), I
commit only every 10 documents.
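
The insert loop looks roughly like this (a minimal sketch, not verbatim
Lucandra code: the setAutoCommit()/commit() names are paraphrased from
the IndexWriter source linked above, and the writer, analyzer and
documents are assumed to come from the surrounding indexing code):

    import lucandra.IndexWriter;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;

    public class BatchIndexer {
        // addDocument() merges each document's fields into one mutation
        // map; commit() pushes the accumulated batch to Cassandra.
        static void indexInBatches(IndexWriter writer, Analyzer analyzer,
                                   Iterable<Document> docs) throws Exception {
            writer.setAutoCommit(false);           // no flush per document
            int count = 0;
            for (Document doc : docs) {
                writer.addDocument(doc, analyzer);
                if (++count % 10 == 0) {
                    writer.commit();               // one batched write per 10 docs
                }
            }
            writer.commit();                       // flush the remainder
        }
    }
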
Where can I monitor incoming requests to Cassandra? WriteCount and
MutationCount (watched in jconsole) didn't change noticeably.
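
For now I poll the counters like this (again a sketch: port 8080 is the
0.7 JMX default, and the MBean path and the keyspace/columnfamily names
are what jconsole shows for my setup, so they may differ elsewhere):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class WriteCountProbe {
        public static void main(String[] args) throws Exception {
            MBeanServerConnection mbs = JMXConnectorFactory.connect(
                    new JMXServiceURL(
                        "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi"))
                    .getMBeanServerConnection();

            // one ColumnFamilies MBean per CF, as listed in jconsole
            ObjectName cf = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,"
                    + "keyspace=Lucandra,columnfamily=TermInfo");
            System.out.println("WriteCount = "
                    + mbs.getAttribute(cf, "WriteCount"));
        }
    }
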
I had problems opening the JRockit heap dump with MAT, but found
JRockit Mission Control instead. Unfortunately I'm not confident using
it. Here are my observations: while a HeapByteBuffer grew (~200 MB) and
was flushed during the client inserts, the byte[] kept growing
continuously:
http://oi51.tinypic.com/2uhbdp3.jpg

I used the Type Graph to analyze the byte[], but I'm not sure how to
interpret it:
http://oi53.tinypic.com/y2d1i.jpg

Thank you!
Max

Aaron Morton wrote:
> Jake or anyone else got experience bulk loading into Lucandra?
>
> Or does anyone have experience with JRockit?
>
> Max, are you sending one document at a time into Lucene? Can you send
> them in batches (like Solr), and if so, does it reduce the amount of
> requests going to Cassandra?
>
> Also, cassandra.bat is configured with -XX:+HeapDumpOnOutOfMemoryError,
> so you should be able to take a look at where all the memory is going.
> The Riptano blog points to http://www.eclipse.org/mat/ and also see
> http://www.oracle.com/technetwork/java/javase/memleaks-137499.html#gdyrr
>
> Hope that helps.
>
> Aaron
>
> On 07 Dec 2010, at 09:17 AM, Aaron Morton wrote:
>
> Accidentally sent to me.
>
> Begin forwarded message:
> From: Max
> Date: 07 December 2010 6:00:36 AM
> To: Aaron Morton
> Subject: Re: Re: Re: Cassandra 0.7 beta 3 outOfMemory (OOM)
>
> Thank you both for your answers!
> After several tests with different parameters we came to the
> conclusion that it must be a bug. It looks very similar to
> https://issues.apache.org/jira/browse/CASSANDRA-1014
>
> For both CFs we reduced the thresholds:
> - memtable_flush_after_mins = 60 (both CFs are used permanently,
>   therefore the other thresholds should trigger first)
> - memtable_throughput_in_mb = 40
> - memtable_operations_in_millions = 0.3
> - keys_cached = 0
> - rows_cached = 0
> - in_memory_compaction_limit_in_mb = 64
>
> First we disabled caching, later we disabled compaction, and after
> that we set
> commitlog_sync: batch
> commitlog_sync_batch_window_in_ms: 1
>
> But our problem still appears: while inserting files with Lucandra,
> memory usage grows slowly until an OOM crash after about 50 minutes.
> @Peter: In our latest test we stopped writing abruptly, but Cassandra
> didn't relax and stayed at ~90% heap usage even minutes later:
> http://oi54.tinypic.com/2dueeix.jpg
>
> With our heap calculation we should need
> 64 MB * 2 * 3 + 1 GB = 1.4 GB.
> All recent tests ran with 3 GB; I think that should be OK for a test
> machine. The consistency level is ONE.
>
> But Aaron is right, Lucandra produces far more than 200 inserts/s.
> My 200 documents per second are about 200 operations (WriteCount) on
> the first CF and about 3000 on the second CF.
>
> But even at about 120 documents/s Cassandra crashes.
>
> Disk I/O, monitored with the Windows performance admin tools, is
> moderate on both discs (the commitlog is on a separate hard disc).
>
> Any ideas?
> If it's really a bug, in my opinion it's very critical.
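
(For reference: the "64 MB * 2 * 3 + 1 GB" above is the rule of thumb
from http://wiki.apache.org/cassandra/MemtableThresholds: memtable
throughput times 3 times the number of hot CFs, plus 1 GB and cache
overhead. As a tiny sketch with our numbers:)

    public class HeapEstimate {
        public static void main(String[] args) {
            // rule of thumb from the MemtableThresholds wiki page:
            // throughput_in_mb * 3 * (hot CFs) + 1 GB + cache overhead
            int memtableThroughputMb = 64; // value used in the mail above
            int hotColumnFamilies = 2;     // the two Lucandra CFs
            int cacheOverheadMb = 0;       // keys_cached/rows_cached are 0

            int minHeapMb = memtableThroughputMb * 3 * hotColumnFamilies
                    + 1024 + cacheOverheadMb;
            System.out.println("~" + minHeapMb + " MB"); // ~1408 MB = ~1.4 GB
        }
    }
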
> Aaron Morton wrote:
>
>> I remember you have 2 CFs, but what are the settings for:
>>
>> - memtable_flush_after_mins
>> - memtable_throughput_in_mb
>> - memtable_operations_in_millions
>> - keys_cached
>> - rows_cached
>> - in_memory_compaction_limit_in_mb
>>
>> Can you do the JVM heap calculation here and see what it says:
>> http://wiki.apache.org/cassandra/MemtableThresholds
>>
>> What consistency level are you writing at? (Checking it's not ZERO.)
>>
>> When you talk about 200 inserts per second, is that storing 200
>> documents through Lucandra or 200 requests to Cassandra? If it's the
>> first option I would assume that would generate a lot more actual
>> requests into Cassandra. Open up jconsole and take a look at the
>> WriteCount settings for the CFs:
>> http://wiki.apache.org/cassandra/MemtableThresholds
>>
>> You could also try setting the compaction thresholds to 0 to disable
>> compaction while you are pushing this data in. Then use nodetool to
>> compact and turn the settings back to normal. See cassandra.yaml for
>> more info.
>>
>> I would have thought you could get the writes through with the setup
>> you've described so far (even though a single 32-bit node is unusual).
>> The best advice is to turn all the settings down (e.g. caches off,
>> memtable flush at 64 MB, compaction disabled) and if it still fails
>> try:
>>
>> - checking your I/O stats; not sure on Windows, but JConsole has some
>>   I/O stats. If your I/O cannot keep up, then your server is not fast
>>   enough for your client load.
>> - reducing the client load
>>
>> Hope that helps.
>> Aaron
>>
>> On 04 Dec 2010, at 05:23 AM, Max wrote:
>>
>> Hi,
>>
>> we increased the heap space to 3 GB (with the JRockit VM under 32-bit
>> Windows with 4 GB RAM), but under "heavy" inserts Cassandra is still
>> crashing with an OutOfMemory error after a GC storm.
>>
>> It sounds very similar to
>> https://issues.apache.org/jira/browse/CASSANDRA-1177
>>
>> In our insert tests the average heap usage slowly grows up to the
>> 3 GB limit (jconsole monitor over 50 min:
>> http://oi51.tinypic.com/k12gzd.jpg), and the CompactionManager queue
>> is also constantly growing, up to about 50 pending jobs.
>>
>> We tried to decrease the CF memtable thresholds, but after about half
>> a million inserts it's over.
>>
>> - Cassandra 0.7.0 beta 3
>> - single node
>> - about 200 inserts/s, ~500 bytes to 1 KB each
>>
>> Is there no other possibility besides slowing down the inserts/s?
>>
>> What could be an indicator that a node runs stably with this amount
>> of inserts?
>>
>> Thank you for your answer,
>> Max
>>
>> Aaron Morton wrote:
>>
>>> Sounds like you need to increase the heap size and/or reduce
>>> memtable_throughput_in_mb and/or turn off the internal caches.
>>> Normally the binary memtable thresholds only apply to bulk-load
>>> operations, and it's the per-CF memtable_* settings you want to
>>> change. I'm not familiar with Lucandra though.
>>>
>>> See the section on JVM heap size here:
>>> http://wiki.apache.org/cassandra/MemtableThresholds
>>>
>>> Bottom line is you will need more JVM heap memory.
>>>
>>> Hope that helps.
>>> Aaron
>>>
>>> On 29 Nov 2010, at 10:28 PM, cassandra@ajowa.de wrote:
>>>
>>> Hi community,
>>>
>>> during my tests I had several OOM crashes.
>>> Some hints on how to track down the problem would be nice.
>>>
>>> At first Cassandra crashed after about 45 minutes of the insert test
>>> script. During the following tests the time to OOM got shorter,
>>> until it started to crash even in "idle" mode.
>>>
>>> Here are the facts:
>>> - Cassandra 0.7 beta 3
>>> - using Lucandra to index about 3 million files of ~1 KB data
>>> - inserting from one client to one Cassandra node at about 200 files/s
>>> - the Cassandra data files for this keyspace grow to about 20 GB
>>> - the keyspace contains only the two Lucandra-specific CFs
>>>
>>> Cluster:
>>> - Cassandra single node on Windows 32-bit, Xeon 2.5 GHz, 4 GB RAM
>>> - Java JRE 1.6.0_22
>>> - heap space first 1 GB, later increased to 1.3 GB
>>>
>>> cassandra.yaml:
>>> default + reduced "binary_memtable_throughput_in_mb" to 128
>>>
>>> CFs:
>>> default + reduced
>>> min_compaction_threshold: 4
>>> max_compaction_threshold: 8
>>>
>>> I think the problem always appears during compaction, and perhaps it
>>> is a result of large rows (some around 170 MB).
>>>
>>> Are there more options we could use to get along with little memory?
>>>
>>> Is it a problem of compaction? And how can we avoid it?
>>> Slower inserts? More memory?
>>> Even lower memtable_throughput or in_memory_compaction_limit?
>>> Continuous manual major compaction?
>>>
>>> I've read
>>> http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
>>> - the row size limit should be fixed since 0.7, and 200 MB is still
>>>   far from 2 GB
>>> - only the key cache is used, and only a little (3600/20000)
>>> - after a lot of writes Cassandra crashes even in idle mode
>>> - the memtable size was reduced and there are only 2 CFs
>>>
>>> Several heap dumps in MAT show 60-99% of heap usage in the
>>> compaction thread.
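
(A note on "setting the compaction thresholds to 0": besides editing
the schema, the thresholds appear to be writable at runtime on the same
ColumnFamilies MBean. A sketch; the two attribute names are what
jconsole shows on a 0.7 beta 3 node and are unverified for other
versions. Afterwards "nodetool compact" triggers the major compaction
before the thresholds are set back:)

    import javax.management.Attribute;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class DisableCompaction {
        public static void main(String[] args) throws Exception {
            MBeanServerConnection mbs = JMXConnectorFactory.connect(
                    new JMXServiceURL(
                        "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi"))
                    .getMBeanServerConnection();

            ObjectName cf = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,"
                    + "keyspace=Lucandra,columnfamily=TermInfo");

            // 0/0 disables automatic minor compaction for this CF
            mbs.setAttribute(cf, new Attribute("MinimumCompactionThreshold", 0));
            mbs.setAttribute(cf, new Attribute("MaximumCompactionThreshold", 0));
        }
    }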