incubator-cassandra-user mailing list archives

From Max <cassan...@ajowa.de>
Subject Re: Re: Re: Cassandra 0.7 beta 3 outOfMemory (OOM)
Date Tue, 07 Dec 2010 17:15:42 GMT
Thank you Jake, also Aaron & Peter for your help :-)

It was the CASSANDRA-1801 bug, fixed in the RC2 SVN snapshot!

Max


Jake Luciani <jakers@gmail.com> wrote:

> Max, this was a bug fixed recently in the 0.7 branch
>
> https://issues.apache.org/jira/browse/CASSANDRA-1801
>
> fixed now in RC2
>
> -Jake
>
> On Tue, Dec 7, 2010 at 8:11 AM, Max <cassandra@ajowa.de> wrote:
>
>> As far as I can see, Lucandra already uses batch_mutations.
>>
>> https://github.com/tjake/Lucandra/blob/master/src/lucandra/IndexWriter.java#L263
>>
>> https://github.com/tjake/Lucandra/blob/master/src/lucandra/CassandraUtils.java#L371
>>
>> IndexWriter.addDocument() merges all fields into a mutation map.
>> In addition, instead of "autoCommit" (commit each doc), I commit only every
>> 10 documents. Where can I monitor incoming requests to Cassandra?
>> WriteCount and MutationCount (monitored with jconsole) didn't change
>> noticeably.
>>
>> I had problems opening the JRockit heap dump with MAT, but found "JRockit
>> Mission Control" instead. Unfortunately I'm not confident using it.
>>
>> Here are my observations:
>> While HeapByteBuffer grew (~200 MB) and was flushed during client inserts,
>> byte[] kept growing.
>> http://oi51.tinypic.com/2uhbdp3.jpg
>>
>> I used TypeGraph to analyze the byte[] but I'm not sure how to interpret it:
>> http://oi53.tinypic.com/y2d1i.jpg
>>
>> Thank you!
>> Max
>>
>>
>> Aaron Morton <aaron@thelastpickle.com> wrote:
>>
>>> Jake, or anyone else, do you have experience bulk loading into Lucandra?
>>>
>>> Or does anyone have experience with JRockit?
>>>
>>> Max, are you sending one document at a time into Lucene? Can you send
>>> them in batches (like Solr)? If so, does it reduce the
>>> number of requests going to Cassandra?
>>>
>>> Also, cassandra.bat is configured with -XX:+HeapDumpOnOutOfMemoryError, so
>>> you should be able to take a look at where all the memory is
>>> going. The Riptano
>>> blog points to http://www.eclipse.org/mat/ and also see
>>> http://www.oracle.com/technetwork/java/javase/memleaks-137499.html#gdyrr
>>>
>>> Hope that helps.
>>>
>>> Aaron
>>>
>>> On 07 Dec, 2010,at 09:17 AM, Aaron Morton <aaron@thelastpickle.com>
>>> wrote:
>>>
>>> Accidentally sent to me.
>>>
>>> Begin forwarded message:
>>> From: Max <cassandra@ajowa.de>
>>> Date: 07 December 2010 6:00:36 AM
>>> To: Aaron Morton <aaron@thelastpickle.com>
>>> Subject: Re: Re: Re: Cassandra 0.7 beta 3 outOfMemory (OOM)
>>>
>>> Thank you both for your answer!
>>> After several tests with different parameters we came to the conclusion
>>> that it must be a bug.
>>> It looks very similar to:
>>> https://issues.apache.org/jira/browse/CASSANDRA-1014
>>>
>>> For both CFs we reduced thresholds:
>>> - memtable_flush_after_mins = 60 (both CFs are used permanently,
>>> therefore other thresholds should trigger first)
>>> - memtable_throughput_in_mb = 40
>>> - memtable_operations_in_millions = 0.3
>>> - keys_cached = 0
>>> - rows_cached = 0
>>>
>>> - in_memory_compaction_limit_in_mb = 64
>>>
>>> First we disabled caching, later we disabled compaction, and after that
>>> we set:
>>> commitlog_sync: batch
>>> commitlog_sync_batch_window_in_ms: 1
>>>
>>> But our problem still appears:
>>> While inserting files with Lucandra, memory usage grows slowly
>>> until an OOM crash after about 50 min.
>>> @Peter: In our latest test we stopped writing abruptly, but Cassandra
>>> didn't recover and stayed at ~90% heap usage even after several minutes.
>>> http://oi54.tinypic.com/2dueeix.jpg
>>>
>>> According to our heap calculation we should need:
>>> 64 MB * 2 * 3 + 1 GB = 1.4 GB
>>> All recent tests were run with 3 GB. I think that should be OK for a test
>>> machine.
>>> Also, the consistency level is ONE.
>>>
>>> But Aaron is right, Lucandra produces far more than 200 inserts/s.
>>> My 200 documents per second are about 200 operations (WriteCount) on
>>> the first CF and about 3000 on the second CF.
>>>
>>> But even with about 120 documents/s Cassandra crashes.
>>>
>>>
>>> Disk I/O, monitored with the Windows performance admin tools, is moderate
>>> on both disks (the commitlog is on a separate hard disk).
>>>
>>>
>>> Any ideas?
>>> If it's really a bug, in my opinion it's very critical.
>>>
>>>
>>>
>>> Aaron Morton <aaron@thelastpickle.com> wrote:
>>>
>>>> I remember you have 2 CFs, but what are the settings for:
>>>>
>>>> - memtable_flush_after_mins
>>>> - memtable_throughput_in_mb
>>>> - memtable_operations_in_millions
>>>> - keys_cached
>>>> - rows_cached
>>>>
>>>> - in_memory_compaction_limit_in_mb
>>>>
>>>> Can you do the JVM Heap Calculation here and see what it says
>>>> http://wiki.apache.org/cassandra/MemtableThresholds
>>>>
>>>> What Consistency Level are you writing at? (Checking it's not Zero)
>>>>
>>>> When you talk about 200 inserts per second, is that storing 200 documents
>>>> through Lucandra or 200 requests to Cassandra? If it's the first option, I
>>>> would assume that would generate a lot more actual requests into
>>>> Cassandra.
>>>> Open up jconsole and take a look at the WriteCount values for the CFs
>>>> http://wiki.apache.org/cassandra/MemtableThresholds
>>>>
>>>>
>>>> You could also try setting the compaction thresholds to 0 to disable
>>>> compaction while you are pushing this data in. Then use nodetool to
>>>> compact and turn the settings back to normal. See cassandra.yaml for
>>>> more info.
>>>>
>>>> I would have thought you could get the writes through with the setup
>>>> you've described so far (even though a single 32-bit node is unusual).
>>>> The best advice is to turn all the settings down (e.g. caches off,
>>>> memtable flush 64 MB, compaction disabled) and if it still fails try:
>>>>
>>>> - checking your IO stats, not sure on windows but JConsole has some IO
>>>> stats. If your IO cannot keep up then your server is not fast enough
>>>> for your client load.
>>>> - reducing the client load
>>>>
>>>> Hope that helps.
>>>> Aaron
>>>>
>>>>
>>>> On 04 Dec, 2010,at 05:23 AM, Max <cassandra@ajowa.de> wrote:
>>>>
>>>> Hi,
>>>>
>>>> we increased heap space to 3 GB (with the JRockit VM under 32-bit Windows
>>>> with 4 GB RAM),
>>>> but under "heavy" inserts Cassandra is still crashing with an OutOfMemory
>>>> error after a GC storm.
>>>>
>>>> It sounds very similar to
>>>> https://issues.apache.org/jira/browse/CASSANDRA-1177
>>>>
>>>> In our insert tests the average heap usage slowly grows up to the
>>>> 3 GB limit (jconsole monitor over 50 min:
>>>> http://oi51.tinypic.com/k12gzd.jpg) and the CompactionManager queue
>>>> also grows steadily, up to about 50 pending jobs.
>>>>
>>>> We tried decreasing the CF memtable thresholds, but after about half a
>>>> million inserts it's over.
>>>>
>>>> - Cassandra 0.7.0 beta 3
>>>> - Single Node
>>>> - about 200 inserts/s, ~500 bytes to 1 KB each
>>>>
>>>>
>>>> Is there no option other than slowing down the insert rate?
>>>>
>>>> What would indicate whether a node can stably handle this
>>>> insert rate?
>>>>
>>>> Thank you for your answer,
>>>> Max
>>>>
>>>>
>>>> Aaron Morton <aaron@thelastpickle.com>:
>>>>
>>>>> Sounds like you need to increase the Heap size and/or reduce the
>>>>> memtable_throughput_in_mb and/or turn off the internal caches. Normally
>>>>> the binary memtable thresholds only apply to bulk load
>>>>> operations and it's
>>>>> the per CF memtable_* settings you want to change. I'm not familiar
>>>>> with Lucandra though.
>>>>>
>>>>> See the section on JVM Heap Size here
>>>>> http://wiki.apache.org/cassandra/MemtableThresholds
>>>>>
>>>>> Bottom line is you will need more JVM heap memory.
>>>>>
>>>>> Hope that helps.
>>>>> Aaron
>>>>>
>>>>> On 29 Nov, 2010,at 10:28 PM, cassandra@ajowa.de wrote:
>>>>>
>>>>> Hi community,
>>>>>
>>>>> during my tests I had several OOM crashes.
>>>>> Some hints for tracking down the problem would be nice.
>>>>>
>>>>> At first Cassandra crashed after about 45 min of running the insert
>>>>> test script.
>>>>> In the following tests the time to OOM got shorter, until it started to
>>>>> crash even in "idle" mode.
>>>>>
>>>>> Here are the facts:
>>>>> - cassandra 0.7 beta 3
>>>>> - using lucandra to index about 3 million files ~1kb data
>>>>> - inserting with one client to one cassandra node with about 200 files/s
>>>>> - cassandra data files for this keyspace grow up to about 20 GB
>>>>> - the keyspace only contains the two lucandra specific CFs
>>>>>
>>>>> Cluster:
>>>>> - cassandra single node on Windows 32-bit, Xeon 2.5 GHz, 4 GB RAM
>>>>> - java jre 1.6.0_22
>>>>> - heap space first 1 GB, later increased to 1.3 GB
>>>>>
>>>>> Cassandra.yaml:
>>>>> default + reduced "binary_memtable_throughput_in_mb" to 128
>>>>>
>>>>> CFs:
>>>>> default + reduced
>>>>> min_compaction_threshold: 4
>>>>> max_compaction_threshold: 8
>>>>>
>>>>>
>>>>> I think the problem always appears during compaction,
>>>>> and perhaps it is a result of large rows (some about 170 MB).
>>>>>
>>>>> Are there more options we could use to work with little memory?
>>>>>
>>>>> Is it a compaction problem?
>>>>> And how can we avoid it?
>>>>> Slower inserts? More memory?
>>>>> An even lower memtable_throughput or in_memory_compaction_limit?
>>>>> Continuous manual major compaction?
>>>>>
>>>>> I've read
>>>>>
>>>>> http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
>>>>> - row_size should be fixed since 0.7 and 200mb is still far away from
>>>>> 2gb
>>>>> - only key cache is used a little bit 3600/20000
>>>>> - after a lot of writes cassandra crashes even in idle mode
>>>>> - memtablesize was reduced and there are only 2 CFs
>>>>>
>>>>> Several heap dumps in MAT show 60-99% heap usage by the compaction thread.
>>>>>
>>>>
>
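
The heap estimate quoted in the thread (64 MB * 2 * 3 + 1 GB = 1.4 GB) can be reproduced with a small sketch. The reading of the terms is an assumption based on the thread: 64 MB as the memtable size, 2 as the number of column families, 3 as an overhead factor, and 1 GB as a base allowance. The function and parameter names are illustrative and are not part of Cassandra. Note that the thread lists memtable_throughput_in_mb = 40 but calculates with 64, so the value is left as a parameter.

```python
def estimate_heap_mb(memtable_mb, cf_count, overhead_factor=3, base_mb=1024):
    """Rough heap estimate in MB, following the arithmetic quoted in the thread.

    Assumed reading of the terms (not confirmed by the thread):
      memtable_mb     - per-CF memtable size in MB
      cf_count        - number of actively written column families
      overhead_factor - multiplier applied to the memtable size
      base_mb         - fixed base allowance (1 GB in the thread)
    """
    return memtable_mb * overhead_factor * cf_count + base_mb


if __name__ == "__main__":
    needed = estimate_heap_mb(64, 2)
    # 64 * 3 * 2 + 1024 = 1408 MB, which the thread rounds to 1.4 GB
    print(f"{needed} MB (~{needed / 1024:.1f} GB)")
```

With either 40 MB or 64 MB memtables the estimate stays well below the 3 GB heap used in the later tests, which is consistent with the thread's conclusion that the growth was caused by a bug and not by the configuration.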

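The gap between documents per second and Cassandra operations per second reported in the thread (200 documents/s producing about 200 operations on the first CF and about 3000 on the second) can be made explicit with a short sketch. The CF names are placeholders, and scaling the ratio linearly to 120 documents/s is an assumption, not a measurement from the thread.

```python
# Figures reported in the thread for a load of 200 documents per second.
OBSERVED_DOCS_PER_S = 200
OBSERVED_OPS_PER_S = {"first_cf": 200, "second_cf": 3000}  # placeholder CF names


def ops_per_document(observed_ops, observed_docs):
    """Write operations generated per indexed document, per column family."""
    return {cf: ops / observed_docs for cf, ops in observed_ops.items()}


def projected_ops_per_second(docs_per_s, per_doc):
    """Project operations per second, assuming the per-document ratio is constant."""
    return {cf: docs_per_s * ratio for cf, ratio in per_doc.items()}


if __name__ == "__main__":
    per_doc = ops_per_document(OBSERVED_OPS_PER_S, OBSERVED_DOCS_PER_S)
    for rate in (200, 120):
        ops = projected_ops_per_second(rate, per_doc)
        print(f"{rate} docs/s -> {ops}, total {sum(ops.values()):.0f} ops/s")
```

Under that assumption, even the reduced rate of 120 documents/s would still mean roughly 1900 write operations per second reaching the node.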