incubator-cassandra-user mailing list archives

From Ran Tavory <ran...@gmail.com>
Subject Re: oom in ROW-MUTATION-STAGE
Date Sat, 22 May 2010 20:05:32 GMT
The message deserializer has 10M pending tasks before the OOM. What do you
think makes the message deserializer blow up? I suspect that when it climbs
to 10M pending tasks, even though I don't know how much memory a single task
actually takes, together they may consume a lot of memory. Is there a setting
I need to tweak? (Or am I barking up the wrong tree?)
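For a rough sense of scale, here is a back-of-envelope estimate of what a 10M-task backlog could occupy. The per-task byte counts are assumptions for illustration, not measured values:

```python
# Back-of-envelope: heap consumed by N pending deserializer tasks.
# Per-task footprints below are assumed, not measured.
def pending_task_memory(n_tasks, bytes_per_task):
    """Estimated heap usage in bytes for the queued tasks."""
    return n_tasks * bytes_per_task

n = 10_000_000  # ~10M pending tasks observed before the OOM
for per_task in (100, 500, 1000):  # plausible per-task footprints in bytes
    gib = pending_task_memory(n, per_task) / 2**30
    print(f"{per_task:>5} B/task -> ~{gib:.1f} GiB")
```

Even a few hundred bytes per queued message puts the backlog in the same ballpark as the whole 4G heap, so an unbounded pending queue alone could plausibly account for the OOM.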

I'll add the counters from
http://github.com/jbellis/cassandra-munin-plugins but I already have
most of them monitored, so I attached graphs of the ones that seemed
most suspicious to the previous email.

The system keyspace and HH CF don't look too bad, I think; here they are:

Keyspace: system
        Read Count: 154
        Read Latency: 0.875012987012987 ms.
        Write Count: 9
        Write Latency: 0.20055555555555554 ms.
        Pending Tasks: 0
                Column Family: LocationInfo
                SSTable count: 1
                Space used (live): 2714
                Space used (total): 2714
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 3
                Read Count: 2
                Read Latency: NaN ms.
                Write Count: 9
                Write Latency: 0.011 ms.
                Pending Tasks: 0
                Key cache capacity: 1
                Key cache size: 1
                Key cache hit rate: NaN
                Row cache: disabled
                Compacted row minimum size: 203
                Compacted row maximum size: 397
                Compacted row mean size: 300

                Column Family: HintsColumnFamily
                SSTable count: 1
                Space used (live): 1457
                Space used (total): 4371
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 152
                Read Latency: 0.369 ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 1
                Key cache size: 1
                Key cache hit rate: 0.07142857142857142
                Row cache: disabled
                Compacted row minimum size: 829
                Compacted row maximum size: 829
                Compacted row mean size: 829
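As a sanity check on the GC activity, the numbers in the GCInspector lines quoted below work out to a heap that was essentially full, with each CMS cycle reclaiming almost nothing:

```python
# Heap occupancy and reclaim fraction, using the figures from the
# quoted GCInspector log lines (bytes).
max_heap = 4_431_216_640  # "max is ..." from the log

cms_cycles = [
    # (duration_ms, bytes_reclaimed, bytes_used_after)
    (10819, 939_992, 4_312_064_504),
    (9672, 673_400, 4_312_337_208),
    (9150, 402_072, 4_312_609_776),
]

for ms, reclaimed, used in cms_cycles:
    occupancy = used / max_heap
    reclaim_frac = reclaimed / max_heap
    print(f"CMS {ms} ms: heap {occupancy:.1%} full, reclaimed {reclaim_frac:.3%}")
```

Each 9-10 second CMS pass reclaimed well under 0.1% of a ~97%-full heap, which says the live set genuinely no longer fit; the question is what was pinning it, and the deserializer backlog above is a plausible candidate.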





On Sat, May 22, 2010 at 4:14 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

> Can you monitor cassandra-level metrics like the ones in
> http://github.com/jbellis/cassandra-munin-plugins ?
>
> the usual culprit is compaction, but your compacted row size is
> small.  Nothing else really comes to mind.
>
> (you should check system keyspace too tho, HH rows can get large)
>
> On Fri, May 21, 2010 at 2:36 PM, Ran Tavory <rantav@gmail.com> wrote:
> > I see some OOM on one of the hosts in the cluster and I wonder if there's
> > a formula that'll help me calculate what's the required memory setting
> > given the parameters x,y,z...
> > In short, I need advice on:
> > 1. How to set up proper heap space and which parameters should I look at
> > when doing so.
> > 2. Help setting up an alert policy and define some counter measures or
> > sos steps an admin can take to prevent further degradation of service
> > when alerts fire.
> > The OOM is at the row mutation stage and it happens after extensive GC
> > activity. (log tail below).
> > The server has 16G physical ram and java heap space 4G. No other
> > significant processes run on the same server. I actually upped the java
> > heap space to 8G but it OOMed again...
> > Most of my settings are the defaults with a few keyspaces and a few CFs
> > in each KS. Here's the output of cfstats for the largest and most
> > heavily used CF. (currently reads/writes are stopped but data is there).
> > Keyspace: outbrain_kvdb
> >         Read Count: 3392
> >         Read Latency: 160.33135908018866 ms.
> >         Write Count: 2005839
> >         Write Latency: 0.029233923061621595 ms.
> >         Pending Tasks: 0
> >                 Column Family: KvImpressions
> >                 SSTable count: 8
> >                 Space used (live): 21923629878
> >                 Space used (total): 21923629878
> >                 Memtable Columns Count: 69440
> >                 Memtable Data Size: 9719364
> >                 Memtable Switch Count: 26
> >                 Read Count: 3392
> >                 Read Latency: NaN ms.
> >                 Write Count: 1998821
> >                 Write Latency: 0.018 ms.
> >                 Pending Tasks: 0
> >                 Key cache capacity: 200000
> >                 Key cache size: 11661
> >                 Key cache hit rate: NaN
> >                 Row cache: disabled
> >                 Compacted row minimum size: 302
> >                 Compacted row maximum size: 22387
> >                 Compacted row mean size: 641
> > I'm also attaching a few graphs of "the incidents"; I hope they help.
> > From the graphs it looks like:
> > 1. message deserializer pool is behind so maybe taking too much mem. If
> > graphs are correct, it gets as high as 10m pending before crash.
> > 2. row-read-stage has a high number of pending (4k) so first of all -
> > this isn't good for performance whether it caused the oom or not, and
> > second, this may also have taken up heap space and caused the crash.
> > Thanks!
> >  INFO [GC inspection] 2010-05-21 00:53:25,885 GCInspector.java (line 110)
> > GC for ConcurrentMarkSweep: 10819 ms, 939992 reclaimed leaving
> > 4312064504 used; max is 4431216640
> >  INFO [GC inspection] 2010-05-21 00:53:44,605 GCInspector.java (line 110)
> > GC for ConcurrentMarkSweep: 9672 ms, 673400 reclaimed leaving
> > 4312337208 used; max is 4431216640
> >  INFO [GC inspection] 2010-05-21 00:54:23,110 GCInspector.java (line 110)
> > GC for ConcurrentMarkSweep: 9150 ms, 402072 reclaimed leaving
> > 4312609776 used; max is 4431216640
> > ERROR [ROW-MUTATION-STAGE:19] 2010-05-21 01:55:37,951 CassandraDaemon.java
> > (line 88) Fatal exception in thread Thread[ROW-MUTATION-STAGE:19,5,main]
> > java.lang.OutOfMemoryError: Java heap space
> > ERROR [Thread-10] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88)
> > Fatal exception in thread Thread[Thread-10,5,main]
> > java.lang.OutOfMemoryError: Java heap space
> > ERROR [CACHETABLE-TIMER-2] 2010-05-21 01:55:37,951 CassandraDaemon.java
> > (line 88) Fatal exception in thread Thread[CACHETABLE-TIMER-2,5,main]
> > java.lang.OutOfMemoryError: Java heap space
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
