On Sun, May 23, 2010 at 10:59 AM, Ran Tavory <rantav@gmail.com> wrote:
> Is there another solution except adding capacity?
Either you need to get more performance/node or increase node count. :)
> How does the ConcurrentReads (default 8) affect that? If I expect to have
> similar number of reads and writes should I set the ConcurrentReads equal
> to ConcurrentWrites (default 32) ?
You should figure out where the bottleneck is, before tweaking things:
http://spyced.blogspot.com/2010/01/linux-performance-basics.html
Increasing CR will only help if you are (a) CPU bound and (b) have so
many cores that 8 threads aren't saturating them.
Sight unseen, my guess is you are disk bound. iostat can confirm this.
If that's the case then you can try to reduce the disk load w/ row
cache or key cache.
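
As a rough illustration of the "find the bottleneck first" step, here is a minimal sketch that reads tpstats-style output and lists the pools whose pending count is piling up. The function names (`parse_tpstats`, `backed_up_pools`) and the threshold of 1000 pending tasks are assumptions made up for this example, not anything Cassandra ships. The sample rows are copied from the tpstats output quoted in this thread.

```python
def parse_tpstats(text):
    """Parse tpstats-style text into {pool: (active, pending, completed)}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows are a pool name followed by three integer columns;
        # the header row and blank lines do not match and are skipped.
        if len(parts) != 4 or not all(p.isdigit() for p in parts[1:]):
            continue
        name, active, pending, completed = parts
        stats[name] = (int(active), int(pending), int(completed))
    return stats


def backed_up_pools(stats, pending_threshold=1000):
    """Return (pool, pending) pairs above the threshold, largest backlog first.

    The default threshold is an arbitrary illustration value.
    """
    flagged = [(name, pending)
               for name, (_, pending, _) in stats.items()
               if pending > pending_threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Rows taken from the tpstats output quoted in this thread.
SAMPLE = """\
Pool Name                    Active   Pending      Completed
ROW-READ-STAGE                    8      4088        7537229
MESSAGE-DESERIALIZER-POOL         1    123799       22198459
ROW-MUTATION-STAGE                0         0       14142351
"""

if __name__ == "__main__":
    for name, pending in backed_up_pools(parse_tpstats(SAMPLE)):
        print(f"{name}: {pending} pending")
```

A growing pending count only says which stage is falling behind, not why. The CPU-versus-disk question still needs iostat and friends.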
> On Sun, May 23, 2010 at 5:43 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> looks like reads are backing up, which in turn is making deserialize
>> back up
>>
>> On Sun, May 23, 2010 at 4:25 AM, Ran Tavory <rantav@gmail.com> wrote:
>> > Here's tpstats on a server with traffic that I think will OOM shortly.
>> > We have 4k pending reads and 123k pending at MESSAGE-DESERIALIZER-POOL.
>> > Is there something I can do to prevent that? (other than adding RAM...)
>> > Pool Name                    Active   Pending      Completed
>> > FILEUTILS-DELETE-POOL             0         0             55
>> > STREAM-STAGE                      0         0              6
>> > RESPONSE-STAGE                    0         0              0
>> > ROW-READ-STAGE                    8      4088        7537229
>> > LB-OPERATIONS                     0         0              0
>> > MESSAGE-DESERIALIZER-POOL         1    123799       22198459
>> > GMFD                              0         0         471827
>> > LB-TARGET                         0         0              0
>> > CONSISTENCY-MANAGER               0         0              0
>> > ROW-MUTATION-STAGE                0         0       14142351
>> > MESSAGE-STREAMING-POOL            0         0             16
>> > LOAD-BALANCER-STAGE               0         0              0
>> > FLUSH-SORTER-POOL                 0         0              0
>> > MEMTABLE-POST-FLUSHER             0         0            128
>> > FLUSH-WRITER-POOL                 0         0            128
>> > AE-SERVICE-STAGE                  1         1              8
>> > HINTED-HANDOFF-POOL               0         0             10
>> >
>> > On Sat, May 22, 2010 at 11:05 PM, Ran Tavory <rantav@gmail.com> wrote:
>> >>
>> >> The message deserializer has 10m pending tasks before the OOM. What do
>> >> you think makes the message deserializer blow up? I don't know how much
>> >> memory a task actually takes up, but at 10m pending tasks they may
>> >> consume a lot of it. Is there a setting I need to tweak? (Or am I
>> >> barking up the wrong tree?)
>> >> I'll add the counters from
>> >> http://github.com/jbellis/cassandra-munin-plugins but I already have
>> >> most of them monitored, so I attached the graphs of the ones that
>> >> seemed the most suspicious in the previous email.
>> >> The system keyspace and HH CF don't look too bad, I think. Here they
>> >> are:
>> >> Keyspace: system
>> >> Read Count: 154
>> >> Read Latency: 0.875012987012987 ms.
>> >> Write Count: 9
>> >> Write Latency: 0.20055555555555554 ms.
>> >> Pending Tasks: 0
>> >> Column Family: LocationInfo
>> >> SSTable count: 1
>> >> Space used (live): 2714
>> >> Space used (total): 2714
>> >> Memtable Columns Count: 0
>> >> Memtable Data Size: 0
>> >> Memtable Switch Count: 3
>> >> Read Count: 2
>> >> Read Latency: NaN ms.
>> >> Write Count: 9
>> >> Write Latency: 0.011 ms.
>> >> Pending Tasks: 0
>> >> Key cache capacity: 1
>> >> Key cache size: 1
>> >> Key cache hit rate: NaN
>> >> Row cache: disabled
>> >> Compacted row minimum size: 203
>> >> Compacted row maximum size: 397
>> >> Compacted row mean size: 300
>> >> Column Family: HintsColumnFamily
>> >> SSTable count: 1
>> >> Space used (live): 1457
>> >> Space used (total): 4371
>> >> Memtable Columns Count: 0
>> >> Memtable Data Size: 0
>> >> Memtable Switch Count: 0
>> >> Read Count: 152
>> >> Read Latency: 0.369 ms.
>> >> Write Count: 0
>> >> Write Latency: NaN ms.
>> >> Pending Tasks: 0
>> >> Key cache capacity: 1
>> >> Key cache size: 1
>> >> Key cache hit rate: 0.07142857142857142
>> >> Row cache: disabled
>> >> Compacted row minimum size: 829
>> >> Compacted row maximum size: 829
>> >> Compacted row mean size: 829
>> >>
>> >>
>> >>
>> >>
>> >> On Sat, May 22, 2010 at 4:14 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >>>
>> >>> Can you monitor cassandra-level metrics like the ones in
>> >>> http://github.com/jbellis/cassandra-munin-plugins ?
>> >>>
>> >>> the usual culprit is compaction, but your compacted row size is
>> >>> small. nothing else really comes to mind.
>> >>>
>> >>> (you should check system keyspace too tho, HH rows can get large)
>> >>>
>> >>> On Fri, May 21, 2010 at 2:36 PM, Ran Tavory <rantav@gmail.com> wrote:
>> >>> > I see some OOM on one of the hosts in the cluster and I wonder if
>> >>> > there's a formula that'll help me calculate the required memory
>> >>> > setting given the parameters x,y,z...
>> >>> > In short, I need advice on:
>> >>> > 1. How to set up proper heap space and which parameters I should
>> >>> > look at when doing so.
>> >>> > 2. Help setting up an alert policy and defining some countermeasures
>> >>> > or SOS steps an admin can take to prevent further degradation of
>> >>> > service when alerts fire.
>> >>> > The OOM is at the row mutation stage and it happens after extensive
>> >>> > GC activity (log tail below).
>> >>> > The server has 16G physical RAM and java heap space 4G. No other
>> >>> > significant processes run on the same server. I actually upped the
>> >>> > java heap space to 8G but it OOMed again...
>> >>> > Most of my settings are the defaults, with a few keyspaces and a few
>> >>> > CFs in each KS. Here's the output of cfstats for the largest and most
>> >>> > heavily used CF (currently reads/writes are stopped but data is
>> >>> > there).
>> >>> > Keyspace: outbrain_kvdb
>> >>> > Read Count: 3392
>> >>> > Read Latency: 160.33135908018866 ms.
>> >>> > Write Count: 2005839
>> >>> > Write Latency: 0.029233923061621595 ms.
>> >>> > Pending Tasks: 0
>> >>> > Column Family: KvImpressions
>> >>> > SSTable count: 8
>> >>> > Space used (live): 21923629878
>> >>> > Space used (total): 21923629878
>> >>> > Memtable Columns Count: 69440
>> >>> > Memtable Data Size: 9719364
>> >>> > Memtable Switch Count: 26
>> >>> > Read Count: 3392
>> >>> > Read Latency: NaN ms.
>> >>> > Write Count: 1998821
>> >>> > Write Latency: 0.018 ms.
>> >>> > Pending Tasks: 0
>> >>> > Key cache capacity: 200000
>> >>> > Key cache size: 11661
>> >>> > Key cache hit rate: NaN
>> >>> > Row cache: disabled
>> >>> > Compacted row minimum size: 302
>> >>> > Compacted row maximum size: 22387
>> >>> > Compacted row mean size: 641
>> >>> > I'm also attaching a few graphs of "the incident"; I hope they help.
>> >>> > From the graphs it looks like:
>> >>> > 1. The message deserializer pool is behind, so it may be taking too
>> >>> > much memory. If the graphs are correct, it gets as high as 10m
>> >>> > pending before the crash.
>> >>> > 2. row-read-stage has a high number of pending tasks (4k). First of
>> >>> > all, this isn't good for performance whether it caused the OOM or
>> >>> > not, and second, this may also have taken up heap space and caused
>> >>> > the crash.
>> >>> > Thanks!
>> >>> > Thanks!
>> >>> > INFO [GC inspection] 2010-05-21 00:53:25,885 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 10819 ms, 939992 reclaimed leaving 4312064504 used; max is 4431216640
>> >>> > INFO [GC inspection] 2010-05-21 00:53:44,605 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 9672 ms, 673400 reclaimed leaving 4312337208 used; max is 4431216640
>> >>> > INFO [GC inspection] 2010-05-21 00:54:23,110 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 9150 ms, 402072 reclaimed leaving 4312609776 used; max is 4431216640
>> >>> > ERROR [ROW-MUTATION-STAGE:19] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88) Fatal exception in thread Thread[ROW-MUTATION-STAGE:19,5,main]
>> >>> > java.lang.OutOfMemoryError: Java heap space
>> >>> > ERROR [Thread-10] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88) Fatal exception in thread Thread[Thread-10,5,main]
>> >>> > java.lang.OutOfMemoryError: Java heap space
>> >>> > ERROR [CACHETABLE-TIMER-2] 2010-05-21 01:55:37,951 CassandraDaemon.java (line 88) Fatal exception in thread Thread[CACHETABLE-TIMER-2,5,main]
>> >>> > java.lang.OutOfMemoryError: Java heap space
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Jonathan Ellis
>> >>> Project Chair, Apache Cassandra
>> >>> co-founder of Riptano, the source for professional Cassandra support
>> >>> http://riptano.com
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com