incubator-cassandra-user mailing list archives

From Ивaн Cобoлeв <sobol...@gmail.com>
Subject Re: Cassandra nodes failing with OOM
Date Mon, 26 Nov 2012 11:51:07 GMT
Hi, all,

Thank you very much for the help. Aaron was right: we had a
multiget_count query which, depending on the app input, could end up
computing counts for ~40k keys.

We've released the fix, and GCInspector warnings dropped from ~100 per day
per node to ~1 per day per 30 nodes :)
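
The fix itself is not shown in this message. As a rough sketch only, one
common way to keep a multiget_count-style call from touching tens of
thousands of keys at once is to split the key list into bounded batches.
The names below (chunked, batched_multiget_count, count_fn) and the batch
size of 100 are hypothetical; count_fn stands in for whatever client call
returns a {key: count} mapping for a list of keys.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(keys: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` keys."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def batched_multiget_count(
    keys: Iterable[str],
    count_fn: Callable[[List[str]], Dict[str, int]],
    batch_size: int = 100,  # arbitrary illustrative value, not a recommendation
) -> Dict[str, int]:
    """Call `count_fn` once per batch and merge the per-key counts.

    `count_fn` is a hypothetical stand-in for the real client call.
    """
    totals: Dict[str, int] = {}
    for batch in chunked(keys, batch_size):
        totals.update(count_fn(batch))
    return totals
```

With this shape, a request for ~40k keys becomes many small requests, so no
single call has to hold results for all keys in memory at once.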

Thank you very much!

Ivan

2012/11/19 Viktor Jevdokimov <Viktor.Jevdokimov@adform.com>

> We've seen OOM in a situation where the OS had not been properly prepared
> for production.
>
> http://www.datastax.com/docs/1.1/install/recommended_settings
>
>    Best regards / Pagarbiai
> *Viktor Jevdokimov*
> Senior Developer
>
> *From:* some.unique.login@gmail.com [mailto:some.unique.login@gmail.com]
> *On Behalf Of* Ивaн Cобoлeв
> *Sent:* Saturday, November 17, 2012 08:08
> *To:* user@cassandra.apache.org
> *Subject:* Cassandra nodes failing with OOM
>
> Dear Community,
>
> we need your advice.
>
> We have a cluster in which 1/6 of the nodes died for various reasons (3 had
> an OOM message).
>
> Nodes died in groups of 3, 1, and 2. No adjacent nodes died, though we use
> SimpleSnitch.
>
> Version:     1.1.6
> Hardware:    12 GB RAM / 8 cores (virtual)
> Data:        40 GB/node
> Nodes:       36
>
> Keyspaces:   2 (RF=3, R=W=2) + 1 (OpsCenter)
> CFs:         36, 2 indexes
> Partitioner: Random
> Compaction:  Leveled (we don't want 2x space for housekeeping)
> Caching:     Keys only
>
> Everything is pretty much standard, apart from one CF that receives writes
> in 64K chunks and has sstable_size_in_mb=100.
>
> JNA is not installed; this is to be fixed soon.
>
> Checking sysstat/sar, I can see 80-90% CPU idle and no anomalies in I/O; the
> only change is network activity spiking.
>
> Before dying, all the nodes had the following in their logs:
>
> > INFO [ScheduledTasks:1] 2012-11-15 21:35:05,512 StatusLogger.java (line 72) MemtablePostFlusher               1         4         0
> > INFO [ScheduledTasks:1] 2012-11-15 21:35:13,540 StatusLogger.java (line 72) FlushWriter                       1         3         0
> > INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 72) HintedHandoff                     1         6         0
> > INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 77) CompactionManager                 5         9
>
> GCInspector warnings were there too; the reported heap usage went from
> ~0.8 GB to 3 GB in 5-10 minutes.
>
> So, could you please give me a hint on:
>
> 1. How many GCInspector warnings per hour are considered 'normal'?
> 2. What should be the next thing to check?
> 3. What are the possible reasons for the failures, and how can we prevent
>    them?
>
> Thank you very much in advance,
>
> Ivan
>
