Subject: Re: too many warnings of Heap is full
From: Bryan Talbot
To: user@cassandra.apache.org
Date: Wed, 30 Jan 2013 12:07:28 -0800

My guess is that those one or two nodes with the GC pressure also have more rows in your big CF. More rows could be due to imbalanced distribution if you're not using a random partitioner, or to those nodes not yet having removed deleted rows that other nodes may already have removed.

JVM heap space is used for a few things which scale with key count, including:
- bloom filter (for C* < 1.2)
- index samples
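
To give a rough feel for how these two structures grow with key count, here is a minimal sketch. It uses the textbook sizing formula for an optimally sized Bloom filter and a simple "one sample every N keys" model; it is not Cassandra's actual implementation, and the key counts, false-positive chances and intervals are made-up illustrative values.

```python
import math


def bloom_filter_bytes(keys: int, fp_chance: float) -> float:
    """Textbook size in bytes of an optimally sized Bloom filter.

    bits per key = -ln(p) / (ln 2)^2, which is an approximation and
    not the exact sizing Cassandra uses internally.
    """
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return keys * bits_per_key / 8


def index_sample_count(keys: int, index_interval: int) -> int:
    """Number of sampled index entries if one key in every index_interval is kept."""
    return math.ceil(keys / index_interval)


# Hypothetical per-node key counts and settings, for illustration only.
for keys in (100_000_000, 200_000_000):
    for fp_chance, interval in ((0.01, 128), (0.1, 512)):
        mib = bloom_filter_bytes(keys, fp_chance) / 2**20
        samples = index_sample_count(keys, interval)
        print(
            f"{keys:>11,} keys  fp={fp_chance:<5} interval={interval:<4}"
            f" -> bloom ~{mib:7.1f} MiB, {samples:,} index samples"
        )
```

Under this model both numbers grow linearly with the number of keys, so a node holding twice as many rows pays roughly twice the cost, while a higher false-positive chance or a larger sampling interval shrinks them.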

Other space is used but can be more easily controlled by tuning for
- memtable
- compaction
- key cache
- row cache


So, if those nodes have more rows than the others (check using "nodetool ring" or "nodetool cfstats"), you can try to:
- reduce the number of rows by adding nodes, running manual compactions or tuning compaction to remove rows with expired tombstones, etc.
- increase bloom filter fp chance
- increase jvm heap size (don't go too big)
- disable key or row cache
- increase index sample interval

Not all of those things are generally good, especially when taken to the extreme, so don't go setting a 20 GB JVM heap without understanding the consequences, for example.

-Bryan


On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero <guillermo.barbero@spotbros.com> wrote:
Hi,

I'm seeing some weird behaviour in my Cassandra cluster. Most of the
warning messages are due to Heap is % full. According to this link
(http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
there are two ways to "reduce pressure":
1. Decrease the cache sizes
2. Increase the index interval size

Most of the flushes are in two column families (users and messages). I
guess that's because most of the mutations are there.

I still have not applied those changes to the production environment.
Do you recommend any other measure? Should I set specific tuning for
these two CFs? Should I check another metric?

Additionally, the distribution of warning messages is not uniform
along the cluster. Why could cassandra be doing this? What should I do
to find out how to fix this?

Cassandra runs on a 6-node cluster of m1.xlarge machines (Amazon EC2).
The Java version is the following:
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)

The Cassandra system.log is summarized here (number of messages, Cassandra
node, class that reports the message, first word of the message)
2013-01-26
      5 cassNode0: GCInspector.java Heap
      5 cassNode0: StorageService.java Flushing
    232 cassNode2: GCInspector.java Heap
    232 cassNode2: StorageService.java Flushing
    104 cassNode3: GCInspector.java Heap
    104 cassNode3: StorageService.java Flushing
      3 cassNode4: GCInspector.java Heap
      3 cassNode4: StorageService.java Flushing
      3 cassNode5: GCInspector.java Heap
      3 cassNode5: StorageService.java Flushing

2013-01-27
      2 cassNode0: GCInspector.java Heap
      2 cassNode0: StorageService.java Flushing
      3 cassNode1: GCInspector.java Heap
      3 cassNode1: StorageService.java Flushing
    189 cassNode2: GCInspector.java Heap
    189 cassNode2: StorageService.java Flushing
    104 cassNode3: GCInspector.java Heap
    104 cassNode3: StorageService.java Flushing
      1 cassNode4: GCInspector.java Heap
      1 cassNode4: StorageService.java Flushing
      1 cassNode5: GCInspector.java Heap
      1 cassNode5: StorageService.java Flushing

2013-01-28
      2 cassNode0: GCInspector.java Heap
      2 cassNode0: StorageService.java Flushing
      1 cassNode1: GCInspector.java Heap
      1 cassNode1: StorageService.java Flushing
      1 cassNode2: AutoSavingCache.java Reducing
    343 cassNode2: GCInspector.java Heap
    342 cassNode2: StorageService.java Flushing
    181 cassNode3: GCInspector.java Heap
    181 cassNode3: StorageService.java Flushing
      4 cassNode4: GCInspector.java Heap
      4 cassNode4: StorageService.java Flushing
      3 cassNode5: GCInspector.java Heap
      3 cassNode5: StorageService.java Flushing

2013-01-29
      2 cassNode0: GCInspector.java Heap
      2 cassNode0: StorageService.java Flushing
      3 cassNode1: GCInspector.java Heap
      3 cassNode1: StorageService.java Flushing
    156 cassNode2: GCInspector.java Heap
    156 cassNode2: StorageService.java Flushing
     71 cassNode3: GCInspector.java Heap
     71 cassNode3: StorageService.java Flushing
      2 cassNode4: GCInspector.java Heap
      2 cassNode4: StorageService.java Flushing
      2 cassNode5: GCInspector.java Heap
      1 cassNode5: Memtable.java setting
      2 cassNode5: StorageService.java Flushing
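
To make the skew in this summary easier to see, the GCInspector "Heap" counts can be totalled per node. This is only a small sketch: the counts are transcribed by hand from the list above, and the variable names are illustrative.

```python
from collections import Counter

# Daily GCInspector "Heap" warning counts per node, copied from the summary.
# cassNode1 has no entry on 2013-01-26 in the summary, so it is omitted there.
heap_warnings = {
    "2013-01-26": {"cassNode0": 5, "cassNode2": 232, "cassNode3": 104,
                   "cassNode4": 3, "cassNode5": 3},
    "2013-01-27": {"cassNode0": 2, "cassNode1": 3, "cassNode2": 189,
                   "cassNode3": 104, "cassNode4": 1, "cassNode5": 1},
    "2013-01-28": {"cassNode0": 2, "cassNode1": 1, "cassNode2": 343,
                   "cassNode3": 181, "cassNode4": 4, "cassNode5": 3},
    "2013-01-29": {"cassNode0": 2, "cassNode1": 3, "cassNode2": 156,
                   "cassNode3": 71, "cassNode4": 2, "cassNode5": 2},
}

# Sum the daily counts into one total per node.
totals = Counter()
for per_node in heap_warnings.values():
    totals.update(per_node)

# Print nodes from most to fewest warnings, with their share of the total.
grand_total = sum(totals.values())
for node, count in totals.most_common():
    print(f"{node}: {count:4d}  ({100 * count / grand_total:.1f}%)")
```

Over these four days cassNode2 and cassNode3 account for nearly all of the Heap warnings, which is the non-uniform distribution mentioned earlier.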

--

Guillermo Barbero - Backend Team

Spotbros Technologies

