Subject: Re: heap issues - looking for advices on gc tuning
From: Jason Tang <ares.tang@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 30 Oct 2013 17:34:54 +0800
In-Reply-To: <52705226.9070408@gmail.com>
What's the configuration of the following parameters?
memtable_flush_queue_size:
concurrent_compactors:
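A quick way to check what a node is actually running with is below (the config path is an assumption based on the stock package layout; if either key is commented out, the 1.0.x defaults apply, which as far as I remember are 4 for memtable_flush_queue_size and one compactor per core for concurrent_compactors):

    # Show the effective settings from the node's config file
    grep -E '^(memtable_flush_queue_size|concurrent_compactors)' /etc/cassandra/cassandra.yaml

    # A backed-up flush queue shows up in tpstats as blocked FlushWriter tasks
    nodetool -h localhost tpstats | grep -i flush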


2013/10/30 Piavlo <lolitushka@gmail.com>
Hi,

Below I try to give a full picture of the problem I'm facing.

This is a 12-node cluster, running on EC2 with m2.xlarge instances (17G RAM, 2 CPUs).
Cassandra version is 1.0.8
The cluster normally handles between 1500 and 3000 reads per second (depending on time of day) and 800 to 1700 writes per second, according to OpsCenter. RF=3; no row caches are used.

Memory-relevant configs from cassandra.yaml:
flush_largest_memtables_at: 0.85
reduce_cache_sizes_at: 0.90
reduce_cache_capacity_to: 0.75
commitlog_total_space_in_mb: 4096
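For completeness, the other big memtable knob for heap in 1.0.x is memtable_total_space_in_mb; a quick check is below (the path, and the roughly-1/3-of-heap default when it is left unset, are assumptions from memory of the 1.0-era config):

    # When unset, memtable_total_space_in_mb defaults to about 1/3 of the heap
    # (so ~2.6G with an 8G heap) - worth confirming what this node uses
    grep -E '^#?[[:space:]]*memtable_total_space_in_mb' /etc/cassandra/cassandra.yaml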

relevant JVM options used are:
-Xms8000M -Xmx8000M -Xmn400M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
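One addition that would make the next two-week cycle much more informative is verbose GC logging, appended in cassandra-env.sh alongside the flags above (these are all standard HotSpot flags; the log path is an assumption):

    # Appended in cassandra-env.sh: log every collection with tenuring
    # distribution, so promotion failures and old-gen growth are visible
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"    # path is an assumption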

Now what happens is that with these settings, after a cassandra process restart, GC works fine at the beginning and the heap-used graph looks like a
saw with perfect teeth; eventually the teeth start to shrink until they become barely noticeable, and then cassandra starts to spend lots of CPU time
doing gc. Each such cycle takes about 2 weeks, and then I need to restart the cassandra process to restore performance.
During all this time there are no memory-related messages in the cassandra system.log, except a "GC for ParNew: little above 200ms" once in a while.
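A cheap way to watch this progression between restarts is to sample the old generation directly from the JVM; a sketch with the stock JDK tools (the pgrep pattern is just an assumption about how the process shows up in ps):

    # Print heap generation utilisation every 10s: the O column shows
    # old-generation occupancy (%), FGC the accumulating CMS/full collections
    CASS_PID=$(pgrep -f CassandraDaemon)
    jstat -gcutil $CASS_PID 10000

    # A class histogram of live objects shows what is actually retained on-heap
    # once the saw-teeth flatten out (this triggers a full GC, so use with care)
    jmap -histo:live $CASS_PID | head -30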

Things I've already done trying to reduce this eventual heap pressure:
1) reducing bloom_filter_fp_chance, resulting in a reduction from ~700MB to ~280MB total per node, based on all Filter.db files on the node.
2) reducing key cache sizes, and dropping key caches for CFs which do not have many reads.
3) increasing the heap size from 7000M to 8000M.
None of these has really helped; only the increase from 7000M to 8000M stretched the cycle until excessive gc from ~9 days to ~14 days.
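The on-heap footprint of items 1) and 2) can be tracked per CF from nodetool, to confirm the reductions actually landed (the field names below are from memory of the 1.0-era cfstats output, so treat them as approximate):

    # Per-CF bloom filter and key cache figures; summing them across CFs gives
    # the same inputs as the heap accounting in the graph mentioned below
    nodetool -h localhost cfstats | egrep -i 'Column Family:|Bloom Filter Space Used|Key cache capacity|Key cache size'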

I've tried to graph over time the data that is supposed to be in the heap vs the actual heap size, by summing up all CFs' bloom filter sizes + all CFs' key cache capacities multiplied by the average key size + all CFs' reported memtable data size (I've overestimated the data size a bit on purpose, to be on the safe side).
Here is a link to a graph showing the last 2 days of metrics for a node which could not effectively do GC, and then the cassandra process was restarted.
http://awesomescreenshot.com/0401w5y534
You can clearly see that before and after the restart, the size of the data that is supposed to be in the heap is pretty much the same,
which makes me think that what I really need is GC tuning.
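For what it's worth, here is a back-of-the-envelope version of that accounting (the key cache figure is a made-up placeholder, and the 1/3-of-heap memtable ceiling is an assumption about the 1.0 default):

    # Rough steady-state estimate of live on-heap data, in MB
    BLOOM=280                  # Filter.db total after the fp_chance change
    KEYCACHE=400               # placeholder: cached keys x average key size
    MEMTABLES=$((8000 / 3))    # memtable_total_space_in_mb default ~1/3 of heap
    echo "expected live data: $((BLOOM + KEYCACHE + MEMTABLES)) MB of an 8000 MB heap"

If that sum really stays flat while the old generation keeps growing, the pressure would be coming from promoted short-lived garbage rather than from live data, which again points at the young generation / tenuring settings rather than at the caches.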

Also I suppose that this is not due to the total number of keys each node has, which is between 200 and 300 million keys for all CF key estimates summed on a node.
The nodes have data sizes between 45G and 75G, corresponding to their millions of keys. And all nodes start having heavy GC load after about 14 days.
Also, the excessive GC and heap usage are not affected by load, which varies depending on the time of day (see the read/write rates at the beginning of the mail).
So again, based on this, I assume this is not due to a large number of keys or too much load on the cluster, but to a pure GC misconfiguration issue.

Things I remember having tried for GC tuning:
1) Changing -XX:MaxTenuringThreshold=1 to values like 8 - did not help.
2) Adding -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0
   -XX:CMSIncrementalDutyCycle=10 -XX:ParallelGCThreads=2 -XX:ParallelCMSThreads=1
   - this actually made things worse.
3) Adding -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=8 - did not help.
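For reference, the combination that tends to get suggested for this symptom (a tiny -Xmn400M plus MaxTenuringThreshold=1 pushes objects that only live a few seconds straight into the old generation, where only CMS can reclaim them) is to grow the young generation and let objects age in the survivor spaces. The values below are a sketch to experiment with, not something verified on this cluster:

    # A commonly suggested CMS baseline to try (starting points only)
    JVM_OPTS="$JVM_OPTS -Xmn1600M"                     # larger young gen, fewer promotions
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"           # keep real survivor spaces
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"    # let short-lived objects die young
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"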

Also, since it takes about 2 weeks to verify that a GC setting change did not help, trying all the possibilities is a painfully slow process :)
I'd highly appreciate any help and hints on the GC tuning.

tnx
Alex






