Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 815FA1046B for ; Fri, 6 Dec 2013 15:59:15 +0000 (UTC) Received: (qmail 69463 invoked by uid 500); 6 Dec 2013 15:59:12 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 69444 invoked by uid 500); 6 Dec 2013 15:59:12 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 69436 invoked by uid 99); 6 Dec 2013 15:59:12 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 06 Dec 2013 15:59:12 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of peichieh@gmail.com designates 209.85.223.180 as permitted sender) Received: from [209.85.223.180] (HELO mail-ie0-f180.google.com) (209.85.223.180) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 06 Dec 2013 15:59:07 +0000 Received: by mail-ie0-f180.google.com with SMTP id tp5so1523404ieb.39 for ; Fri, 06 Dec 2013 07:58:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=XQVIfrmienF03vg8nqXiZ1THbMIehchtolIJ8Y5yEWw=; b=ljulLVBfYZZPraDvb7eW+co6V6m3OMq2PLCecNUB6OXbAt1imfGwHhgLyxhZIfjndO fJByiG2AwIHdDh2CAUq0p7vENOgMaHibh/MBpa4Ml450N5Q4nAOmWDbISnHbsw8O1/TP vpKf47W4Ujzt/gjDW/ALgZ41j780rajBE4oZPz5uDOcxnuMamZwkmNkbirlfMsvtQaIW nBVYocFUn+/MJ2nA1lm2W+vrJkAU1TkLGNA2uaZudQwoSomawjtRyHlBtvmlH1hTa3Ot tW6v8Cdr0IRrlht3opqBdlqA+RMljZL46eg0t7PZnL8NDEYSFw3Sp1Wd7o3PVZHdplS+ VjKw== MIME-Version: 1.0 X-Received: by 10.43.127.4 with SMTP id gy4mr2849289icc.61.1386345526410; Fri, 06 Dec 2013 07:58:46 -0800 (PST) Received: by 10.64.245.243 with HTTP; Fri, 6 Dec 2013 07:58:46 -0800 (PST) In-Reply-To: References: Date: Fri, 6 Dec 2013 23:58:46 +0800 Message-ID: Subject: Re: Write performance with 1.2.12 From: Jason Wee To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=001a11c1e384a345c104ecdfb8dd X-Virus-Checked: Checked by ClamAV on apache.org --001a11c1e384a345c104ecdfb8dd Content-Type: text/plain; charset=ISO-8859-1 Hi srmore, Perhaps if you use jconsole and connect to the jvm using jmx. Then uner MBeans tab, start inspecting the GC metrics. /Jason On Fri, Dec 6, 2013 at 11:40 PM, srmore wrote: > > > > On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak wrote: > >> Hard to say much without knowing about the cassandra configurations. >> > > The cassandra configuration is > -Xms8G > -Xmx8G > -Xmn800m > -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled > -XX:SurvivorRatio=4 > -XX:MaxTenuringThreshold=2 > -XX:CMSInitiatingOccupancyFraction=75 > -XX:+UseCMSInitiatingOccupancyOnly > > > >> Yes compactions/GC's could skipe the CPU, I had similar behavior with my >> setup. >> > > Were you able to get around it ? > > >> >> -VK >> >> >> On Fri, Dec 6, 2013 at 7:40 PM, srmore wrote: >> >>> We have a 3 node cluster running cassandra 1.2.12, they are pretty big >>> machines 64G ram with 16 cores, cassandra heap is 8G. >>> >>> The interesting observation is that, when I send traffic to one node its >>> performance is 2x more than when I send traffic to all the nodes. We ran >>> 1.0.11 on the same box and we observed a slight dip but not half as seen >>> with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing >>> CL to ONE make a slight improvement but not much. >>> >>> The read_Repair_chance is 0.1. We see some compactions running. >>> >>> following is my iostat -x output, sda is the ssd (for commit log) and >>> sdb is the spinner. >>> >>> avg-cpu: %user %nice %system %iowait %steal %idle >>> 66.46 0.00 8.95 0.01 0.00 24.58 >>> >>> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz >>> avgqu-sz await svctm %util >>> sda 0.00 27.60 0.00 4.40 0.00 256.00 >>> 58.18 0.01 2.55 1.32 0.58 >>> sda1 0.00 0.00 0.00 0.00 0.00 0.00 >>> 0.00 0.00 0.00 0.00 0.00 >>> sda2 0.00 27.60 0.00 4.40 0.00 256.00 >>> 58.18 0.01 2.55 1.32 0.58 >>> sdb 0.00 0.00 0.00 0.00 0.00 0.00 >>> 0.00 0.00 0.00 0.00 0.00 >>> sdb1 0.00 0.00 0.00 0.00 0.00 0.00 >>> 0.00 0.00 0.00 0.00 0.00 >>> dm-0 0.00 0.00 0.00 0.00 0.00 0.00 >>> 0.00 0.00 0.00 0.00 0.00 >>> dm-1 0.00 0.00 0.00 0.60 0.00 4.80 >>> 8.00 0.00 5.33 2.67 0.16 >>> dm-2 0.00 0.00 0.00 0.00 0.00 0.00 >>> 0.00 0.00 0.00 0.00 0.00 >>> dm-3 0.00 0.00 0.00 24.80 0.00 198.40 >>> 8.00 0.24 9.80 0.13 0.32 >>> dm-4 0.00 0.00 0.00 6.60 0.00 52.80 >>> 8.00 0.01 1.36 0.55 0.36 >>> dm-5 0.00 0.00 0.00 0.00 0.00 0.00 >>> 0.00 0.00 0.00 0.00 0.00 >>> dm-6 0.00 0.00 0.00 24.80 0.00 198.40 >>> 8.00 0.29 11.60 0.13 0.32 >>> >>> >>> >>> I can see I am cpu bound here but couldn't figure out exactly what is >>> causing it, is this caused by GC or Compaction ? I am thinking it is >>> compaction, I see a lot of context switches and interrupts in my vmstat >>> output. >>> >>> I don't see GC activity in the logs but see some compaction activity. >>> Has anyone seen this ? or know what can be done to free up the CPU. >>> >>> Thanks, >>> Sandeep >>> >>> >>> >> > --001a11c1e384a345c104ecdfb8dd Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Hi srmore,

Perhaps if you use jco= nsole and connect to the jvm using jmx. Then uner MBeans tab, start inspect= ing the GC metrics.

/Jason


On Fri, Dec 6, 2013 at 11:40 PM, srmore = <comomore@gmail.com> wrote:



On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak <vic= ky.kak@gmail.com> wrote:
Hard to say much without knowing about the cassandra configurations.
<= /div>
=A0
The cassandra configuration is= =A0
-Xms8G
-Xmx8G
-Xmn800m
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=3D4
-XX:MaxTenuringThreshold=3D2
-XX:CMSInitiatingOccupancyFraction=3D75
-XX:+UseCMSInitiatingOccupancyOnly

=A0
=
Yes compactions/GC's could skipe the CPU, I had similar behavior with m= y setup.

Were you a= ble to get around it ?
=A0

-VK


On Fri, Dec 6, 2013 at 7:40 PM, srmore <= span dir=3D"ltr"><comomore@gmail.com> wrote:
We have a 3 node cluster running cassandra 1.2.1= 2, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G= .

The interesting observation is that, when I send traffic to= one node its performance is 2x more than when I send traffic to all the no= des. We ran 1.0.11 on the same box and we observed a slight dip but not hal= f as seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM.= Changing CL to ONE make a slight improvement but not much.

The read_Repair_chance is 0.1. We see some compactions running.
following is my iostat -x output, sda is the ssd = (for commit log) and sdb is the spinner.

avg-cpu:=A0 %user=A0=A0 %ni= ce %system %iowait=A0 %steal=A0=A0 %idle
=A0=A0=A0=A0=A0=A0=A0=A0=A0 66.46=A0=A0=A0 0.00=A0=A0=A0 8.95=A0=A0=A0 0.01= =A0=A0=A0 0.00=A0=A0 24.58

Device:=A0=A0=A0=A0=A0=A0=A0=A0 rrqm/s=A0= =A0 wrqm/s=A0=A0 r/s=A0=A0 w/s=A0=A0 rsec/s=A0=A0 wsec/s avgrq-sz avgqu-sz= =A0=A0 await=A0 svctm=A0 %util
sda=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0 0.00=A0=A0=A0 27.60=A0 0.00=A0 4.40=A0=A0=A0=A0 0.00=A0=A0 256.00=A0= =A0=A0 58.18=A0=A0=A0=A0 0.01=A0=A0=A0 2.55=A0=A0 1.32=A0=A0 0.58
sda1=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0 0.00= =A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.0= 0=A0=A0=A0 0.00=A0=A0 0.00=A0=A0 0.00
sda2=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0 0.00=A0=A0=A0 27.60=A0 0.00=A0 4.40=A0=A0=A0=A0 0.00=A0=A0 256.00= =A0=A0=A0 58.18=A0=A0=A0=A0 0.01=A0=A0=A0 2.55=A0=A0 1.32=A0=A0 0.58
sdb= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0 0.00=A0= 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00= =A0=A0=A0 0.00=A0=A0 0.00=A0=A0 0.00
sdb1=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0 0.00= =A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.0= 0=A0=A0=A0 0.00=A0=A0 0.00=A0=A0 0.00
dm-0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0 0.00=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0= =A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0 0.00=A0=A0 0.00=A0=A0 0= .00
dm-1=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0= 0.00=A0 0.60=A0=A0=A0=A0 0.00=A0=A0=A0=A0 4.80=A0=A0=A0=A0 8.00=A0=A0=A0= =A0 0.00=A0=A0=A0 5.33=A0=A0 2.67=A0=A0 0.16
dm-2=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0 0.00= =A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.0= 0=A0=A0=A0 0.00=A0=A0 0.00=A0=A0 0.00
dm-3=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0 0.00 24.80=A0=A0=A0=A0 0.00=A0=A0 198.40= =A0=A0=A0=A0 8.00=A0=A0=A0=A0 0.24=A0=A0=A0 9.80=A0=A0 0.13=A0=A0 0.32
d= m-4=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0 0.00=A0= 6.60=A0=A0=A0=A0 0.00=A0=A0=A0 52.80=A0=A0=A0=A0 8.00=A0=A0=A0=A0 0.01=A0= =A0=A0 1.36=A0=A0 0.55=A0=A0 0.36
dm-5=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0 0.00= =A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0=A0=A0=A0 0.0= 0=A0=A0=A0 0.00=A0=A0 0.00=A0=A0 0.00
dm-6=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0 0.00=A0=A0=A0=A0 0.00=A0 0.00 24.80=A0=A0=A0=A0 0.00=A0=A0 198.40= =A0=A0=A0=A0 8.00=A0=A0=A0=A0 0.29=A0=A0 11.60=A0=A0 0.13=A0=A0 0.32


I can see I am cpu bound here but couldn't figure out exactl= y what is causing it, is this caused by GC or Compaction ? I am thinking it= is compaction, I see a lot of context switches and interrupts in my vmstat= output.

I don't see GC activity in the logs but see some compact= ion activity. Has anyone seen this ? or know what can be done to free up th= e CPU.

Thanks,
Sandeep





--001a11c1e384a345c104ecdfb8dd--