Subject: Re: Write performance with 1.2.12
From: Vicky Kak <vicky.kak@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 6 Dec 2013 22:45:12 +0530

How long had the server been up — hours, days, months...?

On Fri, Dec 6, 2013 at 10:41 PM, srmore <comomore@gmail.com> wrote:

> Looks like I am spending some time in GC.
>
> java.lang:type=GarbageCollector,name=ConcurrentMarkSweep
>
>   CollectionTime  = 51707;
>   CollectionCount = 103;
>
> java.lang:type=GarbageCollector,name=ParNew
>
>   CollectionTime  = 466835;
>   CollectionCount = 21315;
>
> On Fri, Dec 6, 2013 at 9:58 AM, Jason Wee <peichieh@gmail.com> wrote:
>
>> Hi srmore,
>>
>> Perhaps if you use jconsole and connect to the JVM using JMX. Then under
>> the MBeans tab, start inspecting the GC metrics.
>>
>> /Jason
>>
>> On Fri, Dec 6, 2013 at 11:40 PM, srmore <comomore@gmail.com> wrote:
>>
>>> On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak <vicky.kak@gmail.com> wrote:
>>>
>>>> Hard to say much without knowing about the Cassandra configuration.
>>>
>>> The Cassandra configuration is
>>> -Xms8G
>>> -Xmx8G
>>> -Xmn800m
>>> -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC
>>> -XX:+CMSParallelRemarkEnabled
>>> -XX:SurvivorRatio=4
>>> -XX:MaxTenuringThreshold=2
>>> -XX:CMSInitiatingOccupancyFraction=75
>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>
>>>> Yes, compactions/GCs could spike the CPU; I had similar behavior with
>>>> my setup.
>>>
>>> Were you able to get around it ?
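The counters quoted above already say something on their own: dividing CollectionTime by CollectionCount gives the average pause per collection, and comparing total collection time to uptime — which is why the uptime question matters — gives the fraction of wall-clock time spent in GC. A minimal sketch of that arithmetic; the 24-hour uptime is purely an assumed placeholder, since the thread never states the real figure:

```java
public class GcShare {
    public static void main(String[] args) {
        // Counters from the java.lang:type=GarbageCollector MBeans quoted above.
        long cmsTimeMs = 51707, cmsCount = 103;
        long parNewTimeMs = 466835, parNewCount = 21315;

        // Average pause per collection: roughly 502 ms for CMS, 22 ms for ParNew.
        System.out.printf("avg CMS pause:    %.1f ms%n", (double) cmsTimeMs / cmsCount);
        System.out.printf("avg ParNew pause: %.1f ms%n", (double) parNewTimeMs / parNewCount);

        // Fraction of wall-clock time spent collecting; 24 h is an assumption,
        // not a number from the thread.
        long assumedUptimeMs = 24L * 60 * 60 * 1000;
        double gcShare = (double) (cmsTimeMs + parNewTimeMs) / assumedUptimeMs;
        System.out.printf("GC share over an assumed 24 h uptime: %.2f%%%n", gcShare * 100);
    }
}
```

If the node had been up for a day or more, these counters would put GC well under one percent of wall-clock time, which would point the CPU suspicion toward compaction instead.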
>>>> -VK
>>>>
>>>> On Fri, Dec 6, 2013 at 7:40 PM, srmore <comomore@gmail.com> wrote:
>>>>
>>>>> We have a 3-node cluster running Cassandra 1.2.12; they are pretty big
>>>>> machines, 64G RAM with 16 cores, and the Cassandra heap is 8G.
>>>>>
>>>>> The interesting observation is that when I send traffic to one node,
>>>>> its performance is 2x better than when I send traffic to all the
>>>>> nodes. We ran 1.0.11 on the same box and observed a slight dip, but
>>>>> not half as with 1.2.12. In both cases we were writing with
>>>>> LOCAL_QUORUM. Changing CL to ONE makes a slight improvement, but not
>>>>> much.
>>>>>
>>>>> The read_repair_chance is 0.1. We see some compactions running.
>>>>>
>>>>> Following is my iostat -x output; sda is the SSD (for the commit log)
>>>>> and sdb is the spinner.
>>>>>
>>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>>           66.46    0.00    8.95    0.01    0.00   24.58
>>>>>
>>>>> Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
>>>>> sda        0.00   27.60  0.00   4.40    0.00  256.00    58.18     0.01   2.55   1.32   0.58
>>>>> sda1       0.00    0.00  0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>>>>> sda2       0.00   27.60  0.00   4.40    0.00  256.00    58.18     0.01   2.55   1.32   0.58
>>>>> sdb        0.00    0.00  0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>>>>> sdb1       0.00    0.00  0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>>>>> dm-0       0.00    0.00  0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>>>>> dm-1       0.00    0.00  0.00   0.60    0.00    4.80     8.00     0.00   5.33   2.67   0.16
>>>>> dm-2       0.00    0.00  0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>>>>> dm-3       0.00    0.00  0.00  24.80    0.00  198.40     8.00     0.24   9.80   0.13   0.32
>>>>> dm-4       0.00    0.00  0.00   6.60    0.00   52.80     8.00     0.01   1.36   0.55   0.36
>>>>> dm-5       0.00    0.00  0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>>>>> dm-6       0.00    0.00  0.00  24.80    0.00  198.40     8.00     0.29  11.60   0.13   0.32
>>>>>
>>>>> I can see I am CPU bound here but couldn't figure out exactly what is
>>>>> causing it. Is this caused by GC or compaction? I am thinking it is
>>>>> compaction; I see a lot of context switches and interrupts in my
>>>>> vmstat output.
>>>>>
>>>>> I don't see GC activity in the logs but see some compaction activity.
>>>>> Has anyone seen this, or does anyone know what can be done to free up
>>>>> the CPU?
>>>>>
>>>>> Thanks,
>>>>> Sandeep
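Jason's jconsole suggestion can also be done in code: the same java.lang:type=GarbageCollector MBeans quoted earlier in the thread are exposed through the standard GarbageCollectorMXBean API. A minimal sketch that dumps them for the local JVM — attaching to a remote Cassandra node would instead go through a JMXConnector, which is omitted here:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcDump {
    public static void main(String[] args) {
        // Each bean corresponds to one collector; with the flags quoted in
        // this thread the names would be "ParNew" and "ConcurrentMarkSweep".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-20s count=%d  timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling this periodically and diffing the counters shows whether GC time is actually growing during the CPU spikes, which is one way to separate the GC hypothesis from the compaction one.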