From: aaron morton
Subject: Re: (better info)any way to get the #writes/second, reads per second
Date: Wed, 15 May 2013 06:44:55 +1200
To: user@cassandra.apache.org

> Any reason why cassandra might be reading a lot from the data disks (not
> the commit log disk) more than usual?

On the new node or all nodes?

Maybe a cold key cache, or cold memmapped files due to a change in the data distribution?

Did it settle down?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/05/2013, at 5:06 AM, "Hiller, Dean" wrote:

> Ah, okay: iostat -x NEEDS an interval; "iostat -x 5" works better (the first
> report always shows 4% util while the second one shows 100%). Iotop seems a bit
> better here.
>
> So we know that since we added our new node, we are slammed with reads, and
> no one is running compactions according to "clush -g datanodes nodetool
> compactionstats".
>
> Any reason why cassandra might be reading a lot from the data disks (not
> the commit log disk) more than usual?
>
> Thanks,
> Dean
>
> On 5/13/13 10:46 AM, "Hiller, Dean" wrote:
>
>> We are running a pretty consistent load on our cluster and added a new node
>> to a 6-node cluster on Friday (QA worked great, but production not so much).
>> One mistake we made was starting up the new node and then disabling
>> the firewall :( which allowed the other nodes to discover it BEFORE the node
>> had bootstrapped itself. We shut down the node and booted him up again, and he
>> bootstrapped himself, streaming all the data in.
>>
>> After that, though, all the nodes have really, really high load numbers
>> now. We are still trying to figure out what is going on.
>>
>> Is there any way to get the number of reads/second and writes/second
>> through JMX or something? The only way I can see of doing this is
>> calculating it manually: timing the read count and dividing by the
>> start/stop times from my own stopwatch (the time range).
>>
>> Also, while my load average is 20.31, 19.10, 19.72, what does a
>> normal iostat look like? My iostat await time is 13.66 ms, which I think
>> is kind of bad, but not bad enough to cause a load of 20.31?
>>
>> Device:  rrqm/s  wrqm/s    r/s   w/s   rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm %util
>> sda        0.02    0.07  11.70  1.96  1353.67  702.88   150.58     0.19  13.66  3.61  4.93
>> sdb        0.00    0.02   0.11  0.46    20.72   97.54   206.70     0.00   1.33  0.67  0.04
>>
>> Thanks,
>> Dean
>
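The manual approach Dean describes (read a cumulative count, wait, read it again, and divide the difference by the elapsed time) is easy to automate. Below is a minimal Python sketch of that delta-over-interval calculation, under one loud assumption: `read_count` is a hypothetical callable that returns a monotonically increasing total, for example a read or write count you poll over JMX yourself. Neither that callable nor the function names below are part of Cassandra or of any JMX client library.

```python
import time
from typing import Callable, List, Tuple

# A sample is (monotonic timestamp in seconds, cumulative operation count).
Sample = Tuple[float, int]


def rate_per_second(prev: Sample, curr: Sample) -> float:
    """Return operations per second between two samples of a cumulative counter."""
    (t0, c0), (t1, c1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be taken in increasing time order")
    delta = c1 - c0
    if delta < 0:
        # A counter that goes backwards usually means the process restarted
        # and the counter was reset, so the rate for this interval is unknown.
        raise ValueError("counter decreased; was the process restarted?")
    return delta / elapsed


def poll_rates(read_count: Callable[[], int], interval: float, samples: int) -> List[float]:
    """Sample `read_count` every `interval` seconds and return one rate per interval.

    `read_count` is a hypothetical stand-in for whatever returns the
    cumulative total (e.g. a value you fetch over JMX); it is an assumption.
    """
    rates: List[float] = []
    prev: Sample = (time.monotonic(), read_count())
    for _ in range(samples):
        time.sleep(interval)
        curr: Sample = (time.monotonic(), read_count())
        rates.append(rate_per_second(prev, curr))
        prev = curr
    return rates


if __name__ == "__main__":
    # Fake counter for demonstration only: every call adds 500 "reads".
    total = {"reads": 0}

    def fake_read_count() -> int:
        total["reads"] += 500
        return total["reads"]

    # Each rate should land somewhat below 500 / 0.1 = 5000 ops/s,
    # because time.sleep() can overshoot the requested interval.
    print(poll_rates(fake_read_count, interval=0.1, samples=3))
```

Using `time.monotonic()` rather than wall-clock time keeps the elapsed interval correct even if the system clock is adjusted between samples. The same delta idea explains the iostat behaviour noted in the thread: the first report iostat prints covers the time since boot, and only the later reports cover the chosen interval, which is why `iostat -x 5` gives a more useful picture than a single `iostat -x`.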