From: Philippe <watcherfr@gmail.com>
To: user@cassandra.apache.org
Subject: Re: Troubleshooting IO performance ?
Date: Tue, 7 Jun 2011 22:00:45 +0200

Aaron,

- what version are you on?
0.7.6-2

- what is the concurrent_reads config setting?
concurrent_reads: 64
concurrent_writes: 64

Given that I've got 4 cores and SSD drives, I doubled the recommended concurrent_writes.
Given that I've RAID-0ed the SSD drives, I figured I could at least double the recommended value for the SSDs and double it for RAID-0.
Wrong assumptions?

BTW, Cassandra is running on an XFS filesystem over LVM over RAID-0.
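
For reference, the doubling described above can be checked against the usual starting points. This is a minimal sketch. It assumes the rule of thumb from the comments in the stock cassandra.yaml (16 reads per data drive, 8 writes per core) and assumes the RAID-0 set counts as 2 drives, since iostat lists sda and sdb. The function name `rule_of_thumb` is made up for illustration.

```python
def rule_of_thumb(num_drives: int, num_cores: int) -> dict:
    """Starting points assumed from the cassandra.yaml comments:
    16 concurrent reads per drive, 8 concurrent writes per core."""
    return {
        "concurrent_reads": 16 * num_drives,
        "concurrent_writes": 8 * num_cores,
    }


# Assumption: 2 SSDs in the RAID-0 set (sda, sdb) and 4 cores (i5).
baseline = rule_of_thumb(num_drives=2, num_cores=4)

# Values reported in this message.
configured = {"concurrent_reads": 64, "concurrent_writes": 64}

for name, value in configured.items():
    ratio = value / baseline[name]
    print(f"{name}: configured {value}, rule of thumb {baseline[name]}, {ratio:.1f}x")
```

Under those assumptions both settings are twice the suggested starting point. Whether that is too much depends on what the drives can actually sustain.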

- what is nodetool tpstats showing during the slow down?
The only value that changes is the ReadStage line. Here are values sampled every second:
Pool Name                    Active   Pending      Completed
ReadStage                        64     99303      463085056
ReadStage                        64     88430      463095929
ReadStage                        64     91937      463107782

So basically, I'm flooding the system, right? For example, 99303 means there are 99303 key reads pending, possibly from just a couple of MultiSlice gets?
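
The three tpstats samples are enough for a rough throughput estimate. The sketch below assumes the samples really are one second apart, as stated, and that every read leaving the pending queue shows up in Completed. The variable names are illustrative only.

```python
# (active, pending, completed) for ReadStage, one sample per second.
samples = [
    (64, 99303, 463085056),
    (64, 88430, 463095929),
    (64, 91937, 463107782),
]

completed_per_s = []  # reads finished during each interval
arrivals_per_s = []   # reads newly queued during each interval

for (_, p0, c0), (_, p1, c1) in zip(samples, samples[1:]):
    done = c1 - c0
    completed_per_s.append(done)
    # Anything completed plus any growth in the backlog must have arrived.
    arrivals_per_s.append(done + (p1 - p0))

mean_rate = sum(completed_per_s) / len(completed_per_s)
# Time to work off the last observed backlog if nothing new arrived.
drain_seconds = samples[-1][1] / mean_rate

print("completed per second:", completed_per_s)
print("arrivals per second:", arrivals_per_s)
print(f"backlog would drain in about {drain_seconds:.1f} s")
```

Read this way, the node completes roughly eleven thousand key reads per second and the new work arrives in bursts, which is consistent with a few large multigets each fanning out into many per-key reads.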

- exactly how much data are you asking for? how many rows and what sort of slice
According to some munin monitoring, the server is sending 10 Mbit/s (1.25 MB/s) to the client over the network.

The same munin monitoring shows 200 MB/s read from the disks. This is what is worrying me...
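
To put a number on that gap, here is the arithmetic as a short sketch. It takes the two munin figures at face value and treats 1 MB/s as 8 Mbit/s.

```python
network_mbit_s = 10.0  # sent to the client, from munin
disk_mb_s = 200.0      # read from the disks, from munin

network_mb_s = network_mbit_s / 8          # Mbit/s -> MB/s
amplification = disk_mb_s / network_mb_s   # disk bytes read per byte returned

print(f"network: {network_mb_s:.2f} MB/s")
print(f"read amplification: {amplification:.0f}x")
```

So for every byte returned to the client, about 160 bytes are being read from disk.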

- has there been a lot of deletes or TTL columns used?
No deletes, only updates; I don't know if those count as deletes though...

This is going to be a read-heavy, update-heavy cluster.
No TTL columns, no counter columns.

One question: when nodetool cfstats says the average read latency is 5 ms, is that measured from when the query starts executing, or does it include the time spent "pending"?

Thanks
Philippe

Hope that helps.
Aaron

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Jun 2011, at 10:09, Philippe wrote:

Ok, here it goes again... No swapping at all...

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so     bi    bo    in   cs us sy id wa
 1 63  32044  88736  37996 7116524    0    0 227156     0 18314 5607 30  5 11 53
 1 63  32044  90844  37996 7103904    0    0 233524   202 17418 4977 29  4  9 58
 0 42  32044  91304  37996 7123884    0    0 249736     0 16197 5433 19  6  3 72
 3 25  32044  89864  37996 7135980    0    0 223140    16 18135 7567 32  5 11 52
 1  1  32044  88664  37996 7150728    0    0 229416   128 19168 7554 36  4 10 51
 4  0  32044  89464  37996 7149428    0    0 213852    18 21041 8819 45  5 12 38
 4  0  32044  90372  37996 7149432    0    0 233086   142 19909 7041 43  5 10 41
 7  1  32044  89752  37996 7149520    0    0 206906     0 19350 6875 50  4 11 35
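
The vmstat samples above can be reduced to two averages. This is a sketch that assumes vmstat's `bi` column is in 1024-byte blocks per second, which is how the procps man page describes it. Only the `bi` and `wa` columns are copied from the output.

```python
# "bi" (blocks read in per second) and "wa" (% CPU time waiting on IO).
bi = [227156, 233524, 249736, 223140, 229416, 213852, 233086, 206906]
wa = [53, 58, 72, 52, 51, 38, 41, 35]

mean_bi = sum(bi) / len(bi)
mean_read_mib_s = mean_bi / 1024  # assumes 1 block = 1 KiB
mean_wa = sum(wa) / len(wa)

print(f"mean read rate: {mean_read_mib_s:.0f} MiB/s")
print(f"mean iowait: {mean_wa:.0f}%")
```

The swap columns (`si`, `so`) stay at zero throughout, so the read traffic is not swap activity.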

Lots and lots of disk activity
iostat -dmx 2
Device:  rrqm/s  wrqm/s       r/s   w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda       52.50    0.00   7813.00  0.00  108.01   0.00    28.31   117.15  14.89   14.89    0.00   0.11  83.00
sdb       56.00    0.00   7755.50  0.00  108.51   0.00    28.66   118.67  15.18   15.18    0.00   0.11  82.80
md1        0.00    0.00      0.00  0.00    0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
md5        0.00    0.00  15796.50  0.00  219.21   0.00    28.42     0.00   0.00    0.00    0.00   0.00   0.00
dm-0       0.00    0.00  15796.50  0.00  219.21   0.00    28.42   273.42  17.03   17.03    0.00   0.05  83.40
dm-1       0.00    0.00      0.00  0.00    0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
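
Two cross-checks on the sda row of the iostat output, as a sketch. It assumes `avgrq-sz` is in 512-byte sectors and `rMB/s` is in units of 1024 KiB, which is my reading of the sysstat man page. The queue-depth estimate uses Little's law (queue length is roughly arrival rate times time in system).

```python
# sda row from the iostat sample.
reads_per_s = 7813.00
read_mb_s = 108.01
avgrq_sectors = 28.31
avgqu_sz = 117.15
await_ms = 14.89

# Average request size, computed two independent ways.
kib_from_rate = read_mb_s * 1024 / reads_per_s
kib_from_sectors = avgrq_sectors * 512 / 1024

# Little's law: expected queue depth from request rate and latency.
expected_queue = reads_per_s * await_ms / 1000

print(f"request size: {kib_from_rate:.1f} KiB (rate) vs {kib_from_sectors:.1f} KiB (sectors)")
print(f"queue depth: {expected_queue:.0f} expected vs {avgqu_sz:.0f} reported")
```

Both checks agree with the reported columns: each SSD is serving thousands of small reads (about 14 KiB each) per second with a deep queue, rather than a few large sequential reads.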

More info:
- the data directory containing the data I'm querying is 9.7GB in total, and this is a server with 16GB of RAM
- I'm hitting the server with 6 concurrent MultigetSuperSliceQuery requests on multiple keys; some of them can bring back quite a lot of data
- I'm reading all the keys for one column, pretty much sequentially

This is a query on a rollup table that was originally in MySQL, and querying by key doesn't look any faster than it was there. So I'm betting I'm doing something wrong here... but what?

Any ideas ?
Thanks

2011/6/6 Philippe <watcherfr@gmail.com>

Hmm, no, it wasn't swapping. Cassandra was the only thing running on that server
and I was querying the same keys over and over.

I restarted Cassandra and, doing the same thing, IO is now down to zero while CPU is up, which doesn't surprise me as much.

I'll report if it happens again.

On 5 June 2011 at 16:55, "Jonathan Ellis" <jbellis@gmail.com> wrote:

> You may be swapping.
>
> http://spyced.blogspot.com/2010/01/linux-performance-basics.html
> explains how to check this as well as how to see what threads are busy
> in the Java process.
>
> On Sat, Jun 4, 2011 at 5:34 PM, Philippe <watcherfr@gmail.com> wrote:
>> Hello,
>> I am evaluating Cassandra and I'm running into some strange IO
>> behavior that I can't explain; I'd like some help/ideas to troubleshoot it.
>> I am running a 1-node cluster with a keyspace consisting of two column
>> families, one of which has dozens of supercolumns, each containing dozens
>> of columns.
>> All in all, this is a couple of gigabytes of data, 12GB on the hard drive.
>> The hardware is pretty good: 16GB memory + RAID-0 SSD drives with LVM and
>> an i5 processor (4 cores).
>> Keyspace: xxxxxxxxxxxxxxxxxxx
>>         Read Count: 460754852
>>         Read Latency: 1.108205793092766 ms.
>>         Write Count: 30620665
>>         Write Latency: 0.01411020877567486 ms.
>>         Pending Tasks: 0
>>                 Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
>>                 SSTable count: 5
>>                 Space used (live): 548700725
>>                 Space used (total): 548700725
>>                 Memtable Columns Count: 0
>>                 Memtable Data Size: 0
>>                 Memtable Switch Count: 11
>>                 Read Count: 2891192
>>                 Read Latency: NaN ms.
>>                 Write Count: 3157547
>>                 Write Latency: NaN ms.
>>                 Pending Tasks: 0
>>                 Key cache capacity: 367396
>>                 Key cache size: 367396
>>                 Key cache hit rate: NaN
>>                 Row cache capacity: 112683
>>                 Row cache size: 112683
>>                 Row cache hit rate: NaN
>>                 Compacted row minimum size: 125
>>                 Compacted row maximum size: 924
>>                 Compacted row mean size: 172
>>                 Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>>                 SSTable count: 7
>>                 Space used (live): 8707538781
>>                 Space used (total): 8707538781
>>                 Memtable Columns Count: 0
>>                 Memtable Data Size: 0
>>                 Memtable Switch Count: 30
>>                 Read Count: 457863660
>>                 Read Latency: 2.381 ms.
>>                 Write Count: 27463118
>>                 Write Latency: NaN ms.
>>                 Pending Tasks: 0
>>                 Key cache capacity: 4518387
>>                 Key cache size: 4518387
>>                 Key cache hit rate: 0.9247881700850826
>>                 Row cache capacity: 1349682
>>                 Row cache size: 1349682
>>                 Row cache hit rate: 0.39400533823415573
>>                 Compacted row minimum size: 125
>>                 Compacted row maximum size: 6866
>>                 Compacted row mean size: 165
>> My app makes a bunch of requests using a MultigetSuperSliceQuery for a set
>> of keys, typically a couple dozen at most. It also selects a subset of the
>> supercolumns. I am running 8 requests in parallel at most.
>>
>> Two days ago, I ran a 1.5 hour process that basically read every key. The server
>> had no IOwaits and everything was humming along. However, right at the end
>> of the process, there was a huge spike in IOs. I didn't think much of it.
>> Today, after two days of inactivity, any query I run raises the IOs to 80%
>> utilization of the SSD drives even though I'm running the same query over
>> and over (no cache??)
>> Any ideas on how to troubleshoot this, or better, how to solve this?
>> thanks
>> Philippe
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
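
Going back to the cfstats output in the original message, a couple of derived figures may help frame the question. This is only a sketch. It assumes the reported hit rates are representative of the whole workload (cfstats reports recent rates, so they may not be), and it assumes a read consults the row cache first and the key cache only on a row-cache miss.

```python
# "Space used (live)" for the two column families, in bytes.
live_bytes = [548700725, 8707538781]
total_live = sum(live_bytes)
total_live_gib = total_live / 2**30

# Hit rates reported for the large column family.
row_cache_hit = 0.39400533823415573
key_cache_hit = 0.9247881700850826

# Fraction of reads that miss both caches (assumes the two are independent).
miss_both = (1 - row_cache_hit) * (1 - key_cache_hit)

print(f"live data: {total_live_gib:.2f} GiB")
print(f"reads missing the row cache: {1 - row_cache_hit:.1%}")
print(f"reads missing both caches: {miss_both:.1%}")
```

The live data is smaller than the 16GB of memory on the box. Under these assumptions roughly 60% of reads still have to go to the SSTables for the row itself, since a key cache hit only saves the index lookup.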


