From: Philippe <watcherfr@gmail.com>
To: user@cassandra.apache.org
Date: Sat, 11 Jun 2011 00:21:29 +0200
Subject: Re: Troubleshooting IO performance ?
In-Reply-To: <6CB63AE7-E8A1-4FA7-8F35-36268B627317@thelastpickle.com>

> I'd check you are reading the data you expect then wind back the number of
> requests and rows / columns requested. Get to a stable baseline and then
> add pressure to see when / how things go wrong.

I just loaded 4.8GB of similar data in another keyspace and ran the same
process as in my previous tests, but on that data.
I started with three threads hitting Cassandra. No I/O, hardly any CPU (15%
on a 4-core server).
After an hour or so, I raised it to 6 threads in parallel, then to 9 threads
in parallel.

I never got any I/O; in fact, iostat showed there weren't any disk reads. I
hardly saw the CPU rise except at the end.

The only difference between the two datasets is that the other one is 8.4GB,
so it doesn't fit completely in memory. So my woes are related to how well
Cassandra is fetching the data in the SSTables, right?

So what are my options? My rows are very small at the moment (well under
4 kBytes). Should I reduce the read buffer? Should I reduce the number of
SSTables?
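The working-set question above can be checked directly. The sketch below (plain shell, assuming a Linux box; the Cassandra data path is a guess, adjust `DATA_DIR` for your install) compares the on-disk size of the data directory with physical RAM to estimate whether the dataset can be served entirely from the OS page cache:

```shell
# Hedged sketch: estimate whether the dataset fits in the page cache.
# DATA_DIR is an assumption; point it at your actual data directory.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"

# On-disk size of the data directory in kilobytes (0 if the path is absent).
data_kb=$(du -sk "$DATA_DIR" 2>/dev/null | awk '{print $1}')
data_kb=${data_kb:-0}

# Total physical memory in kilobytes, from /proc/meminfo (Linux-specific).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

if [ "$data_kb" -le "$mem_kb" ]; then
    echo "dataset (${data_kb} kB) may fit in page cache (${mem_kb} kB RAM)"
else
    echo "dataset (${data_kb} kB) exceeds RAM (${mem_kb} kB): expect disk reads"
fi
```

This is only a rough bound: compactions, memtables, and the JVM heap all compete for the same memory, so the cacheable fraction is smaller in practice.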
Thanks
Philippe

> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8 Jun 2011, at 08:00, Philippe wrote:
>
> Aaron,
>
>> - what version are you on ?
> 0.7.6-2
>
>> - what is the concurrent_reads config setting ?
> concurrent_reads: 64
> concurrent_writes: 64
>
> Given that I've got 4 cores and SSD drives, I doubled the recommended
> concurrent_writes.
> Given that I've RAID-0ed the SSD drives, I figured I could at least double
> the recommended value once for SSD and once more for RAID-0.
> Wrong assumptions ?
>
> BTW, Cassandra is running on an XFS filesystem over LVM over RAID-0.
>
>> - what is nodetool tpstats showing during the slow down ?
> The only value that changes is the ReadStage line. Here are values from a
> sample every second:
> Pool Name                    Active   Pending      Completed
> ReadStage                        64     99303      463085056
> ReadStage                        64     88430      463095929
> ReadStage                        64     91937      463107782
>
> So basically, I'm flooding the system, right ? For example, 99303 means
> there are 99303 key reads pending, possibly from just a couple of
> MultiSlice gets ?
>
>> - exactly how much data are you asking for ? how many rows and what sort
>> of slice
> According to some munin monitoring, the server is cranking out to the
> client, over the network, 10 Mbits/s = 1.25 MBytes/s.
>
> The same munin monitoring shows me 200 MBytes/s read from the disks. This
> is what is worrying me...
>
>> - has there been a lot of deletes or TTL columns used ?
> No deletes, only updates; I don't know if that counts as deletes though...
>
> This is going to be a read-heavy, update-heavy cluster.
> No TTL columns, no counter columns.
>
> One question: when nodetool cfstats says the average read latency is 5ms,
> is that counted once the query starts executing, or does it include the
> time spent "pending" ?
>
> Thanks
> Philippe
>
>> Hope that helps.
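The saturated ReadStage quoted above (Active pinned at `concurrent_reads` plus a large Pending backlog) is easy to detect mechanically. The sketch below parses a captured `tpstats` line rather than calling `nodetool` directly, so it runs anywhere; the sample text and the threshold of 1000 pending reads are assumptions taken from the figures in this thread:

```shell
# Hedged sketch: flag a saturated read stage from `nodetool tpstats` output.
# The sample mirrors the numbers quoted in the thread; in practice you would
# pipe live `nodetool tpstats` output into the same awk filter.
tpstats_sample='Pool Name                    Active   Pending      Completed
ReadStage                        64     99303      463085056'

# Extract the Active and Pending columns for ReadStage.
set -- $(printf '%s\n' "$tpstats_sample" | awk '/^ReadStage/ {print $2, $3}')
active=$1
pending=$2

# concurrent_reads=64 comes from the config quoted above; adjust to taste.
if [ "$active" -eq 64 ] && [ "$pending" -gt 1000 ]; then
    echo "ReadStage saturated: active=$active, pending=$pending"
fi
# prints: ReadStage saturated: active=64, pending=99303
```

Sampling this in a loop (`watch` or `sleep 1` in a `while` loop) reproduces the per-second view Philippe describes.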
>> Aaron
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 7 Jun 2011, at 10:09, Philippe wrote:
>>
>> Ok, here it goes again... No swapping at all...
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff   cache  si  so     bi   bo    in    cs us sy id wa
>>  1 63  32044  88736  37996 7116524   0   0 227156    0 18314  5607 30  5 11 53
>>  1 63  32044  90844  37996 7103904   0   0 233524  202 17418  4977 29  4  9 58
>>  0 42  32044  91304  37996 7123884   0   0 249736    0 16197  5433 19  6  3 72
>>  3 25  32044  89864  37996 7135980   0   0 223140   16 18135  7567 32  5 11 52
>>  1  1  32044  88664  37996 7150728   0   0 229416  128 19168  7554 36  4 10 51
>>  4  0  32044  89464  37996 7149428   0   0 213852   18 21041  8819 45  5 12 38
>>  4  0  32044  90372  37996 7149432   0   0 233086  142 19909  7041 43  5 10 41
>>  7  1  32044  89752  37996 7149520   0   0 206906    0 19350  6875 50  4 11 35
>>
>> Lots and lots of disk activity
>> iostat -dmx 2
>> Device:  rrqm/s  wrqm/s      r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> sda       52.50    0.00  7813.00  0.00  108.01   0.00     28.31    117.15  14.89    14.89     0.00   0.11  83.00
>> sdb       56.00    0.00  7755.50  0.00  108.51   0.00     28.66    118.67  15.18    15.18     0.00   0.11  82.80
>> md1        0.00    0.00     0.00  0.00    0.00   0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
>> md5        0.00    0.00 15796.50  0.00  219.21   0.00     28.42      0.00   0.00     0.00     0.00   0.00   0.00
>> dm-0       0.00    0.00 15796.50  0.00  219.21   0.00     28.42    273.42  17.03    17.03     0.00   0.05  83.40
>> dm-1       0.00    0.00     0.00  0.00    0.00   0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
>>
>> More info:
>> - the data directory containing the data I'm querying into is 9.7GB, and
>> this is a server with 16GB
>> - I'm hitting the server with 6 concurrent multigetsuperslicequeries on
>> multiple keys; some of them can bring back quite a lot of data
>> - I'm reading all the keys for one column, pretty much sequentially
>>
>> This is a query
>> in a rollup table that was originally in MySQL, and it doesn't look like
>> the performance to query by key is better. So I'm betting I'm doing
>> something wrong here... but what ?
>>
>> Any ideas ?
>> Thanks
>>
>> 2011/6/6 Philippe
>>
>>> hmm... no, it wasn't swapping. Cassandra was the only thing running on
>>> that server, and I was querying the same keys over and over.
>>>
>>> I restarted Cassandra and, doing the same thing, I/O is now down to zero
>>> while CPU is up, which doesn't surprise me as much.
>>>
>>> I'll report if it happens again.
>>> On 5 June 2011 at 16:55, "Jonathan Ellis" wrote:
>>>
>>> > You may be swapping.
>>> >
>>> > http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>>> > explains how to check this as well as how to see what threads are busy
>>> > in the Java process.
>>> >
>>> > On Sat, Jun 4, 2011 at 5:34 PM, Philippe wrote:
>>> >> Hello,
>>> >> I am evaluating Cassandra and I'm running into some strange IO
>>> >> behavior that I can't explain; I'd like some help/ideas to
>>> >> troubleshoot it.
>>> >> I am running a 1-node cluster with a keyspace consisting of two
>>> >> column families, one of which has dozens of supercolumns, each
>>> >> containing dozens of columns.
>>> >> All in all, this is a couple gigabytes of data, 12GB on the hard
>>> >> drive.
>>> >> The hardware is pretty good: 16GB memory + RAID-0 SSD drives with LVM
>>> >> and an i5 processor (4 cores).
>>> >> Keyspace: xxxxxxxxxxxxxxxxxxx
>>> >>         Read Count: 460754852
>>> >>         Read Latency: 1.108205793092766 ms.
>>> >>         Write Count: 30620665
>>> >>         Write Latency: 0.01411020877567486 ms.
>>> >>         Pending Tasks: 0
>>> >>                 Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
>>> >>                 SSTable count: 5
>>> >>                 Space used (live): 548700725
>>> >>                 Space used (total): 548700725
>>> >>                 Memtable Columns Count: 0
>>> >>                 Memtable Data Size: 0
>>> >>                 Memtable Switch Count: 11
>>> >>                 Read Count: 2891192
>>> >>                 Read Latency: NaN ms.
>>> >>                 Write Count: 3157547
>>> >>                 Write Latency: NaN ms.
>>> >>                 Pending Tasks: 0
>>> >>                 Key cache capacity: 367396
>>> >>                 Key cache size: 367396
>>> >>                 Key cache hit rate: NaN
>>> >>                 Row cache capacity: 112683
>>> >>                 Row cache size: 112683
>>> >>                 Row cache hit rate: NaN
>>> >>                 Compacted row minimum size: 125
>>> >>                 Compacted row maximum size: 924
>>> >>                 Compacted row mean size: 172
>>> >>                 Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>>> >>                 SSTable count: 7
>>> >>                 Space used (live): 8707538781
>>> >>                 Space used (total): 8707538781
>>> >>                 Memtable Columns Count: 0
>>> >>                 Memtable Data Size: 0
>>> >>                 Memtable Switch Count: 30
>>> >>                 Read Count: 457863660
>>> >>                 Read Latency: 2.381 ms.
>>> >>                 Write Count: 27463118
>>> >>                 Write Latency: NaN ms.
>>> >>                 Pending Tasks: 0
>>> >>                 Key cache capacity: 4518387
>>> >>                 Key cache size: 4518387
>>> >>                 Key cache hit rate: 0.9247881700850826
>>> >>                 Row cache capacity: 1349682
>>> >>                 Row cache size: 1349682
>>> >>                 Row cache hit rate: 0.39400533823415573
>>> >>                 Compacted row minimum size: 125
>>> >>                 Compacted row maximum size: 6866
>>> >>                 Compacted row mean size: 165
>>> >> My app makes a bunch of requests using a MultigetSuperSliceQuery for
>>> >> a set of keys, typically a couple dozen at most. It also selects a
>>> >> subset of the supercolumns. I am running 8 requests in parallel at
>>> >> most.
>>> >>
>>> >> Two days ago, I ran a 1.5-hour process that basically read every key.
>>> >> The server had no IO waits and everything was humming along. However,
>>> >> right at the end of the process, there was a huge spike in IOs. I
>>> >> didn't think much of it.
>>> >> Today, after two days of inactivity, any query I run raises the IOs
>>> >> to 80% utilization of the SSD drives even though I'm running the same
>>> >> query over and over (no cache??)
>>> >> Any ideas on how to troubleshoot this, or better, how to solve this ?
>>> >> thanks
>>> >> Philippe
>>> >
>>> > --
>>> > Jonathan Ellis
>>> > Project Chair, Apache Cassandra
>>> > co-founder of DataStax, the source for professional Cassandra support
>>> > http://www.datastax.com
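The "(no cache??)" observation above can be isolated from Cassandra entirely. The sketch below (an assumption-laden illustration, not Cassandra-specific: it uses a throwaway temp file and GNU date's nanosecond format, so Linux only) times two successive full reads of the same file; the second should be served from the OS page cache. If repeated identical queries still hit disk, as in the thread, the touched SSTable pages likely exceed what the cache can hold:

```shell
# Hedged sketch: demonstrate page-cache behavior with a 32MB scratch file.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=32 2>/dev/null

elapsed_ms() {
    # Time one full read of the file, in milliseconds (GNU date, Linux).
    start=$(date +%s%N)
    cat "$f" >/dev/null
    echo $(( ($(date +%s%N) - start) / 1000000 ))
}

t1=$(elapsed_ms)   # first pass: may include real disk reads
t2=$(elapsed_ms)   # second pass: should come from the page cache
echo "first read: ${t1}ms, second (cached) read: ${t2}ms"
rm -f "$f"
```

Comparing this baseline against the Cassandra data files themselves (same two-pass read) shows whether the cache eviction is coming from sheer dataset size or from something else competing for memory.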