Date: Mon, 6 Jun 2011 15:41:08 +0200
Subject: Re: Troubleshooting IO performance ?
From: Philippe <watcherfr@gmail.com>
To: user@cassandra.apache.org

Hmm... no, it wasn't swapping. Cassandra was the only thing running on that
server, and I was querying the same keys over and over.

I restarted Cassandra and, doing the same thing, IO is now down to zero while
CPU is up, which doesn't surprise me as much.

I'll report if it happens again.

On 5 Jun 2011 at 16:55, "Jonathan Ellis" <jbellis@gmail.com> wrote:
> You may be swapping.
>
> http://spyced.blogspot.com/2010/01/linux-performance-basics.html
> explains how to check this as well as how to see what threads are busy
> in the Java process.
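For the "what threads are busy in the Java process" part, a minimal sketch of one way to do it over JMX with the standard ThreadMXBean API follows. This is only an illustrative alternative to the OS-level tools the blog post above relies on, not something from the thread itself; the JMX host/port (localhost:7199) is an assumption and should be adjusted to wherever your Cassandra JVM exposes JMX.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BusyThreads {
        public static void main(String[] args) throws Exception {
            // Assumed JMX endpoint -- adjust host/port to your Cassandra node.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                        mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
                if (threads.isThreadCpuTimeSupported() && !threads.isThreadCpuTimeEnabled()) {
                    threads.setThreadCpuTimeEnabled(true);
                }
                // Cumulative CPU time per thread since thread start (-1 if unavailable);
                // the busiest threads are the ones with the largest values.
                for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                    long cpuNanos = threads.getThreadCpuTime(info.getThreadId());
                    System.out.printf("%-50s %8d ms%n",
                            info.getThreadName(), cpuNanos / 1000000L);
                }
            } finally {
                connector.close();
            }
        }
    }

Sampling the dump twice and diffing the per-thread CPU times gives a rate, which is usually more telling than the cumulative totals.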
> On Sat, Jun 4, 2011 at 5:34 PM, Philippe <watcherfr@gmail.com> wrote:
>> Hello,
>> I am evaluating Cassandra and I'm running into some strange IO behavior
>> that I can't explain; I'd like some help/ideas to troubleshoot it.
>> I am running a one-node cluster with a keyspace consisting of two column
>> families, one of which has dozens of supercolumns, each containing dozens
>> of columns.
>> All in all, this is a couple of gigabytes of data, 12 GB on the hard drive.
>> The hardware is pretty good: 16 GB memory, RAID-0 SSD drives with LVM, and
>> an i5 processor (4 cores).
>>
>> Keyspace: xxxxxxxxxxxxxxxxxxx
>>         Read Count: 460754852
>>         Read Latency: 1.108205793092766 ms.
>>         Write Count: 30620665
>>         Write Latency: 0.01411020877567486 ms.
>>         Pending Tasks: 0
>>                 Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
>>                 SSTable count: 5
>>                 Space used (live): 548700725
>>                 Space used (total): 548700725
>>                 Memtable Columns Count: 0
>>                 Memtable Data Size: 0
>>                 Memtable Switch Count: 11
>>                 Read Count: 2891192
>>                 Read Latency: NaN ms.
>>                 Write Count: 3157547
>>                 Write Latency: NaN ms.
>>                 Pending Tasks: 0
>>                 Key cache capacity: 367396
>>                 Key cache size: 367396
>>                 Key cache hit rate: NaN
>>                 Row cache capacity: 112683
>>                 Row cache size: 112683
>>                 Row cache hit rate: NaN
>>                 Compacted row minimum size: 125
>>                 Compacted row maximum size: 924
>>                 Compacted row mean size: 172
>>
>>                 Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>>                 SSTable count: 7
>>                 Space used (live): 8707538781
>>                 Space used (total): 8707538781
>>                 Memtable Columns Count: 0
>>                 Memtable Data Size: 0
>>                 Memtable Switch Count: 30
>>                 Read Count: 457863660
>>                 Read Latency: 2.381 ms.
>>                 Write Count: 27463118
>>                 Write Latency: NaN ms.
>>                 Pending Tasks: 0
>>                 Key cache capacity: 4518387
>>                 Key cache size: 4518387
>>                 Key cache hit rate: 0.9247881700850826
>>                 Row cache capacity: 1349682
>>                 Row cache size: 1349682
>>                 Row cache hit rate: 0.39400533823415573
>>                 Compacted row minimum size: 125
>>                 Compacted row maximum size: 6866
>>                 Compacted row mean size: 165
>>
>> My app makes a bunch of requests using a MultigetSuperSliceQuery for a set
>> of keys, typically a couple dozen at most. It also selects a subset of the
>> supercolumns. I am running 8 requests in parallel at most (a rough sketch
>> of this call follows after the quoted message).
>>
>> Two days ago, I ran a 1.5-hour process that basically read every key. The
>> server had no IO waits and everything was humming along. However, right at
>> the end of the process, there was a huge spike in IOs. I didn't think much
>> of it.
>> Today, after two days of inactivity, any query I run raises the IOs to 80%
>> utilization of the SSD drives even though I'm running the same query over
>> and over (no cache??).
>> Any ideas on how to troubleshoot this, or better, how to solve this?
>> thanks
>> Philippe
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
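For context, the MultigetSuperSliceQuery pattern described in the quoted message looks roughly like the sketch below, written against the Hector client of that era from memory, so treat exact package and class names as approximate; the cluster, keyspace, column family, key, and supercolumn names are placeholders.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.SuperRow;
    import me.prettyprint.hector.api.beans.SuperRows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSuperSliceQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    public class MultigetExample {
        public static void main(String[] args) {
            StringSerializer se = StringSerializer.get();
            // Placeholder cluster/keyspace/CF names -- substitute your own.
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

            // One multiget for a couple dozen keys, selecting a subset of
            // supercolumns by name.
            MultigetSuperSliceQuery<String, String, String, String> query =
                    HFactory.createMultigetSuperSliceQuery(keyspace, se, se, se, se);
            query.setColumnFamily("MySuperCF");
            query.setKeys("key1", "key2", "key3");
            query.setColumnNames("superColA", "superColB");

            QueryResult<SuperRows<String, String, String, String>> result = query.execute();
            for (SuperRow<String, String, String, String> row : result.get()) {
                System.out.println(row.getKey() + " -> "
                        + row.getSuperSlice().getSuperColumns().size() + " supercolumns");
            }
        }
    }

Each such query fetches a handful of keys in one round trip and restricts the result to the named supercolumns, which matches the "couple dozen keys, subset of supercolumns, 8 requests in parallel" access pattern described above.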
