Date: Mon, 6 Jun 2011 15:41:08 +0200
Subject: Re: Troubleshooting IO performance ?
From: Philippe <watcherfr@gmail.com>
To: user@cassandra.apache.org

Hmm... no, it wasn't swapping. Cassandra was the only thing running on that
server, and I was querying the same keys over and over.

I restarted Cassandra and, doing the same thing, IO is now down to zero while
CPU is up, which doesn't surprise me as much.

I'll report if it happens again.

On 5 Jun 2011 at 16:55, "Jonathan Ellis" <jbellis@gmail.com> wrote:
> You may be swapping.
>
> http://spyced.blogspot.com/2010/01/linux-performance-basics.html
> explains how to check this as well as how to see what threads are busy
> in the Java process.
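For the "what threads are busy in the Java process" part, a minimal sketch of one way to do it over JMX with the standard ThreadMXBean API follows. This is only an illustrative alternative to the OS-level tools the blog post above relies on, not something from the thread itself; the JMX host/port (localhost:7199) is an assumption and should be adjusted to wherever your Cassandra JVM exposes JMX.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BusyThreads {
        public static void main(String[] args) throws Exception {
            // Assumed JMX endpoint -- adjust host/port to your Cassandra node.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                        mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
                if (threads.isThreadCpuTimeSupported() && !threads.isThreadCpuTimeEnabled()) {
                    threads.setThreadCpuTimeEnabled(true);
                }
                // Cumulative CPU time per thread since thread start (-1 if unavailable);
                // the busiest threads are the ones with the largest values.
                for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                    long cpuNanos = threads.getThreadCpuTime(info.getThreadId());
                    System.out.printf("%-50s %8d ms%n",
                            info.getThreadName(), cpuNanos / 1000000L);
                }
            } finally {
                connector.close();
            }
        }
    }

Sampling the dump twice and diffing the per-thread CPU times gives a rate, which is usually more telling than the cumulative totals.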
> On Sat, Jun 4, 2011 at 5:34 PM, Philippe <watcherfr@gmail.com> wrote:
>> Hello,
>> I am evaluating Cassandra and I'm running into some strange IO behavior
>> that I can't explain; I'd like some help/ideas to troubleshoot it.
>> I am running a one-node cluster with a keyspace consisting of two column
>> families, one of which has dozens of supercolumns, each containing dozens
>> of columns.
>> All in all, this is a couple of gigabytes of data, 12 GB on the hard drive.
>> The hardware is pretty good: 16 GB memory, RAID-0 SSD drives with LVM, and
>> an i5 processor (4 cores).
>>
>> Keyspace: xxxxxxxxxxxxxxxxxxx
>>         Read Count: 460754852
>>         Read Latency: 1.108205793092766 ms.
>>         Write Count: 30620665
>>         Write Latency: 0.01411020877567486 ms.
>>         Pending Tasks: 0
>>                 Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
>>                 SSTable count: 5
>>                 Space used (live): 548700725
>>                 Space used (total): 548700725
>>                 Memtable Columns Count: 0
>>                 Memtable Data Size: 0
>>                 Memtable Switch Count: 11
>>                 Read Count: 2891192
>>                 Read Latency: NaN ms.
>>                 Write Count: 3157547
>>                 Write Latency: NaN ms.
>>                 Pending Tasks: 0
>>                 Key cache capacity: 367396
>>                 Key cache size: 367396
>>                 Key cache hit rate: NaN
>>                 Row cache capacity: 112683
>>                 Row cache size: 112683
>>                 Row cache hit rate: NaN
>>                 Compacted row minimum size: 125
>>                 Compacted row maximum size: 924
>>                 Compacted row mean size: 172
>>
>>                 Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>>                 SSTable count: 7
>>                 Space used (live): 8707538781
>>                 Space used (total): 8707538781
>>                 Memtable Columns Count: 0
>>                 Memtable Data Size: 0
>>                 Memtable Switch Count: 30
>>                 Read Count: 457863660
>>                 Read Latency: 2.381 ms.
>>                 Write Count: 27463118
>>                 Write Latency: NaN ms.
>>                 Pending Tasks: 0
>>                 Key cache capacity: 4518387
>>                 Key cache size: 4518387
>>                 Key cache hit rate: 0.9247881700850826
>>                 Row cache capacity: 1349682
>>                 Row cache size: 1349682
>>                 Row cache hit rate: 0.39400533823415573
>>                 Compacted row minimum size: 125
>>                 Compacted row maximum size: 6866
>>                 Compacted row mean size: 165
>>
>> My app makes a bunch of requests using a MultigetSuperSliceQuery for a set
>> of keys, typically a couple dozen at most. It also selects a subset of the
>> supercolumns. I am running 8 requests in parallel at most (a rough sketch
>> of this call follows after the quoted message).
>>
>> Two days ago, I ran a 1.5-hour process that basically read every key. The
>> server had no IO waits and everything was humming along. However, right at
>> the end of the process, there was a huge spike in IOs. I didn't think much
>> of it.
>> Today, after two days of inactivity, any query I run raises the IOs to 80%
>> utilization of the SSD drives even though I'm running the same query over
>> and over (no cache??).
>> Any ideas on how to troubleshoot this, or better, how to solve this?
>> thanks
>> Philippe
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
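For context, the MultigetSuperSliceQuery pattern described in the quoted message looks roughly like the sketch below, written against the Hector client of that era from memory, so treat exact package and class names as approximate; the cluster, keyspace, column family, key, and supercolumn names are placeholders.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.SuperRow;
    import me.prettyprint.hector.api.beans.SuperRows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSuperSliceQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    public class MultigetExample {
        public static void main(String[] args) {
            StringSerializer se = StringSerializer.get();
            // Placeholder cluster/keyspace/CF names -- substitute your own.
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

            // One multiget for a couple dozen keys, selecting a subset of
            // supercolumns by name.
            MultigetSuperSliceQuery<String, String, String, String> query =
                    HFactory.createMultigetSuperSliceQuery(keyspace, se, se, se, se);
            query.setColumnFamily("MySuperCF");
            query.setKeys("key1", "key2", "key3");
            query.setColumnNames("superColA", "superColB");

            QueryResult<SuperRows<String, String, String, String>> result = query.execute();
            for (SuperRow<String, String, String, String> row : result.get()) {
                System.out.println(row.getKey() + " -> "
                        + row.getSuperSlice().getSuperColumns().size() + " supercolumns");
            }
        }
    }

Each such query fetches a handful of keys in one round trip and restricts the result to the named supercolumns, which matches the "couple dozen keys, subset of supercolumns, 8 requests in parallel" access pattern described above.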
