Date: Sun, 5 Jun 2011 00:34:25 +0200
Subject: Troubleshooting IO performance ?
From: Philippe
To: user@cassandra.apache.org

Hello,

I am evaluating Cassandra and I'm running into some strange IO behavior that I can't explain. I'd like some help or ideas to troubleshoot it.

I am running a 1-node cluster with a keyspace consisting of two column families, one of which has dozens of supercolumns, each containing dozens of columns.
All in all, this is a couple of gigabytes of data, 12GB on the hard drive.
The hardware is pretty good: 16GB memory + RAID-0 SSD drives with LVM and an i5 processor (4 cores).

Keyspace: xxxxxxxxxxxxxxxxxxx
        Read Count: 460754852
        Read Latency: 1.108205793092766 ms.
        Write Count: 30620665
        Write Latency: 0.01411020877567486 ms.
        Pending Tasks: 0
                Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
                SSTable count: 5
                Space used (live): 548700725
                Space used (total): 548700725
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 11
                Read Count: 2891192
                Read Latency: NaN ms.
                Write Count: 3157547
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 367396
                Key cache size: 367396
                Key cache hit rate: NaN
                Row cache capacity: 112683
                Row cache size: 112683
                Row cache hit rate: NaN
                Compacted row minimum size: 125
                Compacted row maximum size: 924
                Compacted row mean size: 172

                Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
                SSTable count: 7
                Space used (live): 8707538781
                Space used (total): 8707538781
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 30
                Read Count: 457863660
                Read Latency: 2.381 ms.
                Write Count: 27463118
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 4518387
                Key cache size: 4518387
                Key cache hit rate: 0.9247881700850826
                Row cache capacity: 1349682
                Row cache size: 1349682
                Row cache hit rate: 0.39400533823415573
                Compacted row minimum size: 125
                Compacted row maximum size: 6866
                Compacted row mean size: 165

My app makes a bunch of requests using a MultigetSuperSliceQuery for a set of keys, typically a couple dozen at most. It also selects a subset of the supercolumns. I am running 8 requests in parallel at most.

Two days ago, I ran a 1.5-hour process that basically read every key. The server had no IOwaits and everything was humming along. However, right at the end of the process, there was a huge spike in IOs. I didn't think much of it.

Today, after two days of inactivity, any query I run raises the IOs to 80% utilization of the SSD drives, even though I'm running the same query over and over (no cache??).

Any ideas on how to troubleshoot this, or better, how to solve this?

Thanks,
Philippe
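
For anyone who wants to check the figures above, here is a minimal Python sketch that sums the live on-disk size of the two column families and turns the reported row cache hit rate into a miss fraction. It was not part of the setup described in this message: the helper names (`parse_cfstats`, `total_live_bytes`, `row_cache_miss_rate`) are hypothetical, and the input is only an excerpt of the statistics quoted above.

```python
import math

# Excerpt of the statistics quoted in this message (values copied verbatim).
CFSTATS = """\
Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
Space used (live): 548700725
Row cache hit rate: NaN
Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
Space used (live): 8707538781
Row cache hit rate: 0.39400533823415573
"""


def parse_cfstats(text):
    """Return one dict (metric name -> string value) per column family."""
    families = []
    for line in text.splitlines():
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name == "Column Family":
            # A new "Column Family" line starts a new record.
            families.append({"name": value})
        elif families and value:
            families[-1][name] = value
    return families


def total_live_bytes(families):
    """Sum the 'Space used (live)' figure over all column families."""
    return sum(int(cf["Space used (live)"]) for cf in families)


def row_cache_miss_rate(cf):
    """Fraction of reads not served by the row cache, or None if no rate is reported."""
    rate = float(cf["Row cache hit rate"])
    return None if math.isnan(rate) else 1.0 - rate


if __name__ == "__main__":
    families = parse_cfstats(CFSTATS)
    print("live bytes:", total_live_bytes(families))
    print("live GiB: %.2f" % (total_live_bytes(families) / 2**30))
    for cf in families:
        print(cf["name"][:4], "row cache miss rate:", row_cache_miss_rate(cf))
```

With these inputs the two live figures add up to 9256239506 bytes (about 8.62 GiB), and roughly 61% of reads on the larger column family were not served by the row cache at the time the statistics were taken.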