Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Reply-To: user@cassandra.apache.org
Date: Mon, 20 Jun 2011 21:39:57 +0200
Subject: Read performance vs. vmstat + your experience with read optimizations
From: Philippe
To: user@cassandra.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=20cf303f6a5c78f27204a629e658

--20cf303f6a5c78f27204a629e658
Content-Type: text/plain; charset=ISO-8859-1

Hi all,

I am having trouble reconciling various read metrics, so I'm hoping someone here can help me understand what's going on.

I am running tests on a single-node cluster with 16GB of RAM. I'm testing on the following column family:

                Column Family: PUBLIC_MONTHLY
                SSTable count: 1
                Space used (live): 28468417160
                Space used (total): 28468417160
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 2669019991
                Read Latency: 0.846 ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 20000
                Key cache size: 20000
                Key cache hit rate: 0.33393368358762754
                Row cache capacity: 50000
                Row cache size: 50000
                Row cache hit rate: 0.15195090894076155
                Compacted row minimum size: 216
                Compacted row maximum size: 88148
                Compacted row mean size: 483

The keys represent grid cells (65 million), the columns store monthly increments (total & sum, to produce averages), and the super columns tag the data source.
The mean row length is 483 bytes.

The key cache and row cache are enabled but kept very small, just to test going through the disk, since I expect very random reads in production.

I've done everything I can to optimize reads:
 - Cassandra is set up to use only 4GB because my dataset is 28GB
 - I've compacted the data to a single file
 - I'm hitting Cassandra with only 1 read request at a time and no writes. Each request is a multiget slice across hundreds or thousands of keys.

The problem:
vmstat shows that Cassandra is doing about 200MB/s of IO and, since there are no writes on the system, I know it can only be reading (RAID-0 SSD drives).
I know that Cassandra is reading about 1/3 of the super columns. To be safe, let's assume Cassandra is deserializing 1/2 of each row.
I'll just assume for simplicity that the row size is 512 bytes.

So it looks to me as if Cassandra is deserializing 200MB/((512 bytes)/2) = 400MB/(0.5KB) = 800K rows per second.
That's 800 keys per millisecond.

And yet, my app is being throttled by Cassandra during its MultigetSuperSliceCounterQuery: measuring the time spent in Hector shows that I'm getting at most 20-30 rows per ms and sometimes I get

My questions:
1) Any idea where the discrepancy can come from?
I'd like to believe there is some magic setting that will x10 my read performance...

2) How do you recommend allocating memory? Should I give the OS cache as much as possible, or should I max out Cassandra's cache?

3) Does anyone have numbers regarding the performance of range queries compared to multiget queries? I can probably take SimpleGeo's idea of a Z-order code to map the 2D grid to 1D ranges, but I wonder if I will get the x10 performance I'm looking for.

PS: Nodetool indicates that the read latency is 0.846ms, so that's about 1.18 keys/ms?! Let's just leave this aside; the process has been running for 12 hours and maybe the numbers are very different from what we're seeing here.

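The back-of-the-envelope estimate above can be reproduced with a short sketch. It is only an illustration under stated assumptions: "200MB/s" is taken as decimal megabytes, every byte read from disk is assumed to belong to a requested row, and the nodetool latency is treated as if reads were served strictly one at a time. The names `rows_per_second`, `IO_BYTES_PER_S` and so on are made up for this example.

```python
def rows_per_second(io_bytes_per_s: float, row_bytes: float, fraction_read: float) -> float:
    """Rows/s implied if all disk reads were useful row data (an assumption)."""
    return io_bytes_per_s / (row_bytes * fraction_read)


IO_BYTES_PER_S = 200e6   # ~200MB/s seen in vmstat; decimal MB assumed
ROW_BYTES = 512          # simplified row size from the message (mean is 483)
FRACTION_READ = 0.5      # "to be safe": half of each row deserialized

implied_rows_per_ms = rows_per_second(IO_BYTES_PER_S, ROW_BYTES, FRACTION_READ) / 1000

# Observed through Hector: roughly 20-30 rows per millisecond.
gap_high = implied_rows_per_ms / 20   # gap against the slower observation
gap_low = implied_rows_per_ms / 30    # gap against the faster observation

# nodetool read latency of 0.846 ms, read as a strictly serial rate.
serial_keys_per_ms = 1 / 0.846

print(f"implied by disk throughput: {implied_rows_per_ms:.0f} rows/ms")
print(f"gap vs. observed 20-30 rows/ms: {gap_low:.0f}x to {gap_high:.0f}x")
print(f"serial rate from nodetool latency: {serial_keys_per_ms:.2f} keys/ms")
```

With these inputs the estimate comes out slightly under the rounded 800 rows/ms figure, and the gap to the observed rate is a factor of a few tens rather than exactly 10.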
Thanks
PG

vmstat (SSD not maxed out in this sample, but it is at other times)
 0  0  78184  89252  10764 11254784    0    0 186448    18 8002 2352  7  4 50 39
 0  9  78184  88880  10764 11249900    0    0 176602    78 8046 2957  7  3 64 26
 0 16  78184  88260  10764 11246824    0    0 195726     0 9090 2718  8  4 52 36
 0 14  78184  89376  10764 11242496    0    0 227858     0 9533 2444  7  4 45 44
 0  0  78184  88260  10764 11254336    0    0 203374     1 9144 2567  7  4 59 30
 0  4  78184  90368  10764 11251856    0    0 235394     0 9732 1827  6  4 52 38
 0 23  78184  92352  10756 11238000    0    0 203140    98 9007 2835  7  4 59 29
 0  0  78184  91608  10756 11250952    0    0 176348     0 8354 3535  7  3 64 26
 1  0  78184  92352  10756 11250228    0    0 163952     0 7475 3243  9  3 57 31

iostat -dmx 2 (filtered)
Device:         rrqm/s   wrqm/s       r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb              80.00     0.00   4061.50    0.00    94.34     0.00    47.57    80.18   19.49   19.49    0.00   0.16  63.00
sda              78.50     0.00   3934.50    0.00    94.72     0.00    49.31    76.87   19.27   19.27    0.00   0.16  62.80
dm-0              0.00     0.00   8310.50    0.00   192.47     0.00    47.43   169.89   20.15   20.15    0.00   0.08  63.80

Device:         rrqm/s   wrqm/s       r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb             101.50     0.00   5141.00    0.00   121.16     0.00    48.27   103.29   20.03   20.03    0.00   0.16  80.60
sda             100.00     0.00   5190.50    0.00   121.59     0.00    47.97   100.74   19.24   19.24    0.00   0.15  79.80
dm-0              0.00     0.00  10552.50    0.00   242.85     0.00    47.13   219.09   20.57   20.57    0.00   0.08  81.80

Device:         rrqm/s   wrqm/s       r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb              67.50     0.00   3692.50    0.00    86.23     0.00    47.83    64.89   17.92   17.92    0.00   0.15  57.00
sda              90.00     0.00   3680.00    0.00    87.22     0.00    48.54    70.86   19.77   19.77    0.00   0.16  57.40
dm-0              0.00     0.00   7364.00    0.00   170.29     0.00    47.36   145.79   20.39   20.39    0.00   0.08  58.20

Timing examples from my app
numRollupKeys=13312,getdata_ms=617 => 21.57 keys/ms
numRollupKeys=6144,getdata_ms=224  => 27.42
numRollupKeys=14080,getdata_ms=793 => 17.75
numRollupKeys=8448,getdata_ms=157  => 53.08
numRollupKeys=6400,getdata_ms=601  => 10.64
numRollupKeys=7680,getdata_ms=550  => 13.96
numRollupKeys=12800,getdata_ms=720 => 17.77
numRollupKeys=6912,getdata_ms=275  => 25.14

--20cf303f6a5c78f27204a629e658
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--20cf303f6a5c78f27204a629e658--
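
The timing samples and the first iostat sample for dm-0 in the message can be cross-checked with a small script. This is a sketch only: the helper names (`parse_samples`, `overall_keys_per_ms`, `bytes_per_read`) are invented for this example, and it assumes that iostat's rMB/s column is expressed in units of 1024*1024 bytes. It recomputes keys/ms from the raw key counts and durations, and estimates how many bytes each read request pulls from the device on average.

```python
import re

# Raw (keys, milliseconds) pairs copied from the timing examples in the message.
SAMPLES = """\
numRollupKeys=13312,getdata_ms=617
numRollupKeys=6144,getdata_ms=224
numRollupKeys=14080,getdata_ms=793
numRollupKeys=8448,getdata_ms=157
numRollupKeys=6400,getdata_ms=601
numRollupKeys=7680,getdata_ms=550
numRollupKeys=12800,getdata_ms=720
numRollupKeys=6912,getdata_ms=275
"""

PATTERN = re.compile(r"numRollupKeys=(\d+),getdata_ms=(\d+)")


def parse_samples(text):
    """Return a list of (keys, elapsed_ms) integer pairs found in the text."""
    return [(int(keys), int(ms)) for keys, ms in PATTERN.findall(text)]


def overall_keys_per_ms(samples):
    """Total keys divided by total elapsed time across all samples."""
    total_keys = sum(keys for keys, _ in samples)
    total_ms = sum(ms for _, ms in samples)
    return total_keys / total_ms


def bytes_per_read(reads_per_s, read_mb_per_s, mb=1024 * 1024):
    """Average bytes per read request, from iostat's r/s and rMB/s columns."""
    return read_mb_per_s * mb / reads_per_s


samples = parse_samples(SAMPLES)
for keys, ms in samples:
    print(f"{keys:>6} keys in {ms:>4} ms => {keys / ms:6.2f} keys/ms")
print(f"overall: {overall_keys_per_ms(samples):.2f} keys/ms")

# First dm-0 sample: 8310.50 reads/s at 192.47 MB/s.
print(f"average read request: {bytes_per_read(8310.50, 192.47):.0f} bytes")
```

Comparing the average read request size with the mean row size of 483 bytes is one way to sanity-check the assumption that half of every byte read is useful row data; it does not by itself say where the extra bytes go.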