Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B981E6A08 for ; Mon, 13 Jun 2011 12:43:04 +0000 (UTC) Received: (qmail 21583 invoked by uid 500); 13 Jun 2011 12:43:01 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 21555 invoked by uid 500); 13 Jun 2011 12:43:01 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 21546 invoked by uid 99); 13 Jun 2011 12:43:01 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 13 Jun 2011 12:43:01 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [208.113.200.5] (HELO homiemail-a78.g.dreamhost.com) (208.113.200.5) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 13 Jun 2011 12:42:55 +0000 Received: from homiemail-a78.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a78.g.dreamhost.com (Postfix) with ESMTP id D482815C062 for ; Mon, 13 Jun 2011 05:42:33 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=thelastpickle.com; h=from :mime-version:content-type:subject:date:in-reply-to:to :references:message-id; q=dns; s=thelastpickle.com; b=bkYldCam9K 3oP4Av4zcue5zoepRfW9GP6+iBVLiG8dZRXlXq8zuamOf8zoL5tDV9y08mKneDhI LEJO+Dxn60RMy9Jm4WtioNqY9kI7HPRd8ny+7YEY2H4dXFRMH9BP6/d7Z91gujg/ 48AsrCA1DwNrSuH0DxFWpOpuixFr3QlB0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=thelastpickle.com; h=from :mime-version:content-type:subject:date:in-reply-to:to :references:message-id; s=thelastpickle.com; bh=Lhz7Ou3ZOP6iZbNM g3eJZncMzaU=; b=bjHTO7iN+ROh81qyq+5lmm2sF5bygSEIwaD9HSPBx0+9SpOY 4gSlmaFuRnOQEal3XMXaBnPQY1A2LPGOuw+8oOVyKnBmrbgL/7EbOXRErF2VU27X ZReMHHq59SKNwkBJdKzkKVRBis37C+Z8DflccIeOc0LlgsBuYm47cyOAd0M= Received: from [10.0.1.151] (121-73-157-230.cable.telstraclear.net [121.73.157.230]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: aaron@thelastpickle.com) by homiemail-a78.g.dreamhost.com (Postfix) with ESMTPSA id BC61C15C059 for ; Mon, 13 Jun 2011 05:42:32 -0700 (PDT) From: aaron morton Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: multipart/alternative; boundary=Apple-Mail-12--1050731698 Subject: Re: Troubleshooting IO performance ? Date: Tue, 14 Jun 2011 00:42:29 +1200 In-Reply-To: To: user@cassandra.apache.org References: <6CB63AE7-E8A1-4FA7-8F35-36268B627317@thelastpickle.com> Message-Id: X-Mailer: Apple Mail (2.1084) --Apple-Mail-12--1050731698 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii To reduce the number of SSTables increase the memtable_threshold for the = CF.=20 The IO numbers may be because of compaction kicking in. The = CompactionManager provides information via JMX on it's progress, or you = can check the logs. You could increase the min_compaction_threshold for = the CF or disable compaction if you want to during the bulk load. In = your case it's probably a bad idea as in your case every write requires = a read and compaction will help the read performance.=20 These numbers show that about 37GB of data was written to disk, but = compaction has shrunk this down to about 7GB. If you are not doing = deletes that you are doing a log of overwrites > Space used (live): 8283980539 > Space used (total): 39972827945 I'm getting a bit confused about what the problem is. Is it increasing = latency for read requests or the unexplained IO ?=20 =20 ----------------- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 12 Jun 2011, at 08:55, Philippe wrote: > More info below > =20 > I just loaded 4.8GB of similar data in another keyspace and ran the = same process as in my previous tests but on that data. > I started with three threads hitting cassandra. No I/O, hardly any CPU = (15% on a 4 core server) > After an hour or so, I raised it to 6 threads in parallel. Then to 9 = threads in parallel. >=20 > I never got any IO, in fact iostat showed me there wasn't any disk = reads. I hardly saw the CPU elevate except at the end. >=20 > The only difference between the two datasets is that the size of the = other one is 8.4Gb. So the second one doesn't fit completely in = memory.So my woes are related to how well cassandra is fetching the data = in the SSTAbles right ? >=20 >=20 > So what are my options ? My rows are very small at the moment (like = well < 4 kBytes). Should I reduce the read buffer ? Should I reduce the = number of SST tables ? > I'm reloading the data from scratch using the incremental update I = will be using in production. I'm hitting the cluster with 6 threads. Yes = I should have decreased it but I was too optimistic and I dont want to = stop it now. The data I'm loading is used for computing running averages = (sum & total) and so every update requires a read.=20 >=20 > As soon as the data no longer fits in memory, I'm seeing huge amounts = of io (almost 380MB/s reading) that I'd like to understand. > My hypothesis is that because I am updating the same key so many times = (dozens if not hundreds of times in some cases), the row is split across = the SSTables and every read needs to go through all the SST tables. > Unfortunately, at some point, cassandra compacted the keys from 5 = tables to 3 and the throughput did not increase after that so I'm not = even sure this makes sense. > 1) Is there another explanation ? Can I do something about this ? > 2) why is the readlatency displayed twice in "cfstats" and why does it = differ ? >=20 > Thanks >=20 > vmstat > procs -----------memory---------- ---swap-- -----io---- -system-- = ----cpu---- > r b swpd free buff cache si so bi bo in cs us = sy id wa > 0 7 47724 94604 22556 7373252 0 0 391646 2152 10498 10297 = 6 6 26 62 > 0 6 47724 93736 22556 7372400 0 0 396774 0 10881 11177 = 5 6 29 60 > 2 5 47724 92496 22556 7374824 0 0 372406 15 11212 11149 = 8 7 25 59 > 0 5 47724 89520 22568 7378484 0 0 399730 526 10964 11975 = 6 7 24 63 > 0 7 47724 87908 22568 7379444 0 0 396216 0 10405 10880 = 5 7 22 66 >=20 > iostat -dmx 2 > Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s = avgrq-sz avgqu-sz await r_await w_await svctm %util > sdb 168.50 0.00 3152.00 0.50 185.49 0.25 = 120.66 54.86 17.39 17.39 4.00 0.31 96.80 > sda 178.50 0.50 3090.50 0.50 184.47 0.19 = 122.35 61.16 19.71 19.71 4.00 0.31 97.20 > md1 0.00 0.00 0.00 0.00 0.00 0.00 = 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > md5 0.00 0.00 6612.50 1.50 372.82 0.44 = 115.58 0.00 0.00 0.00 0.00 0.00 0.00 > dm-0 0.00 0.00 6612.50 0.00 372.82 0.00 = 115.47 123.15 18.58 18.58 0.00 0.15 97.20 >=20 > cfstats > Read Count: 88215069 > Read Latency: 1.821254258759351 ms. > Write Count: 88215059 > Write Latency: 0.013311765885686253 ms. > Pending Tasks: 0 > Column Family: PUBLIC_MONTHLY > SSTable count: 3 > Space used (live): 8283980539 > Space used (total): 39972827945 > Memtable Columns Count: 449201 > Memtable Data Size: 21788245 > Memtable Switch Count: 72 > Read Count: 88215069 > Read Latency: 7.433 ms. > Write Count: 88215069 > Write Latency: 0.016 ms. > Pending Tasks: 0 > Key cache capacity: 2000000 > Key cache size: 2000000 > Key cache hit rate: 2.93152540950838E-4 > Row cache capacity: 50000 > Row cache size: 50000 > Row cache hit rate: 0.20131872518304333 > Compacted row minimum size: 125 > Compacted row maximum size: 17084 > Compacted row mean size: 176 >=20 > tpstats > Pool Name Active Pending Completed > ReadStage 6 7 88200770 > RequestResponseStage 0 0 0 > MutationStage 0 0 88200760 > ReadRepairStage 0 0 0 > GossipStage 0 0 0 > AntiEntropyStage 0 0 0 > MigrationStage 0 0 0 > MemtablePostFlusher 0 0 74 > StreamStage 0 0 0 > FlushWriter 0 0 74 > FILEUTILS-DELETE-POOL 0 0 358 > MiscStage 0 0 0 > FlushSorter 0 0 0 > InternalResponseStage 0 0 0 > HintedHandoff 0 0 0 >=20 >=20 > cfhistograms > Offset SSTables Write Latency Read Latency Row = Size Column Count > 1 27628 0 0 = 0 29101531 > 2 12405 3 9 = 0 6677734 > 3 0 30 130 = 0 2477249 > 4 0 191 451 = 0 391129 > 5 0 414 990 = 0 51983 > 6 0 644 1587 = 0 4041 > 7 0 1131 1957 = 0 710 > 8 0 1328 2291 = 0 220 > 10 0 1663 5515 = 0 131 > 12 0 1349 6133 = 0 24 > 14 0 1557 6461 = 0 5 > 17 0 880 8911 = 0 0 > 20 0 594 6685 = 0 0 > 24 0 1099 5316 = 0 0 > 29 0 2044 2283 = 0 0 > 35 0 2901 474 = 0 0 > 42 0 2737 66 = 0 0 > 50 0 876 10 = 0 0 > 60 0 192 4 = 0 0 > 72 0 361 0 = 0 0 > 86 0 771 0 = 0 0 > 103 0 1486 0 = 0 0 > 124 0 2285 0 = 0 11 > 149 0 2618 0 = 24141265 0 > 179 0 2438 0 = 1791177 0 > 215 0 1909 0 = 5326712 0 > 258 0 1008 0 = 1841927 0 > 310 0 420 0 = 2298870 0 > 372 0 105 0 = 1234895 0 > 446 0 3 0 = 1036341 0 > 535 0 3 0 = 553275 0 > 642 0 2 0 = 273339 0 > 770 0 1 0 = 144826 0 > 924 0 1 0 = 48619 0 > 1109 0 1 0 = 10275 0 > 1331 0 1 0 = 2134 0 > 1597 5 0 = 637 0 > 1916 5 0 = 291 0 > 2299 10 0 = 115 0 > 2759 15 0 = 42 0 > 3311 51 0 = 16 0 > 3973 86 0 = 1 0 > 4768 100 0 = 0 0 > 5722 165 0 = 0 0 > 6866 255 0 = 0 0 > 8239 412 0 = 0 0 > 9887 556 0 = 0 0 > 11864 945 0 = 0 0 > 14237 1358 0 = 6 0 >=20 --Apple-Mail-12--1050731698 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=us-ascii
           =      Space used (live): 8283980539
  =               Space used (total): = 39972827945

I'm = getting a bit confused about what the problem is. Is it increasing = latency for read requests or the unexplained IO = ? 
 
http://www.thelastpickle.com

On 12 Jun 2011, at 08:55, Philippe wrote:

More info = below
 
I just = loaded 4.8GB of similar data in another keyspace and ran the same = process as in my previous tests but on that data.
I started with three threads hitting cassandra. No I/O, hardly any = CPU (15% on a 4 core server)
After an hour or so, I raised it to 6 threads in parallel. Then to = 9 threads in parallel.

I never got any IO, in = fact iostat showed me there wasn't any disk reads. I hardly saw the CPU = elevate except at the end.

The only difference between the two datasets is that = the size of the other one is 8.4Gb. So the second one doesn't fit = completely in memory.So my woes are related to how well cassandra is = fetching the data in the SSTAbles right ?


So what are my options ? My = rows are very small at the moment (like well < 4 kBytes). Should I = reduce the read buffer ? Should I reduce the number of SST tables = ?
I'm reloading the data from scratch using the incremental update I = will be using in production. I'm hitting the cluster with 6 threads. Yes = I should have decreased it but I was too optimistic and I dont want to = stop it now. The data I'm loading is used for computing running = averages (sum & total) and so every update requires a = read. 

As soon as the data no longer fits in memory, I'm = seeing huge amounts of io (almost 380MB/s reading) that I'd like to = understand.
My hypothesis is that because I am updating the = same key so many times (dozens if not hundreds of times in some cases), = the row is split across the SSTables and every read needs to go through = all the SST tables.
Unfortunately, at some point, cassandra compacted the keys from 5 = tables to 3 and the throughput did not increase after that so I'm not = even sure this makes sense.
1) Is there another explanation ? = Can I do something about this ?
2) why is the readlatency displayed twice in "cfstats" and why does = it differ = ?

Thanks

vmstat
procs -----------memory---------- ---swap-- -----io---- = -system-- ----cpu----
 r  b   swpd   free   buff  cache =   si   so    bi    bo   in   cs = us sy id wa
 0  7  47724  94604 =  22556 7373252    0    0 391646  2152 = 10498 10297  6  6 26 62
 0  6  47724 =  93736  22556 7372400    0    0 396774 =     0 10881 11177  5  6 29 60
 2  5  47724  92496 =  22556 7374824    0    0 372406    15 = 11212 11149  8  7 25 59
 0  5  47724 =  89520  22568 7378484    0    0 399730 =   526 10964 11975  6  7 24 63
 0  7 =  47724  87908  22568 7379444    0   =  0 396216     0 10405 10880  5  7 22 66

iostat -dmx 2
Device:   =       rrqm/s   wrqm/s     r/s   =   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz =   await r_await w_await  svctm  %util
sdb =             168.50     0.00 = 3152.00    0.50   185.49     0.25   120.66 =    54.86   17.39   17.39    4.00   = 0.31  96.80
sda             178.50     = 0.50 3090.50    0.50   184.47     0.19   = 122.35    61.16   19.71   19.71    4.00 =   0.31  97.20
md1           =     0.00     0.00    0.00   =  0.00     0.00     0.00     0.00 =     0.00    0.00    0.00    0.00 =   0.00   0.00
md5               0.00   =   0.00 6612.50    1.50   372.82     0.44 =   115.58     0.00    0.00    0.00 =    0.00   0.00   0.00
dm-0     =          0.00     0.00 6612.50   =  0.00   372.82     0.00   115.47   123.15 =   18.58   18.58    0.00   0.15 =  97.20

cfstats
      =   Read Count: 88215069
        Read = Latency: 1.821254258759351 ms.
        = Write Count: 88215059
        Write = Latency: 0.013311765885686253 ms.
        Pending Tasks: 0
  =               Column Family: = PUBLIC_MONTHLY
            =     SSTable count: 3
        =         Space used (live): = 8283980539
              =   Space used (total): 39972827945
                Memtable = Columns Count: 449201
          =       Memtable Data Size: 21788245
  =               Memtable Switch Count: = 72
                = Read Count: 88215069
            =     Read Latency: 7.433 ms.
                Write = Count: 88215069
            =     Write Latency: 0.016 ms.
      =           Pending Tasks: 0
  =               Key cache capacity: = 2000000
              =   Key cache size: 2000000
                Key cache = hit rate: 2.93152540950838E-4
        =         Row cache capacity: 50000
  =               Row cache size: = 50000
                = Row cache hit rate: 0.20131872518304333
                Compacted = row minimum size: 125
          =       Compacted row maximum size: 17084
  =               Compacted row mean = size: 176

tpstats
Pool Name                   =  Active   Pending     =  Completed
ReadStage           =               6       =   7       88200770
RequestResponseStage =              0       =   0             =  0
MutationStage           =           0         0 =       88200760
ReadRepairStage               =     0         0       =        0
GossipStage       =                 0     =     0             =  0
AntiEntropyStage           =        0         0     =          0
MigrationStage   =                  0   =       0             =  0
MemtablePostFlusher             =   0         0         =     74
StreamStage         =               0       =   0             =  0
FlushWriter             =           0         0 =             = 74
FILEUTILS-DELETE-POOL           =   0         0         =    358
MiscStage                 =         0         0   =            0
FlushSorter   =                     0 =         0             =  0
InternalResponseStage         =     0         0       =        0
HintedHandoff     =                 0     =     0              0
=


cfhistograms
Offse= t      SSTables     Write Latency     =  Read Latency          Row Size   =    Column Count
1           =    27628               =   0                 0 =                 0     =      29101531
2              12405   =               3       =           9           =       0           = 6677734
3               =    0                30 =               130     =             0         =   2477249
4             =      0               = 191               451     =             0         =    391129
5                  0 =               414     =           990         =         0             = 51983
6               =    0               644 =              1587     =             0         =      4041
7           =        0             =  1131              1957   =               0       =         710
8                  0 =              1328     =          2291         =         0             =   220
10               =   0              1663   =            5515       =           0           =     131
12             =     0              1349 =              6133     =             0         =        24
14                 0   =            1557       =        6461           =       0               =   5
17               =   0               880   =            8911       =           0           =       0
20           =       0               = 594              6685     =             0         =         0
24                 0   =            1099       =        5316           =       0               =   0
29               =   0              2044   =            2283       =           0           =       0
35           =       0             =  2901               474   =               0       =           0
42                 0   =            2737       =          66           =       0               =   0
50               =   0               876   =              10       =           0           =       0
60           =       0               = 192                 4   =               0       =           0
72                 0   =             361       =           0           =       0               =   0
86               =   0               771   =               0       =           0           =       0
103           =      0             =  1486                 0 =                 0     =             0
124                0   =            2285       =           0           =       0               =  11
149               =  0              2618   =               0       =    24141265               =   0
179               =  0              2438   =               0       =     1791177               =   0
215                0   =            1909       =           0           = 5326712                 = 0
258                0 =              1008     =             0         =   1841927                 = 0
310                0 =               420     =             0         =   2298870                 = 0
372                0   =             105       =           0           = 1234895                 = 0
446                0 =                 3     =             0         =   1036341                 = 0
535                0 =                 3     =             0         =    553275               =   0
642                0   =               2       =           0           =  273339                 = 0
770                0 =                 1     =             0         =    144826               =   0
924               =  0                 1   =               0       =       48619             =     0
1109               0   =               1       =           0           =   10275                 = 0
1331               0 =                 1     =             0         =      2134             =     0
1597             =                     5 =                 0     =           637         =         0
1916                   =               5       =           0           =     291               =   0
2299               =                  10   =               0       =         115           =       0
2759           =                     =  15                 0 =                42     =             0
3311                   =              51       =           0           =      16               =   0
3973               =                  86   =               0       =           1           =       0
4768           =                     = 100                 0   =               0       =           0
5722                   =             165       =           0           =       0               =   0
6866               =                 255   =               0       =           0           =       0
8239           =                     = 412                 0   =               0       =           0
9887                   =             556       =           0           =       0               =   0
11864             =                  945   =               0       =           0           =       0
14237         =                     = 1358                 0   =               6       =           0


= --Apple-Mail-12--1050731698--