cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Hot, large row
Date Fri, 25 Jul 2014 06:53:35 GMT
What are your JVM settings?

Your read pattern implies that you may be fetching lots of data into memory
(reading all SKUs for a given user); perhaps that is putting too much pressure on the JVM.

Did you use the native paging of the Java Driver to avoid loading all columns at
once?
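As a rough sketch of what paging buys you (illustrative stdlib code with made-up names, not the actual Java Driver API): rows arrive in fixed-size pages, so the client never materializes the whole result set at once.

```python
# Toy illustration of driver-side paging (names are illustrative only):
# instead of loading every SKU for a user in one shot, the client pulls
# fixed-size pages and iterates over them lazily.

def paged_fetch(rows, fetch_size):
    """Yield the result set one page at a time."""
    for start in range(0, len(rows), fetch_size):
        yield rows[start:start + fetch_size]

all_skus = ["sku-%d" % i for i in range(1050)]
pages = list(paged_fetch(all_skus, fetch_size=500))
print(len(pages))        # 3 pages rather than one 1050-row result
print(len(pages[-1]))    # 50: the final, partial page
```

The point is that peak memory is bounded by the fetch size, not by the width of the partition.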

And is loading all SKUs for one user a rare scenario, or is it the
main use case for this column family?
On 25 Jul 2014 at 04:11, "Keith Wright" <kwright@nanigans.com> wrote:

> One last item to add to this thread:  we have consistently experienced
> this behavior where performance degrades over time (previously we were
> unable to bootstrap nodes due to long GC pauses from existing nodes).  I
> believe it's due to tombstone build-up (as I mentioned previously, one of the
> tables mentioned is showing a droppable tombstone ratio of > 30%).   The
> sku table is used to hold SKUs that users recently viewed/purchased.  When we
> write a new SKU we set the TTL to 30 days, where the row key is the user id;
> our read case is to fetch ALL SKUs the user has seen within the TTL.  Since
> the user sees SKUs consistently over time, this can result in a row with
> many columns, many of which are likely tombstoned (see CASSANDRA-6654, which
> I filed for this and which shows that C* does not handle this case well).
>
> I guess I’m just surprised that others aren’t using C* for similar use
> cases and thus hitting the same issue?
>
>  I am hoping to upgrade to 2.0.9, which has improvements to tombstone
> removal.
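A back-of-the-envelope sketch of the build-up described above, with made-up numbers (5 SKUs/day, 30-day TTL): until compaction purges them, a steadily-writing user's partition keeps carrying expired cells that every "fetch ALL SKUs" read has to scan past.

```python
# Rough model (illustrative numbers, not measurements from this cluster):
# a user writes SKUS_PER_DAY cells per day, each with a 30-day TTL. Cells
# expire after TTL_DAYS but stay on disk until compaction removes them.
TTL_DAYS = 30
SKUS_PER_DAY = 5

def partition_cells(today, last_compaction_day):
    """Count live cells vs expired-but-not-yet-purged cells in the partition."""
    live = TTL_DAYS * SKUS_PER_DAY                                 # written within the TTL window
    expired = (today - TTL_DAYS - last_compaction_day) * SKUS_PER_DAY
    return live, max(expired, 0)

live, expired = partition_cells(today=120, last_compaction_day=0)
print(live, expired)   # 150 live cells, but 450 expired cells also scanned on a read
```

In this toy model a 4-month-old partition forces a read to touch four cells for every one that is still live, which is exactly the shape of problem CASSANDRA-6654 describes.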
>
> From: Keith Wright <kwright@nanigans.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 4:50 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Cc: Don Jackson <djackson@nanigans.com>
> Subject: Re: Hot, large row
>
> When a node is showing the high CMS issue, IO is actually low, likely
> because none is going on during CMS GC.  On a node not showing the
> issue, iostat shows disk usage around 50% (these are SSDs) and load hovers
> around 10, which is fine for a dual octo-core machine.
>
> In addition, nodetool compactionstats does not show that we are falling
> behind in compactions.
>
> So I’m not sure what is going on here.  We are running CentOS 6.5 with Java
> 1.7.0_51.  It does seem like things are getting worse and I’m considering
> dropping and rebuilding all the tables (as I have the data in Hadoop).
>  This seems to be a repeated problem for us with Cassandra and now that
> Aerospike has an open source version, we are very much considering
> switching.
>
> Thanks again for the help and any insight you might have!
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>
>           23.43   12.40   11.40    6.20    0.00   46.57
>
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
>
> sda               0.00     0.00  720.50    0.00     5.51     0.00    15.66
>     1.05    1.54   0.57  41.35
>
> sdc               0.00     0.00 5930.50 2100.50    55.42     8.21    16.23
>     2.30    0.29   0.06  51.25
>
> sdb               0.00     0.00 6951.50 2052.50    65.82     8.02    16.80
>     4.31    0.48   0.07  59.60
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>
>            9.48   14.72    5.60    3.67    0.00   66.52
>
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
>
> sda               0.00     0.00  327.00   19.50     2.55     0.08    15.53
>     0.04    0.13   0.12   4.25
>
> sdc               0.00     0.00 3974.50 1403.50    36.37     5.48    15.94
>     0.99    0.18   0.08  45.10
>
> sdb               0.00     0.00 4357.50 1535.00    40.83     6.00    16.28
>     1.10    0.19   0.08  47.45
>
> From: DuyHai Doan <doanduyhai@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 4:44 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Hot, large row
>
> For global_user_event_skus_v2
>
> 1. The number of SSTables per read is quite high. Considering you're using
> LCS, this means that LCS cannot keep up with the write rate and is falling
> behind. AFAIK LCS uses size-tiered compaction at L0 to cope with extreme
> write bursts. Your high number of SSTables per read is quite fishy here.
>
> 2. Write latency is widely spread, up to 73.457 millisecs, meaning that your
> node is falling behind on writes in some cases. Most writes are still
> below 1 millisec, but that's not the concern. What matters here is the long
> tail of write latency climbing up to 73 millisecs.
>
> 3. The same remark applies to read latency, which is worse because the
> distribution is even "flatter", with worst cases going up to 100 ms.
>
> If I were you, I'd check disk I/O first and maybe CPU usage.
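The cfhistograms output quoted below is just a list of (bucket upper bound, count) pairs, so the tail percentiles being discussed can be estimated mechanically. A small sketch with a made-up latency histogram (not the exact numbers from this thread):

```python
# Estimate a percentile from histogram buckets of (upper_bound, count),
# as printed by nodetool cfhistograms. Illustrative data only.

def approx_percentile(histogram, pct):
    """Return the bucket bound below which roughly pct% of samples fall."""
    total = sum(count for _, count in histogram)
    threshold = total * pct / 100.0
    running = 0
    for bound, count in histogram:
        running += count
        if running >= threshold:
            return bound
    return histogram[-1][0]

# Made-up write-latency buckets in microseconds: bulk fast, long slow tail.
write_latency_us = [(10, 9000), (100, 900), (1000, 80), (73457, 20)]
print(approx_percentile(write_latency_us, 50))    # 10 us: the bulk is fast
print(approx_percentile(write_latency_us, 99.9))  # 73457 us: the long tail
```

This is why the median looks healthy while the p99.9 tells the real story about nodes falling behind.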
>
>
> On Thu, Jul 24, 2014 at 10:32 PM, Keith Wright <kwright@nanigans.com>
> wrote:
>
>> Cfhistograms for the tables I believe are most likely the issue are below,
>> taken from the node that most recently presented the issue.  Any ideas?  Note
>> that these tables are LCS and have droppable tombstone ratios of 27% for
>> global_user_event_skus_v2 and 2.7% for the other.  Table definitions are also
>> below; you can see that they do use compound keys.
>>
>> Thanks
>>
>> CREATE TABLE global_user_event_skus_v2 (
>>
>>   user_id timeuuid,
>>
>>   app_id int,
>>
>>   type text,
>>
>>   name text,
>>
>>   sku text,
>>
>>   creationtime timestamp,
>>
>>   PRIMARY KEY (user_id, app_id, type, name, sku)
>>
>> ) WITH
>>
>>   bloom_filter_fp_chance=0.100000 AND
>>
>>   caching='KEYS_ONLY' AND
>>
>>   comment='' AND
>>
>>   dclocal_read_repair_chance=0.000000 AND
>>
>>   gc_grace_seconds=86400 AND
>>
>>   index_interval=128 AND
>>
>>   read_repair_chance=0.100000 AND
>>
>>   replicate_on_write='true' AND
>>
>>   populate_io_cache_on_flush='false' AND
>>
>>   default_time_to_live=2592000 AND
>>
>>   speculative_retry='99.0PERCENTILE' AND
>>
>>   memtable_flush_period_in_ms=0 AND
>>
>>   compaction={'class': 'LeveledCompactionStrategy'} AND
>>
>>   compression={'sstable_compression': 'LZ4Compressor'};
>>
>>
>> users/global_user_event_skus_v2 histograms
>>
>>
>> SSTables per Read
>>
>>  1 sstables: 1398122
>>
>>  2 sstables: 231100
>>
>>  3 sstables: 97334
>>
>>  4 sstables: 56496
>>
>>  5 sstables: 43046
>>
>>  6 sstables: 35471
>>
>>  7 sstables: 26888
>>
>>  8 sstables: 21102
>>
>> 10 sstables: 29522
>>
>> 12 sstables: 18314
>>
>> 14 sstables: 11368
>>
>> 17 sstables: 10283
>>
>> 20 sstables: 5766
>>
>> 24 sstables: 3554
>>
>> 29 sstables: 1681
>>
>> 35 sstables: 521
>>
>> 42 sstables: 102
>>
>> 50 sstables: 1
>>
>>
>> Write Latency (microseconds)
>>
>>      2 us: 532
>>
>>      3 us: 10728
>>
>>      4 us: 49608
>>
>>      5 us: 173718
>>
>>      6 us: 404131
>>
>>      7 us: 697468
>>
>>      8 us: 981048
>>
>>     10 us: 2643680
>>
>>     12 us: 3183727
>>
>>     14 us: 3150881
>>
>>     17 us: 3746333
>>
>>     20 us: 2298481
>>
>>     24 us: 1647604
>>
>>     29 us: 1070733
>>
>>     35 us: 693300
>>
>>     42 us: 442120
>>
>>     50 us: 288618
>>
>>     60 us: 205061
>>
>>     72 us: 141194
>>
>>     86 us: 98781
>>
>>    103 us: 73629
>>
>>    124 us: 54900
>>
>>    149 us: 40518
>>
>>    179 us: 30330
>>
>>    215 us: 23953
>>
>>    258 us: 18875
>>
>>    310 us: 14147
>>
>>    372 us: 10509
>>
>>    446 us: 7984
>>
>>    535 us: 6257
>>
>>    642 us: 4903
>>
>>    770 us: 4086
>>
>>    924 us: 3252
>>
>>   1109 us: 2676
>>
>>   1331 us: 2129
>>
>>   1597 us: 1753
>>
>>   1916 us: 1488
>>
>>   2299 us: 1237
>>
>>   2759 us: 1078
>>
>>   3311 us: 866
>>
>>   3973 us: 704
>>
>>   4768 us: 551
>>
>>   5722 us: 553
>>
>>   6866 us: 774
>>
>>   8239 us: 443
>>
>>   9887 us: 303
>>
>>  11864 us: 222
>>
>>  14237 us: 181
>>
>>  17084 us: 146
>>
>>  20501 us: 147
>>
>>  24601 us: 499
>>
>>  29521 us: 842
>>
>>  35425 us: 548
>>
>>  42510 us: 303
>>
>>  51012 us: 560
>>
>>  61214 us: 842
>>
>>  73457 us: 600
>>
>>  88148 us: 283
>>
>> 105778 us: 173
>>
>> 126934 us: 81
>>
>> 152321 us: 18
>>
>> 182785 us: 2
>>
>>
>> Read Latency (microseconds)
>>
>>      35 us: 2
>>
>>      42 us: 72
>>
>>      50 us: 737
>>
>>      60 us: 3555
>>
>>      72 us: 6926
>>
>>      86 us: 8373
>>
>>     103 us: 15287
>>
>>     124 us: 30054
>>
>>     149 us: 46164
>>
>>     179 us: 67828
>>
>>     215 us: 86741
>>
>>     258 us: 92178
>>
>>     310 us: 82205
>>
>>     372 us: 69641
>>
>>     446 us: 64710
>>
>>     535 us: 58295
>>
>>     642 us: 55247
>>
>>     770 us: 57918
>>
>>     924 us: 56928
>>
>>    1109 us: 75762
>>
>>    1331 us: 122306
>>
>>    1597 us: 141564
>>
>>    1916 us: 117301
>>
>>    2299 us: 102293
>>
>>    2759 us: 98789
>>
>>    3311 us: 86923
>>
>>    3973 us: 75304
>>
>>    4768 us: 64012
>>
>>    5722 us: 54973
>>
>>    6866 us: 47221
>>
>>    8239 us: 41483
>>
>>    9887 us: 35015
>>
>>   11864 us: 30249
>>
>>   14237 us: 23505
>>
>>   17084 us: 18364
>>
>>   20501 us: 12992
>>
>>   24601 us: 8837
>>
>>   29521 us: 6497
>>
>>   35425 us: 4840
>>
>>   42510 us: 3536
>>
>>   51012 us: 2471
>>
>>   61214 us: 2784
>>
>>   73457 us: 3664
>>
>>   88148 us: 3219
>>
>>  105778 us: 1980
>>
>>  126934 us: 1050
>>
>>  152321 us: 519
>>
>>  182785 us: 184
>>
>>  219342 us: 82
>>
>>  263210 us: 39
>>
>>  315852 us: 22
>>
>>  379022 us: 16
>>
>>  454826 us: 0
>>
>>  545791 us: 0
>>
>>  654949 us: 1
>>
>>  785939 us: 1
>>
>>  943127 us: 0
>>
>> 1131752 us: 0
>>
>> 1358102 us: 0
>>
>> 1629722 us: 3
>>
>> 1955666 us: 3
>>
>> 2346799 us: 1
>>
>> 2816159 us: 0
>>
>> 3379391 us: 0
>>
>> 4055269 us: 0
>>
>> 4866323 us: 0
>>
>> 5839588 us: 0
>>
>> 7007506 us: 4
>>
>>
>> Partition Size (bytes)
>>
>>      35 bytes: 31
>>
>>      42 bytes: 0
>>
>>      50 bytes: 0
>>
>>      60 bytes: 0
>>
>>      72 bytes: 0
>>
>>      86 bytes: 0
>>
>>     103 bytes: 761108
>>
>>     124 bytes: 23596164
>>
>>     149 bytes: 22963275
>>
>>     179 bytes: 3517971
>>
>>     215 bytes: 6395335
>>
>>     258 bytes: 6197982
>>
>>     310 bytes: 4662589
>>
>>     372 bytes: 5261934
>>
>>     446 bytes: 3032032
>>
>>     535 bytes: 3775310
>>
>>     642 bytes: 2879935
>>
>>     770 bytes: 3002743
>>
>>     924 bytes: 2279900
>>
>>    1109 bytes: 1976357
>>
>>    1331 bytes: 1643875
>>
>>    1597 bytes: 1422199
>>
>>    1916 bytes: 1142746
>>
>>    2299 bytes: 901584
>>
>>    2759 bytes: 760173
>>
>>    3311 bytes: 596874
>>
>>    3973 bytes: 480502
>>
>>    4768 bytes: 380185
>>
>>    5722 bytes: 303315
>>
>>    6866 bytes: 239013
>>
>>    8239 bytes: 187337
>>
>>    9887 bytes: 147218
>>
>>   11864 bytes: 115684
>>
>>   14237 bytes: 89718
>>
>>   17084 bytes: 69160
>>
>>   20501 bytes: 53070
>>
>>   24601 bytes: 40496
>>
>>   29521 bytes: 30344
>>
>>   35425 bytes: 22656
>>
>>   42510 bytes: 16419
>>
>>   51012 bytes: 11621
>>
>>   61214 bytes: 8020
>>
>>   73457 bytes: 5561
>>
>>   88148 bytes: 3708
>>
>>  105778 bytes: 2434
>>
>>  126934 bytes: 1561
>>
>>  152321 bytes: 1017
>>
>>  182785 bytes: 559
>>
>>  219342 bytes: 349
>>
>>  263210 bytes: 181
>>
>>  315852 bytes: 121
>>
>>  379022 bytes: 66
>>
>>  454826 bytes: 22
>>
>>  545791 bytes: 14
>>
>>  654949 bytes: 10
>>
>>  785939 bytes: 2
>>
>>  943127 bytes: 0
>>
>> 1131752 bytes: 4
>>
>> 1358102 bytes: 2
>>
>> 1629722 bytes: 1
>>
>> 1955666 bytes: 2
>>
>> 2346799 bytes: 0
>>
>> 2816159 bytes: 0
>>
>> 3379391 bytes: 0
>>
>> 4055269 bytes: 0
>>
>> 4866323 bytes: 1
>>
>>
>> Cell Count per Partition
>>
>>     1 cells: 50599458
>>
>>     2 cells: 13975363
>>
>>     3 cells: 7927898
>>
>>     4 cells: 4981142
>>
>>     5 cells: 3081166
>>
>>     6 cells: 3034895
>>
>>     7 cells: 1896720
>>
>>     8 cells: 1508849
>>
>>    10 cells: 2418845
>>
>>    12 cells: 1631289
>>
>>    14 cells: 1179269
>>
>>    17 cells: 1301647
>>
>>    20 cells: 919744
>>
>>    24 cells: 864313
>>
>>    29 cells: 735203
>>
>>    35 cells: 597749
>>
>>    42 cells: 469611
>>
>>    50 cells: 367422
>>
>>    60 cells: 312697
>>
>>    72 cells: 251782
>>
>>    86 cells: 197717
>>
>>   103 cells: 161244
>>
>>   124 cells: 132074
>>
>>   149 cells: 103711
>>
>>   179 cells: 81760
>>
>>   215 cells: 63360
>>
>>   258 cells: 48982
>>
>>   310 cells: 37932
>>
>>   372 cells: 28328
>>
>>   446 cells: 21347
>>
>>   535 cells: 15373
>>
>>   642 cells: 10959
>>
>>   770 cells: 7643
>>
>>   924 cells: 5346
>>
>>  1109 cells: 3613
>>
>>  1331 cells: 2291
>>
>>  1597 cells: 1558
>>
>>  1916 cells: 885
>>
>>  2299 cells: 577
>>
>>  2759 cells: 327
>>
>>  3311 cells: 169
>>
>>  3973 cells: 116
>>
>>  4768 cells: 57
>>
>>  5722 cells: 23
>>
>>  6866 cells: 15
>>
>>  8239 cells: 9
>>
>>  9887 cells: 3
>>
>> 11864 cells: 0
>>
>> 14237 cells: 5
>>
>> 17084 cells: 1
>>
>> 20501 cells: 0
>>
>> 24601 cells: 2
>>
>> 29521 cells: 0
>>
>> 35425 cells: 0
>>
>> 42510 cells: 0
>>
>> 51012 cells: 0
>>
>> 61214 cells: 1
>>
>>
>> CREATE TABLE global_user_events (
>>
>>   user_id timeuuid,
>>
>>   app_id int,
>>
>>   type text,
>>
>>   name text,
>>
>>   extra_param map<text, text>,
>>
>>   paid boolean,
>>
>>   value float,
>>
>>   PRIMARY KEY (user_id, app_id, type, name)
>>
>> ) WITH
>>
>>   bloom_filter_fp_chance=0.100000 AND
>>
>>   caching='KEYS_ONLY' AND
>>
>>   comment='' AND
>>
>>   dclocal_read_repair_chance=0.000000 AND
>>
>>   gc_grace_seconds=86400 AND
>>
>>   index_interval=128 AND
>>
>>   read_repair_chance=0.100000 AND
>>
>>   replicate_on_write='true' AND
>>
>>   populate_io_cache_on_flush='false' AND
>>
>>   default_time_to_live=0 AND
>>
>>   speculative_retry='99.0PERCENTILE' AND
>>
>>   memtable_flush_period_in_ms=0 AND
>>
>>   compaction={'class': 'LeveledCompactionStrategy'} AND
>>
>>   compression={'sstable_compression': 'LZ4Compressor'};
>>
>>
>> users/global_user_events histograms
>>
>>
>> SSTables per Read
>>
>>  1 sstables: 1024446
>>
>>  2 sstables: 632948
>>
>>  3 sstables: 230768
>>
>>  4 sstables: 112428
>>
>>  5 sstables: 82651
>>
>>  6 sstables: 61439
>>
>>  7 sstables: 47518
>>
>>  8 sstables: 38490
>>
>> 10 sstables: 58054
>>
>> 12 sstables: 41128
>>
>> 14 sstables: 29642
>>
>> 17 sstables: 31056
>>
>> 20 sstables: 20474
>>
>> 24 sstables: 17799
>>
>> 29 sstables: 13744
>>
>> 35 sstables: 9485
>>
>> 42 sstables: 5933
>>
>> 50 sstables: 3607
>>
>> 60 sstables: 1618
>>
>> 72 sstables: 263
>>
>> 86 sstables: 10
>>
>>
>> Write Latency (microseconds)
>>
>>      10 us: 10
>>
>>      12 us: 75
>>
>>      14 us: 319
>>
>>      17 us: 3550
>>
>>      20 us: 20617
>>
>>      24 us: 105254
>>
>>      29 us: 311290
>>
>>      35 us: 494404
>>
>>      42 us: 516301
>>
>>      50 us: 534061
>>
>>      60 us: 421438
>>
>>      72 us: 314377
>>
>>      86 us: 375240
>>
>>     103 us: 333402
>>
>>     124 us: 816891
>>
>>     149 us: 667832
>>
>>     179 us: 430659
>>
>>     215 us: 336063
>>
>>     258 us: 380606
>>
>>     310 us: 235592
>>
>>     372 us: 68679
>>
>>     446 us: 14988
>>
>>     535 us: 6169
>>
>>     642 us: 3763
>>
>>     770 us: 2591
>>
>>     924 us: 2270
>>
>>    1109 us: 1985
>>
>>    1331 us: 1744
>>
>>    1597 us: 1564
>>
>>    1916 us: 1489
>>
>>    2299 us: 1326
>>
>>    2759 us: 1114
>>
>>    3311 us: 1024
>>
>>    3973 us: 984
>>
>>    4768 us: 833
>>
>>    5722 us: 962
>>
>>    6866 us: 1076
>>
>>    8239 us: 891
>>
>>    9887 us: 611
>>
>>   11864 us: 489
>>
>>   14237 us: 324
>>
>>   17084 us: 306
>>
>>   20501 us: 222
>>
>>   24601 us: 426
>>
>>   29521 us: 791
>>
>>   35425 us: 820
>>
>>   42510 us: 534
>>
>>   51012 us: 530
>>
>>   61214 us: 946
>>
>>   73457 us: 1122
>>
>>   88148 us: 714
>>
>>  105778 us: 313
>>
>>  126934 us: 127
>>
>>  152321 us: 38
>>
>>  182785 us: 3
>>
>>  219342 us: 2
>>
>>  263210 us: 0
>>
>>  315852 us: 0
>>
>>  379022 us: 0
>>
>>  454826 us: 2
>>
>>  545791 us: 0
>>
>>  654949 us: 3
>>
>>  785939 us: 0
>>
>>  943127 us: 0
>>
>> 1131752 us: 0
>>
>> 1358102 us: 0
>>
>> 1629722 us: 0
>>
>> 1955666 us: 0
>>
>> 2346799 us: 1
>>
>> 2816159 us: 0
>>
>> 3379391 us: 0
>>
>> 4055269 us: 0
>>
>> 4866323 us: 0
>>
>> 5839588 us: 0
>>
>> 7007506 us: 1
>>
>>
>> Read Latency (microseconds)
>>
>>      42 us: 1
>>
>>      50 us: 2
>>
>>      60 us: 2
>>
>>      72 us: 4
>>
>>      86 us: 15
>>
>>     103 us: 23
>>
>>     124 us: 29
>>
>>     149 us: 48
>>
>>     179 us: 67
>>
>>     215 us: 80
>>
>>     258 us: 316
>>
>>     310 us: 1812
>>
>>     372 us: 4999
>>
>>     446 us: 8876
>>
>>     535 us: 14586
>>
>>     642 us: 24417
>>
>>     770 us: 31943
>>
>>     924 us: 37453
>>
>>    1109 us: 77449
>>
>>    1331 us: 139135
>>
>>    1597 us: 177538
>>
>>    1916 us: 169568
>>
>>    2299 us: 177617
>>
>>    2759 us: 202552
>>
>>    3311 us: 198101
>>
>>    3973 us: 179807
>>
>>    4768 us: 159149
>>
>>    5722 us: 142282
>>
>>    6866 us: 126124
>>
>>    8239 us: 111529
>>
>>    9887 us: 97658
>>
>>   11864 us: 84913
>>
>>   14237 us: 70905
>>
>>   17084 us: 54917
>>
>>   20501 us: 40623
>>
>>   24601 us: 29545
>>
>>   29521 us: 21448
>>
>>   35425 us: 15255
>>
>>   42510 us: 11222
>>
>>   51012 us: 8150
>>
>>   61214 us: 7473
>>
>>   73457 us: 10060
>>
>>   88148 us: 10530
>>
>>  105778 us: 7573
>>
>>  126934 us: 4331
>>
>>  152321 us: 2097
>>
>>  182785 us: 836
>>
>>  219342 us: 255
>>
>>  263210 us: 84
>>
>>  315852 us: 28
>>
>>  379022 us: 8
>>
>>  454826 us: 2
>>
>>  545791 us: 0
>>
>>  654949 us: 2
>>
>>  785939 us: 1
>>
>>  943127 us: 0
>>
>> 1131752 us: 0
>>
>> 1358102 us: 0
>>
>> 1629722 us: 2
>>
>> 1955666 us: 6
>>
>> 2346799 us: 18
>>
>> 2816159 us: 0
>>
>> 3379391 us: 0
>>
>> 4055269 us: 0
>>
>> 4866323 us: 0
>>
>> 5839588 us: 0
>>
>> 7007506 us: 34
>>
>>
>> Partition Size (bytes)
>>
>>     35 bytes: 25
>>
>>     42 bytes: 0
>>
>>     50 bytes: 0
>>
>>     60 bytes: 0
>>
>>     72 bytes: 21761
>>
>>     86 bytes: 694466658
>>
>>    103 bytes: 219007195
>>
>>    124 bytes: 7034596
>>
>>    149 bytes: 110269079
>>
>>    179 bytes: 101276811
>>
>>    215 bytes: 38518027
>>
>>    258 bytes: 103242356
>>
>>    310 bytes: 51434914
>>
>>    372 bytes: 38048245
>>
>>    446 bytes: 42558831
>>
>>    535 bytes: 32869176
>>
>>    642 bytes: 17188026
>>
>>    770 bytes: 14750559
>>
>>    924 bytes: 15963256
>>
>>   1109 bytes: 16125324
>>
>>   1331 bytes: 11462272
>>
>>   1597 bytes: 6810452
>>
>>   1916 bytes: 5685739
>>
>>   2299 bytes: 10232324
>>
>>   2759 bytes: 5138522
>>
>>   3311 bytes: 5847998
>>
>>   3973 bytes: 4065752
>>
>>   4768 bytes: 3468263
>>
>>   5722 bytes: 2545431
>>
>>   6866 bytes: 1701913
>>
>>   8239 bytes: 993808
>>
>>   9887 bytes: 495282
>>
>>  11864 bytes: 208644
>>
>>  14237 bytes: 74307
>>
>>  17084 bytes: 22286
>>
>>  20501 bytes: 5474
>>
>>  24601 bytes: 1225
>>
>>  29521 bytes: 210
>>
>>  35425 bytes: 97
>>
>>  42510 bytes: 48
>>
>>  51012 bytes: 21
>>
>>  61214 bytes: 15
>>
>>  73457 bytes: 10
>>
>>  88148 bytes: 3
>>
>> 105778 bytes: 4
>>
>> 126934 bytes: 1
>>
>> 152321 bytes: 2
>>
>> 182785 bytes: 1
>>
>>
>> Cell Count per Partition
>>
>>    1 cells: 914323128
>>
>>    2 cells: 196020828
>>
>>    3 cells: 165502164
>>
>>    4 cells: 71735648
>>
>>    5 cells: 54450466
>>
>>    6 cells: 28995756
>>
>>    7 cells: 9625976
>>
>>    8 cells: 10257788
>>
>>   10 cells: 24004757
>>
>>   12 cells: 13087147
>>
>>   14 cells: 13588347
>>
>>   17 cells: 11976353
>>
>>   20 cells: 6245374
>>
>>   24 cells: 5627821
>>
>>   29 cells: 9480533
>>
>>   35 cells: 6008276
>>
>>   42 cells: 5221357
>>
>>   50 cells: 4355454
>>
>>   60 cells: 3955488
>>
>>   72 cells: 2872578
>>
>>   86 cells: 1914447
>>
>>  103 cells: 1216276
>>
>>  124 cells: 652035
>>
>>  149 cells: 279565
>>
>>  179 cells: 99836
>>
>>  215 cells: 29132
>>
>>  258 cells: 6682
>>
>>  310 cells: 1276
>>
>>  372 cells: 258
>>
>>  446 cells: 96
>>
>>  535 cells: 48
>>
>>  642 cells: 25
>>
>>  770 cells: 9
>>
>>  924 cells: 9
>>
>> 1109 cells: 4
>>
>> 1331 cells: 3
>>
>> 1597 cells: 1
>>
>> 1916 cells: 2
>>
>> From: Jack Krupansky <jack@basetechnology.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Thursday, July 24, 2014 at 4:07 PM
>>
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Hot, large row
>>
>> Could it be some “fat columns” (cells with large blob or text values)
>> rather than the cell-count per se? IOW, a “big row” rather than a “wide
>> row”?
>>
>> And, could it be a large partition rather than a large row (many rows in
>> a single partition)? Are clustering columns being used in the primary key?
>>
>> -- Jack Krupansky
>>
>> *From:* DuyHai Doan <doanduyhai@gmail.com>
>> *Sent:* Thursday, July 24, 2014 3:53 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Hot, large row
>>
>> Your cfhistograms extract shows that there are no particularly "wide
>> rows". The widest has 61214 cells, which is big but not so huge as to be
>> a real concern.
>>
>> Turning on trace probability only gives you some "hints" about what
>> kind of queries are being run; it does not give the exact partition key or
>> other statement values, especially when you are using prepared statements
>> ...
>>
>>
>> "I am considering reducing read_request_timeout_in_ms: 5000 in
>> cassandra.yaml so that it reduces the impact when this occurs." --> Don't
>> do that, you'll only sweep the dust under the carpet. Find the real issue and
>> fix it instead of changing a parameter to hide it.
>>
>> One solution would be, on the client side, to activate some logging to show
>> the CQL3 statements the application is issuing that may be overloading the
>> server.  I know that's easier said than done, but I don't have any other
>> idea for the moment.
>>
>> -------- Shameless self-promotion ------
>>
>> To support this kind of live prod debugging & investigation, I added a
>> new dynamic query logging feature in Achilles:
>> https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
>>
>> Once you hit the issue, this kind of feature may save your day...
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kwright@nanigans.com>
>> wrote:
>>
>>> I can see from cfhistograms that I do have some wide rows (see below).
>>> I set the trace probability as you suggested, but the output doesn’t appear
>>> to tell me which row was actually read, unless I missed something.  I just
>>> see "executing prepared statement".   Any ideas how I can find the row in
>>> question?
>>>
>>> I am considering reducing read_request_timeout_in_ms (currently 5000) in
>>> cassandra.yaml to reduce the impact when this occurs.
>>>
>>> Any help in identifying my issue would be GREATLY appreciated
>>>
>>>
>>> Cell Count per Partition
>>>
>>>     1 cells: 50449950
>>>
>>>     2 cells: 14281828
>>>
>>>     3 cells: 8093366
>>>
>>>     4 cells: 5029200
>>>
>>>     5 cells: 3103023
>>>
>>>     6 cells: 3059903
>>>
>>>     7 cells: 1903018
>>>
>>>     8 cells: 1509297
>>>
>>>    10 cells: 2420359
>>>
>>>    12 cells: 1624895
>>>
>>>    14 cells: 1171678
>>>
>>>    17 cells: 1289391
>>>
>>>    20 cells: 909777
>>>
>>>    24 cells: 852081
>>>
>>>    29 cells: 722925
>>>
>>>    35 cells: 587067
>>>
>>>    42 cells: 459473
>>>
>>>    50 cells: 358744
>>>
>>>    60 cells: 304146
>>>
>>>    72 cells: 244682
>>>
>>>    86 cells: 191045
>>>
>>>   103 cells: 155337
>>>
>>>   124 cells: 127061
>>>
>>>   149 cells: 98913
>>>
>>>   179 cells: 77454
>>>
>>>   215 cells: 59849
>>>
>>>   258 cells: 46117
>>>
>>>   310 cells: 35321
>>>
>>>   372 cells: 26319
>>>
>>>   446 cells: 19379
>>>
>>>   535 cells: 13783
>>>
>>>   642 cells: 9993
>>>
>>>   770 cells: 6973
>>>
>>>   924 cells: 4713
>>>
>>> 1109 cells: 3229
>>>
>>> 1331 cells: 2062
>>>
>>> 1597 cells: 1338
>>>
>>> 1916 cells: 773
>>>
>>> 2299 cells: 495
>>>
>>> 2759 cells: 268
>>>
>>> 3311 cells: 150
>>>
>>> 3973 cells: 100
>>>
>>> 4768 cells: 42
>>>
>>> 5722 cells: 24
>>>
>>> 6866 cells: 12
>>>
>>> 8239 cells: 9
>>>
>>> 9887 cells: 3
>>>
>>> 11864 cells: 0
>>>
>>> 14237 cells: 5
>>>
>>> 17084 cells: 1
>>>
>>> 20501 cells: 0
>>>
>>> 24601 cells: 2
>>>
>>> 29521 cells: 0
>>>
>>> 35425 cells: 0
>>>
>>> 42510 cells: 0
>>>
>>> 51012 cells: 0
>>>
>>> 61214 cells: 2
>>>
>>> From: DuyHai Doan <doanduyhai@gmail.com>
>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Date: Thursday, July 24, 2014 at 3:01 PM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Subject: Re: Hot, large row
>>>
>>> "How can I detect wide rows?" -->
>>>
>>> nodetool cfhistograms <keyspace> <suspected column family>
>>>
>>> Look at the "Column count" column (the last one) and identify lines in this
>>> column with a very high "Offset" value. In a well-designed application you
>>> should have a Gaussian distribution where 80% of your rows have a similar
>>> number of columns.
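That check can be mechanized against the histogram output pasted elsewhere in this thread. A sketch that assumes the "offset cells: count" line format shown above:

```python
# Flag wide partitions from nodetool cfhistograms "Cell Count per Partition"
# lines (format assumed from the output quoted in this thread).

def wide_partitions(cell_count_lines, min_cells=10000):
    """Return (offset, count) pairs for buckets at or above min_cells
    that actually contain partitions."""
    out = []
    for line in cell_count_lines:
        offset, count = line.replace("cells:", "").split()
        if int(offset) >= min_cells and int(count) > 0:
            out.append((int(offset), int(count)))
    return out

sample = ["9887 cells: 3", "11864 cells: 0", "14237 cells: 5", "61214 cells: 1"]
print(wide_partitions(sample))   # [(14237, 5), (61214, 1)]
```

Run over the full histogram, this immediately surfaces the handful of outlier partitions without eyeballing hundreds of lines.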
>>>
>>> "Anyone know what debug level I can set so that I can see what reads the
>>> hot node is handling?  " -->
>>>
>>> "nodetool settraceprobability <value>", where value is a small number
>>> (e.g. 0.001), on the node where you encounter the issue. Activate tracing
>>> for a while (5 mins) and then deactivate it (value = 0). Then look into the
>>> system_traces tables "events" & "sessions". It may or may not help, since
>>> at that setting only about one request in every 1000 is traced.
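In miniature, the sampling behaves like this (a stdlib sketch of the concept, not Cassandra's actual implementation): each request is independently traced with the configured probability, so 0.001 yields on the order of one trace per thousand requests.

```python
# Simulate probabilistic request tracing: each request independently
# has `probability` chance of being traced. Seeded for repeatability.
import random

def traced_requests(n_requests, probability, seed=42):
    rng = random.Random(seed)
    return sum(1 for _ in range(n_requests) if rng.random() < probability)

print(traced_requests(1_000_000, 0.001))  # on the order of 1000 traced requests
```

That sparsity is exactly why the hot row may never show up in system_traces during a short sampling window.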
>>>
>>> "Any way to get the server to blacklist these wide rows automatically?"
>>> --> No
>>>
>>>
>>> On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kwright@nanigans.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>>    We are seeing an issue where, basically daily, one of our nodes spikes
>>>> in load and churns under CMS heap pressure.  It appears that reads are
>>>> backing up, and my guess is that our application is reading a large row
>>>> repeatedly.  Our write pattern can lend itself to wide rows very
>>>> infrequently (<0.001%) and we do our best to detect and delete them, but
>>>> obviously we’re missing a case.  Hoping for assistance on the following
>>>> questions:
>>>>
>>>>    - How can I detect wide rows?
>>>>    - Anyone know what debug level I can set so that I can see what
>>>>    reads the hot node is handling?  I’m hoping to see the “bad” row
>>>>    - Any way to get the server to blacklist these wide rows
>>>>    automatically?
>>>>
>>>> We’re using C* 2.0.6 with Vnodes.
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>>
>
>
