cassandra-user mailing list archives

From Philippe <watche...@gmail.com>
Subject Re: Troubleshooting IO performance ?
Date Tue, 07 Jun 2011 16:29:24 GMT
Very even.
Will answer Aaron's email...

Will upgrade to 0.8 too!
On 7 Jun 2011 13:09, "Terje Marthinussen" <tmarthinussen@gmail.com> wrote:
> If you run iostat with output every few seconds, is the I/O stable or do
> you see very uneven I/O?
>
> Regards,
> Terje
>
> On Tue, Jun 7, 2011 at 11:12 AM, aaron morton <aaron@thelastpickle.com> wrote:
>
>> There is a big IO queue and reads are spending a lot of time in the queue.
>>
>> Some more questions:
>> - what version are you on ?
>> - what is the concurrent_reads config setting ?
>> - what is nodetool tpstats showing during the slow down ?
>> - exactly how much data are you asking for ? how many rows and what sort of slice
>> - has there been a lot of deletes or TTL columns used ?
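For reference, the concurrent_reads and tpstats checks above can be run like this (a rough sketch; the cassandra.yaml location and the nodetool host are assumptions, not details from this thread):

  # read stage sizing on the node (0.7/0.8-style yaml config)
  grep -E '^concurrent_reads' /etc/cassandra/cassandra.yaml

  # thread pool backlog during the slowdown: a growing Pending count on
  # ReadStage means reads are queuing behind the disks
  nodetool -h 127.0.0.1 tpstats

  # per-device queue depth (avgqu-sz) and wait times, refreshed every 2 seconds
  iostat -dmx 2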
>>
>> Hope that helps.
>> Aaron
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 7 Jun 2011, at 10:09, Philippe wrote:
>>
>> Ok, here it goes again... No swapping at all...
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd  free  buff   cache   si   so     bi   bo    in    cs us sy id wa
>>  1 63  32044 88736 37996 7116524    0    0 227156    0 18314  5607 30  5 11 53
>>  1 63  32044 90844 37996 7103904    0    0 233524  202 17418  4977 29  4  9 58
>>  0 42  32044 91304 37996 7123884    0    0 249736    0 16197  5433 19  6  3 72
>>  3 25  32044 89864 37996 7135980    0    0 223140   16 18135  7567 32  5 11 52
>>  1  1  32044 88664 37996 7150728    0    0 229416  128 19168  7554 36  4 10 51
>>  4  0  32044 89464 37996 7149428    0    0 213852   18 21041  8819 45  5 12 38
>>  4  0  32044 90372 37996 7149432    0    0 233086  142 19909  7041 43  5 10 41
>>  7  1  32044 89752 37996 7149520    0    0 206906    0 19350  6875 50  4 11 35
>>
>> Lots and lots of disk activity
>> iostat -dmx 2
>> Device: rrqm/s wrqm/s      r/s  w/s  rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
>> sda      52.50   0.00  7813.00 0.00 108.01  0.00    28.31   117.15 14.89   14.89    0.00  0.11 83.00
>> sdb      56.00   0.00  7755.50 0.00 108.51  0.00    28.66   118.67 15.18   15.18    0.00  0.11 82.80
>> md1       0.00   0.00     0.00 0.00   0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
>> md5       0.00   0.00 15796.50 0.00 219.21  0.00    28.42     0.00  0.00    0.00    0.00  0.00  0.00
>> dm-0      0.00   0.00 15796.50 0.00 219.21  0.00    28.42   273.42 17.03   17.03    0.00  0.05 83.40
>> dm-1      0.00   0.00     0.00 0.00   0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
>>
>> More info :
>> - the data directory containing the data I'm querying is 9.7GB in total,
>> and this is a server with 16GB of RAM
>> - I'm hitting the server with 6 concurrent multigetsuperslicequeries on
>> multiple keys, some of which can bring back quite a lot of data
>> - I'm reading all the keys for one column, pretty much sequentially
>>
>> This is a query on a rollup table that was originally in MySQL, and querying
>> by key doesn't look any faster than it was there. So I'm betting I'm doing
>> something wrong here... but what ?
>>
>> Any ideas ?
>> Thanks
>>
>> 2011/6/6 Philippe <watcherfr@gmail.com>
>>
>>> Hmm, no, it wasn't swapping. Cassandra was the only thing running on that
>>> server, and I was querying the same keys over and over.
>>>
>>> I restarted Cassandra and, doing the same thing, IO is now down to zero
>>> while CPU is up, which doesn't surprise me as much.
>>>
>>> I'll report if it happens again.
>>> On 5 Jun 2011 16:55, "Jonathan Ellis" <jbellis@gmail.com> wrote:
>>>
>>> > You may be swapping.
>>> >
>>> > http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>>> > explains how to check this as well as how to see what threads are busy
>>> > in the Java process.
>>> >
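As a rough sketch of those two checks (not necessarily the exact commands from the post; the pgrep pattern is an assumption about how the Cassandra process is named):

  # swap activity: non-zero si/so columns mean pages are moving in and out of swap
  vmstat 2

  # per-thread CPU usage inside the Cassandra JVM
  top -H -p $(pgrep -f CassandraDaemon)

  # dump the Java stack traces, then match a busy thread id from top against
  # the hexadecimal nid= field printed by jstack
  jstack $(pgrep -f CassandraDaemon)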
>>> > On Sat, Jun 4, 2011 at 5:34 PM, Philippe <watcherfr@gmail.com> wrote:
>>> >> Hello,
>>> >> I am evaluating Cassandra and I'm running into some strange IO
>>> >> behavior that I can't explain; I'd like some help/ideas to troubleshoot it.
>>> >> I am running a 1-node cluster with a keyspace consisting of two column
>>> >> families, one of which has dozens of supercolumns, each containing dozens
>>> >> of columns.
>>> >> All in all, this is a couple of gigabytes of data, 12GB on the hard drive.
>>> >> The hardware is pretty good: 16GB memory + RAID-0 SSD drives with LVM
>>> >> and an i5 processor (4 cores).
>>> >> Keyspace: xxxxxxxxxxxxxxxxxxx
>>> >> Read Count: 460754852
>>> >> Read Latency: 1.108205793092766 ms.
>>> >> Write Count: 30620665
>>> >> Write Latency: 0.01411020877567486 ms.
>>> >> Pending Tasks: 0
>>> >> Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
>>> >> SSTable count: 5
>>> >> Space used (live): 548700725
>>> >> Space used (total): 548700725
>>> >> Memtable Columns Count: 0
>>> >> Memtable Data Size: 0
>>> >> Memtable Switch Count: 11
>>> >> Read Count: 2891192
>>> >> Read Latency: NaN ms.
>>> >> Write Count: 3157547
>>> >> Write Latency: NaN ms.
>>> >> Pending Tasks: 0
>>> >> Key cache capacity: 367396
>>> >> Key cache size: 367396
>>> >> Key cache hit rate: NaN
>>> >> Row cache capacity: 112683
>>> >> Row cache size: 112683
>>> >> Row cache hit rate: NaN
>>> >> Compacted row minimum size: 125
>>> >> Compacted row maximum size: 924
>>> >> Compacted row mean size: 172
>>> >> Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>>> >> SSTable count: 7
>>> >> Space used (live): 8707538781
>>> >> Space used (total): 8707538781
>>> >> Memtable Columns Count: 0
>>> >> Memtable Data Size: 0
>>> >> Memtable Switch Count: 30
>>> >> Read Count: 457863660
>>> >> Read Latency: 2.381 ms.
>>> >> Write Count: 27463118
>>> >> Write Latency: NaN ms.
>>> >> Pending Tasks: 0
>>> >> Key cache capacity: 4518387
>>> >> Key cache size: 4518387
>>> >> Key cache hit rate: 0.9247881700850826
>>> >> Row cache capacity: 1349682
>>> >> Row cache size: 1349682
>>> >> Row cache hit rate: 0.39400533823415573
>>> >> Compacted row minimum size: 125
>>> >> Compacted row maximum size: 6866
>>> >> Compacted row mean size: 165
>>> >> My app makes a bunch of requests using a MultigetSuperSliceQuery for a
>>> >> set of keys, typically a couple dozen at most. It also selects a subset
>>> >> of the supercolumns. I am running 8 requests in parallel at most.
>>> >>
>>> >> Two days ago, I ran a 1.5 hour process that basically read every key. The
>>> >> server had no IO waits and everything was humming along. However, right at
>>> >> the end of the process, there was a huge spike in IOs. I didn't think much
>>> >> of it.
>>> >> Today, after two days of inactivity, any query I run raises the IOs to 80%
>>> >> utilization of the SSD drives even though I'm running the same query over
>>> >> and over (no cache??)
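A quick way to see whether those repeated reads are coming out of the caches at all (a sketch; the nodetool host is an assumption) is to re-run the query while watching the per-CF cache hit rates and the OS page cache:

  # key cache and row cache hit rates per column family
  nodetool -h 127.0.0.1 cfstats

  # how much memory the OS is currently using as page cache
  free -m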
>>> >> Any ideas on how to troubleshoot this, or better, how to solve this ?
>>> >> thanks
>>> >> Philippe
>>> >
>>> >
>>> >
>>> > --
>>> > Jonathan Ellis
>>> > Project Chair, Apache Cassandra
>>> > co-founder of DataStax, the source for professional Cassandra support
>>> > http://www.datastax.com
>>>
>>
>>
>>
