If you run iostat with output every few seconds, is the I/O stable, or do
you see very uneven I/O?
Regards,
Terje
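One way to answer Terje's question from numbers already in the thread is to measure how much the per-interval read rate varies between samples. This is a minimal sketch, assuming the eight `bi` values (blocks read in per second) from Philippe's vmstat output quoted below, and using the coefficient of variation as a rough stability measure. The 0.25 threshold is an arbitrary illustrative choice, not a standard value.

```python
from statistics import mean, pstdev

# bi column (blocks read in per second) from the vmstat samples in this thread.
BI_SAMPLES = [227156, 233524, 249736, 223140, 229416, 213852, 233086, 206906]


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean."""
    return pstdev(samples) / mean(samples)


def looks_stable(samples, threshold=0.25):
    # Hypothetical threshold: below it, treat the I/O rate as steady rather than bursty.
    return coefficient_of_variation(samples) < threshold


if __name__ == "__main__":
    cv = coefficient_of_variation(BI_SAMPLES)
    print(f"mean bi = {mean(BI_SAMPLES):.0f} blocks/s, CV = {cv:.3f}")
    print("stable" if looks_stable(BI_SAMPLES) else "uneven")
```

With these samples the CV comes out at about 0.05, which suggests a sustained, fairly even read load over those intervals rather than short bursts.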
On Tue, Jun 7, 2011 at 11:12 AM, aaron morton <aaron@thelastpickle.com> wrote:
> There is a big IO queue and reads are spending a lot of time in the queue.
>
> Some more questions:
> - what version are you on?
> - what is the concurrent_reads config setting?
> - what is nodetool tpstats showing during the slowdown?
> - exactly how much data are you asking for? How many rows, and what sort of
> slice?
> - have there been a lot of deletes, or have TTL columns been used?
>
> Hope that helps.
> Aaron
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7 Jun 2011, at 10:09, Philippe wrote:
>
> Ok, here it goes again... No swapping at all...
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so     bi    bo    in    cs us sy id wa
>  1 63  32044  88736  37996 7116524    0    0 227156     0 18314  5607 30  5 11 53
>  1 63  32044  90844  37996 7103904    0    0 233524   202 17418  4977 29  4  9 58
>  0 42  32044  91304  37996 7123884    0    0 249736     0 16197  5433 19  6  3 72
>  3 25  32044  89864  37996 7135980    0    0 223140    16 18135  7567 32  5 11 52
>  1  1  32044  88664  37996 7150728    0    0 229416   128 19168  7554 36  4 10 51
>  4  0  32044  89464  37996 7149428    0    0 213852    18 21041  8819 45  5 12 38
>  4  0  32044  90372  37996 7149432    0    0 233086   142 19909  7041 43  5 10 41
>  7  1  32044  89752  37996 7149520    0    0 206906     0 19350  6875 50  4 11 35
>
> Lots and lots of disk activity
> iostat -dmx 2
> Device: rrqm/s wrqm/s      r/s   w/s  rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> sda      52.50   0.00  7813.00  0.00 108.01  0.00    28.31   117.15 14.89   14.89    0.00  0.11 83.00
> sdb      56.00   0.00  7755.50  0.00 108.51  0.00    28.66   118.67 15.18   15.18    0.00  0.11 82.80
> md1       0.00   0.00     0.00  0.00   0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> md5       0.00   0.00 15796.50  0.00 219.21  0.00    28.42     0.00  0.00    0.00    0.00  0.00  0.00
> dm-0      0.00   0.00 15796.50  0.00 219.21  0.00    28.42   273.42 17.03   17.03    0.00  0.05 83.40
> dm-1      0.00   0.00     0.00  0.00   0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
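The iostat figures can be cross-checked with a little arithmetic: dividing read throughput by reads per second gives the average size of each read, which should agree with `avgrq-sz` (reported in 512-byte sectors). A sketch in Python, assuming sysstat's convention that `-m` reports MB as 1024 × 1024 bytes; the device values are copied from the iostat output above.

```python
SECTOR_BYTES = 512  # avgrq-sz is expressed in 512-byte sectors


def kib_per_read(read_mb_per_s, reads_per_s):
    """Average read size in KiB, assuming iostat -m reports MiB/s."""
    return read_mb_per_s * 1024 / reads_per_s


def kib_from_avgrq(avgrq_sectors):
    """Convert avgrq-sz (sectors) to KiB."""
    return avgrq_sectors * SECTOR_BYTES / 1024


# (device, rMB/s, r/s, avgrq-sz) taken from the iostat output in this thread.
SAMPLES = [
    ("sda", 108.01, 7813.00, 28.31),
    ("sdb", 108.51, 7755.50, 28.66),
    ("md5", 219.21, 15796.50, 28.42),
]

if __name__ == "__main__":
    for dev, mb_s, r_s, avgrq in SAMPLES:
        print(f"{dev}: {kib_per_read(mb_s, r_s):.1f} KiB/read "
              f"(avgrq-sz gives {kib_from_avgrq(avgrq):.1f} KiB)")
```

Both methods give roughly 14 KiB per read, so the SSDs are serving a very large number of small reads (around 7,800 per second each) rather than a few large ones.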
>
> More info:
> - the whole data directory containing the data I'm querying is 9.7GB,
> and the server has 16GB of RAM
> - I'm hitting the server with 6 concurrent MultigetSuperSliceQuery requests
> over multiple keys; some of them can bring back quite a lot of data
> - I'm reading all the keys for one column, pretty much sequentially
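A rough check of whether the data set can stay in the OS page cache is to compare the 9.7GB data directory with the `cache` column of the vmstat output (KiB by default). This is a sketch under stated assumptions: the 9.7GB figure is treated as GiB, and only what vmstat reports as cache is counted as available for file data; the JVM heap size is not given in the thread.

```python
# Assumption: the 9.7GB data directory size is treated as GiB; the thread does not say.
DATA_DIR_GIB = 9.7

# cache column (KiB) from the vmstat samples in this thread.
CACHE_KIB = [7116524, 7103904, 7123884, 7135980, 7150728, 7149428, 7149432, 7149520]


def kib_to_gib(kib):
    """Convert KiB (vmstat's default unit) to GiB."""
    return kib / (1024 * 1024)


def cached_fraction(data_gib, cache_kib):
    """Upper bound on the share of the data set the page cache could hold."""
    return min(1.0, kib_to_gib(cache_kib) / data_gib)


if __name__ == "__main__":
    peak = max(CACHE_KIB)
    print(f"page cache peak: {kib_to_gib(peak):.2f} GiB")
    print(f"at most {cached_fraction(DATA_DIR_GIB, peak):.0%} of the data can be cached")
```

With these numbers the page cache can hold at most about 70% of the data. That is only an upper bound and ignores which files are actually hot, but it is consistent with a read-every-key workload falling through to disk.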
>
> This is a query on a rollup table that was originally in MySQL, and querying
> by key doesn't look any faster than it was there. So I'm betting I'm doing
> something wrong here... but what?
>
> Any ideas?
> Thanks
>
> 2011/6/6 Philippe <watcherfr@gmail.com>
>
>> Hmm... no, it wasn't swapping. Cassandra was the only thing running on that
>> server, and I was querying the same keys over and over.
>>
>> I restarted Cassandra and did the same thing: I/O is now down to zero while
>> CPU is up, which doesn't surprise me as much.
>>
>> I'll report if it happens again.
>> On 5 June 2011 16:55, "Jonathan Ellis" <jbellis@gmail.com> wrote:
>>
>> > You may be swapping.
>> >
>> > http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>> > explains how to check this as well as how to see what threads are busy
>> > in the Java process.
>> >
>> > On Sat, Jun 4, 2011 at 5:34 PM, Philippe <watcherfr@gmail.com> wrote:
>> >> Hello,
>> >> I am evaluating Cassandra and I'm running into some strange I/O behavior
>> >> that I can't explain; I'd like some help or ideas to troubleshoot it.
>> >> I am running a 1-node cluster with a keyspace consisting of two column
>> >> families, one of which has dozens of supercolumns, each containing dozens
>> >> of columns.
>> >> All in all, this is a couple of gigabytes of data, 12GB on the hard drive.
>> >> The hardware is pretty good: 16GB of memory, RAID-0 SSD drives with LVM,
>> >> and an i5 processor (4 cores).
>> >> Keyspace: xxxxxxxxxxxxxxxxxxx
>> >> Read Count: 460754852
>> >> Read Latency: 1.108205793092766 ms.
>> >> Write Count: 30620665
>> >> Write Latency: 0.01411020877567486 ms.
>> >> Pending Tasks: 0
>> >> Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
>> >> SSTable count: 5
>> >> Space used (live): 548700725
>> >> Space used (total): 548700725
>> >> Memtable Columns Count: 0
>> >> Memtable Data Size: 0
>> >> Memtable Switch Count: 11
>> >> Read Count: 2891192
>> >> Read Latency: NaN ms.
>> >> Write Count: 3157547
>> >> Write Latency: NaN ms.
>> >> Pending Tasks: 0
>> >> Key cache capacity: 367396
>> >> Key cache size: 367396
>> >> Key cache hit rate: NaN
>> >> Row cache capacity: 112683
>> >> Row cache size: 112683
>> >> Row cache hit rate: NaN
>> >> Compacted row minimum size: 125
>> >> Compacted row maximum size: 924
>> >> Compacted row mean size: 172
>> >> Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>> >> SSTable count: 7
>> >> Space used (live): 8707538781
>> >> Space used (total): 8707538781
>> >> Memtable Columns Count: 0
>> >> Memtable Data Size: 0
>> >> Memtable Switch Count: 30
>> >> Read Count: 457863660
>> >> Read Latency: 2.381 ms.
>> >> Write Count: 27463118
>> >> Write Latency: NaN ms.
>> >> Pending Tasks: 0
>> >> Key cache capacity: 4518387
>> >> Key cache size: 4518387
>> >> Key cache hit rate: 0.9247881700850826
>> >> Row cache capacity: 1349682
>> >> Row cache size: 1349682
>> >> Row cache hit rate: 0.39400533823415573
>> >> Compacted row minimum size: 125
>> >> Compacted row maximum size: 6866
>> >> Compacted row mean size: 165
>> >> My app makes a bunch of requests using a MultigetSuperSliceQuery for a set
>> >> of keys, typically a couple dozen at most. It also selects a subset of the
>> >> supercolumns. I am running 8 requests in parallel at most.
>> >>
>> >> Two days ago, I ran a 1.5-hour process that basically read every key. The
>> >> server had no I/O waits and everything was humming along. However, right at
>> >> the end of the process, there was a huge spike in I/O. I didn't think much
>> >> of it.
>> >> Today, after two days of inactivity, any query I run raises I/O to 80%
>> >> utilization of the SSD drives, even though I'm running the same query over
>> >> and over (no cache??)
>> >> Any ideas on how to troubleshoot this, or better, how to solve it?
>> >> Thanks,
>> >> Philippe
>> >
>> >
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of DataStax, the source for professional Cassandra support
>> > http://www.datastax.com
>>
>
>
>