cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Troubleshooting IO performance ?
Date Tue, 07 Jun 2011 02:12:56 GMT
There is a big IO queue and reads are spending a lot of time in the queue. 
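
A rough sanity check of the queueing claim, added as an editorial sketch rather than part of the original reply: Little's law says the mean number of outstanding requests is roughly the arrival rate multiplied by the mean time each request spends in the system. Applying it to the r/s and await figures from the iostat output quoted below gives values close to the reported avgqu-sz. The function name and the `samples` dictionary are illustrative only.

```python
# Little's law: mean requests in the queue = arrival rate * mean time in system.
# Figures are copied from the `iostat -dmx 2` output quoted further down.

def predicted_queue_depth(reads_per_sec, await_ms):
    """Return the expected number of outstanding requests."""
    return reads_per_sec * (await_ms / 1000.0)

samples = {
    # device: (r/s, await in ms, reported avgqu-sz)
    "sda": (7813.00, 14.89, 117.15),
    "sdb": (7755.50, 15.18, 118.67),
    "dm-0": (15796.50, 17.03, 273.42),
}

for device, (rps, await_ms, reported) in samples.items():
    predicted = predicted_queue_depth(rps, await_ms)
    print(f"{device}: predicted {predicted:.1f}, reported {reported:.2f}")
```

The predicted and reported depths agree to within a few percent, which is consistent with well over a hundred reads waiting per device at any moment.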

Some more questions:
- what version are you on ? 
-  what is the concurrent_reads config setting ? 
- what is nodetool tpstats showing during the slow down ? 
- exactly how much data are you asking for ? how many rows and what sort of slice ? 
- have there been a lot of deletes or TTL columns used ? 

Hope that helps. 
Aaron
 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Jun 2011, at 10:09, Philippe wrote:

> Ok, here it goes again... No swapping at all...
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  1 63  32044  88736  37996 7116524    0    0 227156     0 18314 5607 30  5 11 53
>  1 63  32044  90844  37996 7103904    0    0 233524   202 17418 4977 29  4  9 58
>  0 42  32044  91304  37996 7123884    0    0 249736     0 16197 5433 19  6  3 72
>  3 25  32044  89864  37996 7135980    0    0 223140    16 18135 7567 32  5 11 52
>  1  1  32044  88664  37996 7150728    0    0 229416   128 19168 7554 36  4 10 51
>  4  0  32044  89464  37996 7149428    0    0 213852    18 21041 8819 45  5 12 38
>  4  0  32044  90372  37996 7149432    0    0 233086   142 19909 7041 43  5 10 41
>  7  1  32044  89752  37996 7149520    0    0 206906     0 19350 6875 50  4 11 35
> 
> Lots and lots of disk activity
> iostat -dmx 2
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda              52.50     0.00 7813.00    0.00   108.01     0.00    28.31   117.15   14.89   14.89    0.00   0.11  83.00
> sdb              56.00     0.00 7755.50    0.00   108.51     0.00    28.66   118.67   15.18   15.18    0.00   0.11  82.80
> md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> md5               0.00     0.00 15796.50    0.00   219.21     0.00    28.42     0.00    0.00    0.00    0.00   0.00   0.00
> dm-0              0.00     0.00 15796.50    0.00   219.21     0.00    28.42   273.42   17.03   17.03    0.00   0.05  83.40
> dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> 
> More info : 
> - the data directory containing the data I'm querying is 9.7GB in total, and this is a server with 16GB of memory
> - I'm hitting the server with 6 concurrent MultigetSuperSliceQuery requests on multiple keys, some of which can bring back quite a lot of data
> - I'm reading all the keys for one column, pretty much sequentially
> 
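
An editorial sketch, not part of the original message, that cross-checks the two tools and puts the read rate in perspective against the 9.7GB data directory. It assumes vmstat reports `bi` in 1024-byte blocks per second (the usual Linux behaviour) and treats all sizes as binary units. The two samples were not taken at the same instant, so only rough agreement is expected. All function and variable names are illustrative.

```python
# Cross-check vmstat's "bi" column against iostat's rMB/s, then estimate how
# long it takes to read the equivalent of the whole data directory.
# ASSUMPTION: vmstat "bi" is in 1024-byte blocks per second, and the GB/MB
# figures in the message are treated as binary units (GiB/MiB).

KIB_PER_MIB = 1024
MIB_PER_GIB = 1024

def vmstat_bi_to_mib_per_sec(blocks_per_sec):
    """Convert vmstat's bi (1 KiB blocks per second) to MiB per second."""
    return blocks_per_sec / KIB_PER_MIB

def seconds_to_read(data_gib, mib_per_sec):
    """Time needed to read data_gib of data at a sustained mib_per_sec."""
    return data_gib * MIB_PER_GIB / mib_per_sec

# "bi" column from the eight vmstat samples quoted above.
vmstat_bi = [227156, 233524, 249736, 223140, 229416, 213852, 233086, 206906]
mean_bi_mib = vmstat_bi_to_mib_per_sec(sum(vmstat_bi) / len(vmstat_bi))

iostat_md5_mib = 219.21  # rMB/s for md5 in the iostat sample
data_dir_gib = 9.7       # size of the data directory given in the message

print(f"vmstat mean read rate: {mean_bi_mib:.0f} MiB/s")
print(f"iostat md5 read rate:  {iostat_md5_mib:.0f} MiB/s")
print(f"time to read {data_dir_gib} GiB at that rate: "
      f"{seconds_to_read(data_dir_gib, iostat_md5_mib):.0f} s")
```

If these assumptions hold, the disks are delivering the equivalent of the whole data directory in under a minute, which is worth comparing against how much data the queries are actually expected to return.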
> This is a query against a rollup table that was originally in MySQL, and querying by key doesn't look any faster here. So I'm betting I'm doing something wrong... but what ?
> 
> Any ideas ?
> Thanks
> 
> 2011/6/6 Philippe <watcherfr@gmail.com>
> Hum... no, it wasn't swapping. Cassandra was the only thing running on that server
> and I was querying the same keys over and over.
> 
> I restarted Cassandra and, doing the same thing, IO is now down to zero while CPU is up, which doesn't surprise me as much.
> 
> I'll report if it happens again.
> 
> On 5 June 2011 16:55, "Jonathan Ellis" <jbellis@gmail.com> wrote:
> 
> > You may be swapping.
> > 
> > http://spyced.blogspot.com/2010/01/linux-performance-basics.html
> > explains how to check this as well as how to see what threads are busy
> > in the Java process.
> > 
> > On Sat, Jun 4, 2011 at 5:34 PM, Philippe <watcherfr@gmail.com> wrote:
> >> Hello,
> >> I am evaluating Cassandra and I'm running into some strange IO
> >> behavior that I can't explain; I'd like some help/ideas to troubleshoot it.
> >> I am running a 1 node cluster with a keyspace consisting of two column
> >> families, one of which has dozens of supercolumns, each containing dozens
> >> of columns.
> >> All in all, this is a couple gigabytes of data, 12GB on the hard drive.
> >> The hardware is pretty good : 16GB memory + RAID-0 SSD drives with LVM and
> >> an i5 processor (4 cores).
> >> Keyspace: xxxxxxxxxxxxxxxxxxx
> >>         Read Count: 460754852
> >>         Read Latency: 1.108205793092766 ms.
> >>         Write Count: 30620665
> >>         Write Latency: 0.01411020877567486 ms.
> >>         Pending Tasks: 0
> >>                 Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
> >>                 SSTable count: 5
> >>                 Space used (live): 548700725
> >>                 Space used (total): 548700725
> >>                 Memtable Columns Count: 0
> >>                 Memtable Data Size: 0
> >>                 Memtable Switch Count: 11
> >>                 Read Count: 2891192
> >>                 Read Latency: NaN ms.
> >>                 Write Count: 3157547
> >>                 Write Latency: NaN ms.
> >>                 Pending Tasks: 0
> >>                 Key cache capacity: 367396
> >>                 Key cache size: 367396
> >>                 Key cache hit rate: NaN
> >>                 Row cache capacity: 112683
> >>                 Row cache size: 112683
> >>                 Row cache hit rate: NaN
> >>                 Compacted row minimum size: 125
> >>                 Compacted row maximum size: 924
> >>                 Compacted row mean size: 172
> >>                 Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
> >>                 SSTable count: 7
> >>                 Space used (live): 8707538781
> >>                 Space used (total): 8707538781
> >>                 Memtable Columns Count: 0
> >>                 Memtable Data Size: 0
> >>                 Memtable Switch Count: 30
> >>                 Read Count: 457863660
> >>                 Read Latency: 2.381 ms.
> >>                 Write Count: 27463118
> >>                 Write Latency: NaN ms.
> >>                 Pending Tasks: 0
> >>                 Key cache capacity: 4518387
> >>                 Key cache size: 4518387
> >>                 Key cache hit rate: 0.9247881700850826
> >>                 Row cache capacity: 1349682
> >>                 Row cache size: 1349682
> >>                 Row cache hit rate: 0.39400533823415573
> >>                 Compacted row minimum size: 125
> >>                 Compacted row maximum size: 6866
> >>                 Compacted row mean size: 165
> >> My app makes a bunch of requests using a MultigetSuperSliceQuery for a set
> >> of keys, typically a couple dozen at most. It also selects a subset of the
> >> supercolumns. I am running 8 requests in parallel at most.
> >>
> >> Two days ago, I ran a 1.5 hour process that basically read every key. The server
> >> had no IOwaits and everything was humming along. However, right at the end
> >> of the process, there was a huge spike in IOs. I didn't think much of it.
> >> Today, after two days of inactivity, any query I run raises the IOs to 80%
> >> utilization of the SSD drives even though I'm running the same query over
> >> and over (no cache??)
> >> Any ideas on how to troubleshoot this, or better, how to solve this ?
> >> thanks
> >> Philippe
> > 
> > 
> > 
> > -- 
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
> 

