cassandra-user mailing list archives

From Chris Goffinet <goffi...@digg.com>
Subject Re: Persistently increasing read latency
Date Thu, 03 Dec 2009 19:04:34 GMT
Tim,

Very interesting information. Were there any other numbers in tpstats  
from nodeprobe that are growing?

Can you plot the number of SSTables? Are you using the standard  
storage-conf.xml defaults?

We've seen reads spike like this with a large number of SSTables.
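
In case it helps with the plotting, here is a rough sketch of how the SSTable count could be sampled over time. It assumes each SSTable has exactly one file whose name ends in "-Data.db" somewhere under the data directory. The path in the usage comment is a placeholder, and the function names are made up for illustration, so adjust them to your setup.

```python
import os
import time


def count_sstables(data_dir, suffix="-Data.db"):
    """Count files under data_dir whose names end with suffix.

    Assumption: one such file exists per SSTable.
    """
    total = 0
    for _root, _dirs, files in os.walk(data_dir):
        total += sum(1 for name in files if name.endswith(suffix))
    return total


def sample_sstables(data_dir, interval_s=60, samples=10):
    """Print 'unix_time<TAB>count' once per interval, ready for plotting."""
    for _ in range(samples):
        print(f"{int(time.time())}\t{count_sstables(data_dir)}")
        time.sleep(interval_s)


# Hypothetical usage; replace the path with your configured data directory:
# sample_sstables("/var/lib/cassandra/data", interval_s=60, samples=480)
```

Plotting the second column against the first, next to your latency graph, should show whether the two grow together.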

-Chris

On Dec 3, 2009, at 10:58 AM, Freeman, Tim wrote:

> I ran another test last night with the build dated 29 Nov 2009.   
> Other than the Cassandra version, the setup was the same as before.   
> I got qualitatively similar results as before, too -- the read  
> latency increased fairly smoothly from 250ms to 1s, the GC times  
> reported by jconsole are low, the pending tasks for row-mutation-stage  
> and row-read-stage are less than 10, the pending tasks for the  
> compaction pool are 1615.  Last time around the read latency maxed  
> out at one second.  This time, it just got to one second as I'm  
> writing this so I don't know yet if it will continue to increase.
>
> I have attached a fresh graph describing the present run.  It's  
> qualitatively similar to the previous one.  The vertical units are  
> milliseconds (for latency) and operations per minute (for reads or  
> writes).  The horizontal scale is seconds.  The feature that's  
> bothering me is the red line for the read latency going diagonally  
> from lower left to the lower-middle right.  The scale doesn't make  
> it look dramatic, but Cassandra slowed down by a factor of 4.
>
> The read and write rates were stable for 45,000 seconds or so, and  
> then the read latency got big enough that the application was  
> starved for reads and it started writing less.
>
> If this is worth pursuing, I suppose the next step would be for me  
> to make a small program that reproduces the problem.  It should be  
> easy -- we're just reading and writing random records.  Let me know  
> if there's interest in that.  I could also decide to live with a  
> 1000 ms latency here.  I'm thinking of putting a cache in the local  
> filesystem in front of Cassandra (or whichever distributed DB we  
> decide to go with), so living with it is definitely possible.
>
> Tim Freeman
> Email: tim.freeman@hp.com
> Desk in Palo Alto: (650) 857-2581
> Home: (408) 774-1298
> Cell: (408) 348-7536 (No reception business hours Monday, Tuesday,  
> and Thursday; call my desk instead.)
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Tuesday, December 01, 2009 11:10 AM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Persistently increasing read latency
>
> 1) use jconsole to see what is happening to jvm / cassandra internals.
> possibly you are slowly exceeding cassandra's ability to keep up with
> writes, causing the jvm to spend more and more effort GCing to find
> enough memory to keep going
>
> 2) you should be at least on 0.4.2 and preferably trunk if you are
> stress testing
>
> -Jonathan
>
> On Tue, Dec 1, 2009 at 12:11 PM, Freeman, Tim <tim.freeman@hp.com>  
> wrote:
>> In an 8 hour test run, I've seen the read latency for Cassandra  
>> drift fairly linearly from ~460ms to ~900ms.  Eventually my  
>> application gets starved for reads and starts misbehaving.  I have  
>> attached graphs -- horizontal scales are seconds, vertical scales  
>> are operations per minute and average milliseconds per operation.   
>> The clearest feature is the light blue line in the left graph  
>> drifting consistently upward during the run.
>>
>> I have a Cassandra 0.4.1 database, one node, records are 100kbytes  
>> each, 350K records, 8 threads reading, around 700 reads per  
>> minute.  There are also 8 threads writing.  This is all happening  
>> on a 4 core processor that's supporting both the Cassandra node and  
>> the code that's generating load for it.  I'm reasonably sure that  
>> there are no page faults.
>>
>> I have attached my storage-conf.xml.  Briefly, it has default  
>> values, except RpcTimeoutInMillis is 30000 and the partitioner is  
>> OrderPreservingPartitioner.  Cassandra's garbage collection  
>> parameters are:
>>
>>   -Xms128m -Xmx1G -XX:SurvivorRatio=8 -XX:+AggressiveOpts
>>   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>>
>> Is this normal behavior?  Is there some change to the configuration  
>> I should make to get it to stop getting slower?  If it's not  
>> normal, what debugging information should I gather?  Should I give  
>> up on Cassandra 0.4.1 and move to a newer version?
>>
>> I'll leave it running for the time being in case there's something  
>> useful to extract from it.
>>
>> Tim Freeman
>> Email: tim.freeman@hp.com
>> Desk in Palo Alto: (650) 857-2581
>> Home: (408) 774-1298
>> Cell: (408) 348-7536 (No reception business hours Monday, Tuesday,  
>> and Thursday; call my desk instead.)
>>
>>
> <latency-trend.png>

