cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: frequent client exceptions on 0.7.0
Date Sun, 20 Feb 2011 19:34:33 GMT
AFAIK the MemtablePostFlusher is the thread pool that writes SSTables. If it has a queue, there
is the potential for writes to block while they wait for Memtables to be flushed. Take a look at
your per-CF Memtable settings: could it be that all the Memtables are flushing at once? The logs
record when each flush happens.
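
To spot which pools are queueing, something like the following may help. It is only a minimal
sketch: it assumes the text comes from `nodetool tpstats` in the usual Pool Name / Active /
Pending / Completed column layout, the function name `pools_with_backlog` is made up for this
example, and the sample numbers are illustrative (the Pending values echo the queue depths
reported in this thread, the rest are invented).

```python
def pools_with_backlog(tpstats_text, min_pending=1):
    """Return {pool_name: pending} for pools whose Pending count >= min_pending."""
    backlog = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        # Expect: name, active, pending, completed. The header line and blank
        # lines fail the digit checks and are skipped.
        if len(parts) < 4 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        name, pending = parts[0], int(parts[2])
        if pending >= min_pending:
            backlog[name] = pending
    return backlog


# Illustrative sample only, not real output from the cluster in this thread.
SAMPLE = """\
Pool Name                    Active   Pending      Completed
ReadStage                         8        30         123456
MutationStage                     0         0         654321
MemtablePostFlusher               1        70           1234
"""

if __name__ == "__main__":
    print(pools_with_backlog(SAMPLE))
```

Running it periodically and keeping the results gives a rough picture of whether the queues
are persistent or only occasional spikes.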

One approach is to set the flush timeout high, so the Memtables are more likely to flush because
they hit the ops or throughput thresholds instead.
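
As a rough way to reason about which threshold fires first, here is a back-of-the-envelope
sketch. It assumes a steady write rate and that every write adds a fixed number of bytes to the
Memtable (it ignores overwrites and in-memory overhead). The parameters stand in for the per-CF
flush settings (in 0.7 these should be memtable_flush_after_mins, memtable_throughput_in_mb and
memtable_operations_in_millions); the function name and the example numbers are hypothetical.

```python
def first_flush_trigger(ops_per_sec, bytes_per_op,
                        flush_after_mins, throughput_mb, operations_millions):
    """Estimate which flush threshold a Memtable reaches first, and after how many seconds."""
    if ops_per_sec <= 0 or bytes_per_op <= 0:
        raise ValueError("ops_per_sec and bytes_per_op must be positive")
    seconds = {
        # Time-based flush: fires regardless of load.
        "timeout": flush_after_mins * 60.0,
        # Size-based flush: time to accumulate throughput_mb of writes.
        "throughput": throughput_mb * 1024 * 1024 / (ops_per_sec * bytes_per_op),
        # Count-based flush: time to accumulate the configured number of operations.
        "operations": operations_millions * 1_000_000 / ops_per_sec,
    }
    trigger = min(seconds, key=seconds.get)
    return trigger, seconds[trigger]


if __name__ == "__main__":
    # Hypothetical CF: 500 writes/s of about 1 KB each.
    print(first_flush_trigger(500, 1024, flush_after_mins=60,
                              throughput_mb=64, operations_millions=0.3))
    # Same load with a 1 minute timeout: the timeout wins, and every CF
    # sharing that timeout would tend to flush at about the same moment.
    print(first_flush_trigger(500, 1024, flush_after_mins=1,
                              throughput_mb=64, operations_millions=0.3))
```

With a long timeout, CFs with different write rates reach their size or count thresholds at
different times, which spreads the flushes out.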


Aaron

On 19/02/2011, at 10:09 AM, Andy Skalet <aeskalet@bitjug.com> wrote:

> On Thu, Feb 17, 2011 at 12:22 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
>> Messages being dropped means the node is overloaded. Look at the thread pool
>> stats to see which thread pools have queues. It may be IO related, so also check the read
>> and write latency on the CF and use iostat.
>> 
>> I would try those first, then jump into GC land.
> 
> Thanks, Aaron.  I am looking at the thread pool queues; not enough
> data on that yet but so far I've seen queues in the ReadStage from
> 4-30 (once 100) and MemtablePostFlusher as much as 70, though not consistently.
> 
> The read latencies on the CFs on this cluster are sitting around
> 20-40ms, and the write latencies are all around .01ms.  That seems
> good to me, but I don't have a baseline.
> 
> I do see high (90-100%) utilization from time to time on the disk that
> holds the data, based on reads.  This doesn't surprise me too much
> because IO on these machines is fairly limited in performance.
> 
> Does this sound like the node is overloaded?
> 
> Andy
