incubator-cassandra-user mailing list archives

From: Aaron Morton <aa...@thelastpickle.com>
Subject: Re: MUTATION messages dropped
Date: Tue, 31 Dec 2013 01:44:55 GMT
> I ended up changing memtable_flush_queue_size to be large enough to contain the biggest flood I saw.
As part of the flush process the “Switch Lock” is taken to synchronise flushing with the commit log. This is a reentrant read/write lock: the write path takes the read lock (so many write threads can proceed concurrently) and the flush path takes the write lock. When flushing a CF the write lock is taken, the commit log is updated, and the memtable is added to the flush queue. If the queue is full, the write lock is held while the flush waits, blocking the write threads from taking the read lock.
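
To make the mechanism concrete, here is a minimal sketch of that pattern in plain java.util.concurrent terms; the class and member names are illustrative, not Cassandra's actual code:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SwitchLockSketch {
    private final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock();
    // stands in for memtable_flush_queue_size; a full queue stalls the flush
    private final BlockingQueue<Object> flushQueue = new ArrayBlockingQueue<>(4);

    void write(Object mutation) {
        switchLock.readLock().lock();   // many write threads can hold this at once
        try {
            // append to the commit log, apply to the current memtable ...
        } finally {
            switchLock.readLock().unlock();
        }
    }

    void flush(Object memtable) throws InterruptedException {
        switchLock.writeLock().lock();  // excludes all write threads
        try {
            // update the commit log, swap in a fresh memtable ...
            flushQueue.put(memtable);   // blocks while the queue is full,
                                        // holding the write lock and stalling writes
        } finally {
            switchLock.writeLock().unlock();
        }
    }
}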

There are a few reasons why the queue may be full. The simplest is that disk IO is not fast enough. Others are that the commit log segments are too small, there are lots of CFs and/or lots of secondary indexes, or nodetool flush is called frequently.

Increasing the size of the queue is a good workaround, and the correct approach if you have a lot of CFs and/or secondary indexes.
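
For reference, the knob lives in cassandra.yaml. An illustrative setting (the value below is an example, not a recommendation; size it to cover your largest observed flood, and at a minimum the number of secondary indexes on any single CF):

# cassandra.yaml -- example value only
# Number of full memtables allowed to wait for a flush writer.
# The default in 2.0 is 4.
memtable_flush_queue_size: 8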

Hope that helps.


-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/12/2013, at 6:03 am, Ken Hancock <ken.hancock@schange.com> wrote:

> I ended up changing memtable_flush_queue_size to be large enough to contain the biggest flood I saw.
> 
> I monitored tpstats over time using a collection script and an analysis script that I wrote to figure out what my largest peaks were. In my case, all my mutation drops correlated with hitting the maximum memtable_flush_queue_size, and the drops stopped as soon as the queue size fell below the max.
> 
> I threw the scripts up on github in case they're useful...
> 
> https://github.com/hancockks/tpstats
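
As a rough illustration of the approach (a minimal sketch, not the scripts linked above; it assumes nodetool is on the PATH), the following samples the FlushWriter line of nodetool tpstats once a minute so it can later be correlated with dropped-message counts:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.time.Instant;

public class TpstatsPoller {
    public static void main(String[] args) throws Exception {
        while (true) {
            Process p = new ProcessBuilder("nodetool", "tpstats").start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    // keep just the FlushWriter pool row
                    if (line.startsWith("FlushWriter")) {
                        System.out.println(Instant.now() + " " + line);
                    }
                }
            }
            p.waitFor();
            Thread.sleep(60_000);   // sample once per minute
        }
    }
}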
> 
> 
> 
> 
> On Fri, Dec 20, 2013 at 1:08 AM, Alexander Shutyaev <shutyaev@gmail.com> wrote:
> Thanks for your answers.
> 
> srmore,
> 
> We are using v2.0.0. As for GC, I guess it does not correlate in our case, because we had cassandra running for 9 days under production load with no dropped messages, and I guess that during this time there were a lot of GCs.
> 
> Ken,
> 
> I've checked the values you indicated. Here they are:
> 
> node1     6498
> node2     6476
> node3     6642
> 
> I guess this is not good :) What can we do to fix this problem?
> 
> 
> 2013/12/19 Ken Hancock <ken.hancock@schange.com>
> We had issues where the flushes of a number of CFs would align and then block writes for a very brief period. If that happened when a bunch of writes came in, we'd see a spike in MUTATION drops.
> 
> Check nodetool tpstats for the FlushWriter "All time blocked" count.
> 
> 
> On Thu, Dec 19, 2013 at 7:12 AM, Alexander Shutyaev <shutyaev@gmail.com> wrote:
> Hi all!
> 
> We've had a problem with cassandra recently. We had two one-minute periods when we got a lot of timeouts on the client side (the only timeouts during the 9 days we have been using cassandra in production). In the logs we found corresponding messages saying that MUTATION messages were dropped.
> 
> Now, the official FAQ [1] says that this is an indicator that the load is too high. We've checked our monitoring and found that the 1-minute average CPU load had a local peak at the time of the problem, but it was about 0.8 against the usual 0.2, which I guess is nothing for a 2-core virtual machine. We've also checked java threads - there was no peak there and their count was reasonable, ~240-250.
> 
> Can anyone give us a hint - what should we monitor to see this "high load" and what should we tune to make it acceptable?
> 
> Thanks in advance,
> Alexander
> 
> [1] http://wiki.apache.org/cassandra/FAQ#dropped_messages
> 
> 
> 
> -- 
> Ken Hancock | System Architect, Advanced Advertising 
> SeaChange International 
> 50 Nagog Park
> Acton, Massachusetts 01720
> ken.hancock@schange.com | www.schange.com | NASDAQ:SEAC 
> Office: +1 (978) 889-3329 | hancockks

