cassandra-user mailing list archives

From Graham Sanderson <gra...@vast.com>
Subject Re: BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504
Date Mon, 19 Oct 2015 16:37:43 GMT
- commitlog_sync_batch_window_in_ms behavior has changed from the
  maximum time to wait between fsync to the minimum time.  We are 
  working on making this more user-friendly (see CASSANDRA-9533) but in the
  meantime, this means 2.1 needs a much smaller batch window to keep
  writer threads from starving.  The suggested default is now 2ms.
The note above was added retroactively to NEWS.txt in 2.1.6, which is why it is not obvious.

> On Oct 19, 2015, at 11:03 AM, Michael Shuler <michael@pbandjelly.org> wrote:
> 
> On 10/19/2015 10:55 AM, Graham Sanderson wrote:
>> If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra
>> 2.1, you may have had
>> 
>> commitlog_sync: batch
>> 
>> commitlog_sync_batch_window_in_ms: 25
>> 
>> 
>> in your cassandra.yaml
>> 
>> It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just
>> happened immediately), but fixed in 2.1, *which meant that every
>> mutation blocked its writer thread for 25ms meaning at 80
>> mutations/sec/writer thread you’d start DROPPING mutations if your write
>> timeout is 2000ms.*
>> 
>> This turns out to be a massive problem if you write fast, and the
>> default commitlog_sync_batch_window_in_ms was changed to 2 ms in 2.1.6
>> as a way of addressing this (with some suggesting 1ms)
>> 
>> Neither of these changes got much fanfare except an eventual reference
>> in CHANGES.TXT
>> 
>> With 2.1.9 if you aren’t doing periodic sync, then I think the new
>> behavior is just to sync whenever the commit logs have a
>> consistent/complete set of mutations ready.
>> 
>> Note this is hard to diagnose because CPU is idle and pretty much all
>> latency metrics (except the overall coordinator write) do not count this
>> time (and you probably weren’t noticing the 25ms write ACK time). It
>> turned out for us that one of our nodes was getting more writes (> 20k
>> mutations per second) which was about the magic number… anything shy of
>> that and everything looked fine, but just by going slightly over, this
>> node was dropping lots of mutations.
> 
> If you would be kind enough to submit a patch to JIRA for NEWS.txt
> (aligned with the right versions you're warning about) that includes the
> info upgrading users might need, that would be great!
> 
> -- 
> Kind regards,
> Michael
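
The arithmetic behind the warning quoted above can be sketched roughly as follows. This is a back-of-the-envelope model, not Cassandra code: it assumes, as the message describes, that each mutation blocks its writer thread for one full batch window and that a thread handles mutations strictly one at a time. The function names are hypothetical and exist only for this illustration.

```python
# Simplified model (an assumption, not Cassandra's implementation):
# every mutation holds its writer thread for one full batch window.

def per_thread_ceiling(batch_window_ms: float) -> float:
    """Most mutations per second one writer thread can complete."""
    return 1000.0 / batch_window_ms


def backlog_at_timeout(batch_window_ms: float, write_timeout_ms: float) -> float:
    """Queued mutations per thread at which the last one in line
    has waited the whole write timeout and would be dropped."""
    return write_timeout_ms / batch_window_ms


if __name__ == "__main__":
    # Values taken from the thread: 25 ms window, 2000 ms write timeout.
    print(per_thread_ceiling(25))        # 40.0 mutations/sec per thread
    print(backlog_at_timeout(25, 2000))  # 80.0 mutations queued per thread
    # The 2 ms window suggested in NEWS.txt raises the per-thread ceiling.
    print(per_thread_ceiling(2))         # 500.0 mutations/sec per thread
```

Under this model, the figure of 80 in the quoted message appears to correspond to the write timeout divided by the batch window (2000 / 25). Real throughput also depends on how many writer threads are configured and on how mutations are grouped per fsync, which this sketch does not model.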

