cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stu Hood (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-1955) memtable flushing can block writes due to queue size limitations even though overall write throughput is below capacity
Date Sun, 09 Jan 2011 01:37:45 GMT


Stu Hood commented on CASSANDRA-1955:

> flushwriter blocking is a feature, otherwise you OOM.
That is assuming that the memtables are large: the user had a large number of CFs, and so
each memtable would be relatively small.

> memtable flushing can block writes due to queue size limitations even though overall
write throughput is below capacity
> -----------------------------------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-1955
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
> It seems that someone ran into this (see cassandra-user thread "Question re: the use
of multiple ColumnFamilies").
> If my interpretation is correct, the queue size is set to the concurrency in the case
of the flushSorter, and set to memtable_flush_writers in the case of flushWriter in ColumnFamilyStore.
> While the choice of concurrency for the two executors makes perfect sense, the queue
sizing does not. As a user, I would expect, and did expect, that for a given memtable independently
tuned (w.r.t. flushing thresholds etc), writes to the CF would not block until there is at
least one other memtable *for that CF* waiting to be flushed.
> With the current behavior, if I am not misinterpreting, whether or not writes will inappropriately
block is very much dependent on not just the overall write throughput, but also the incidental
timing of memtable flushes across multiple column families.
> The simplest way to mitigate (but not fix) this is probably to set the queue size to
be equal to the number of column families if that is higher than the number of CPU cores.
But that is only a mitigation because nothing prevents e.g. a large number of memtable flushes
for a small column family under temporary write load, can still block a large (possibly more
important) memtable flush for another CF. Such a shared-but-larger queue would also not prevent
heap usage spikes resulting from some a single cf with very large memtable thresholds being
rapidly written to, with a queue sized for lots of cf:s that are in practice not used. In
other words, this mitigation technique would effectively negate the backpressure mechanism
in some cases and likely lead to more people having OOM issues when saturating a CF with writes.
> A more involved change is to make each CF have it's own queue through which flushes go
prior to being submitted to flushSorter, which would guarantee that at least one memtable
can always be in pending flush state for a given CF. The global queue could effectively have
size 1 hard-coded since the queue is no longer really used as if it were a queue.
> The flushWriter is unaffected since it is a separate concern that is supposed to be I/O
bound. The current behavior would not be perfect if there is a huge discrepancy between memtable
flush thresholds of different memtables, but it does not seem high priority to make a change
here in practice.
> So, I propose either:
> (a) changing the flushSorter queue size to be max(num cores, num cfs)
> (b) creating a per-cf queue
> I'll volunteer to work on it as a nice bite sized change, assuming there is agreement
on what needs to be done. Given the concerns with (a), I think (b) is the right solution unless
it turns out to cause major complexity. Worth noting is that these are not performance sensitive
given the low frequency of memtable flushes, so an extra queue:ing step should not be an issue.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message