cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
Date Sun, 01 Dec 2013 18:25:36 GMT


Benedict commented on CASSANDRA-3578:

bq. How is WaitQueue different from using Object.wait/notifyAll?
Monitor acquisition. With Object.wait/notifyAll, a thread must hold the monitor both to wait
and to call notifyAll. This means the notifier can block while a waiter is checking whether
it should wait. It also means that, even if each waiter releases the monitor immediately after
waking up, the waiters must still reacquire it one at a time (one wakes up, releases the
monitor, exits; another wakes up, releases the monitor, exits; and so on), so N waiting threads
incur N scheduler delays. With WaitQueue the notifier "never" blocks (never in the context of
this use case, and easily extended to absolutely never), and it issues wake-ups to all waiting
threads pretty much simultaneously. All of them may in theory wake up immediately, incurring
1 scheduler delay (in reality N/cores delays).
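
To illustrate the difference, here is a minimal sketch of a monitor-free wait queue. It is not the WaitQueue from the patch: the class name SimpleWaitQueue and the methods awaitUntil and signalAll are hypothetical, and it assumes the condition is backed by a volatile or atomic variable that the notifier sets before calling signalAll. Waiters register in a lock-free queue and park; the notifier only polls the queue and unparks, so it never contends for a monitor with the waiters.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a monitor-free wait queue; not Cassandra's WaitQueue. */
final class SimpleWaitQueue {
    // Lock-free queue of parked (or about-to-park) threads.
    private final ConcurrentLinkedQueue<Thread> waiters = new ConcurrentLinkedQueue<>();

    /** Blocks the caller until condition is true. The condition must read volatile/atomic state. */
    void awaitUntil(BooleanSupplier condition) throws InterruptedException {
        Thread self = Thread.currentThread();
        while (!condition.getAsBoolean()) {
            waiters.add(self);
            // Re-check after registering, so a signal issued between the first
            // check and the registration is not lost.
            if (condition.getAsBoolean()) {
                waiters.remove(self);
                return;
            }
            // If signalAll already unparked us, park returns immediately.
            LockSupport.park(this);
            waiters.remove(self); // no-op if the notifier already polled us
            if (Thread.interrupted()) {
                throw new InterruptedException();
            }
        }
    }

    /** Wakes every registered waiter without acquiring any monitor. */
    void signalAll() {
        Thread t;
        while ((t = waiters.poll()) != null) {
            LockSupport.unpark(t);
        }
    }
}
```

Because no waiter has to reacquire a shared monitor on the way out, the woken threads can all proceed as soon as the scheduler runs them.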

bq. Well, in the general case of writing to more than just a single local replica
True for RF>1, but for RF=1 smart clients are hopefully becoming the norm now that the Java
Driver is available, so it's not a small use case. It could potentially be extended to RF>1
as well, by splitting writes into separate queues for their blocking and non-blocking stages,
so that we don't issue a response until the blocking portion of the write completes, but we
can continue to service more incoming requests for the non-blocking portion. This would mean
the total cluster throughput would be limited to nodes * max_connections, instead of nodes
* concurrent_writes.

bq. At some point you have to do blocking i/o to disk so whether you call that concurrent_writes
or fake_aio_threads doesn't really matter.
But the blocking i/o isn't the rate limiting factor here; with BatchCLE the limit is the number
of batches per second (set by the batch period) * writes per batch. We don't really want to
limit writes per batch artificially if we can avoid it. Obviously we have a limit on the number
of clients that can be connected, so that limit can't be avoided, but concurrent_writes can
be, and probably should be, much lower than the connection limit, so constraining by it is
undesirable.

At some point we could eliminate the connection limit as well, by returning to the client
a request id that will be followed up later by a confirmation of write/failure. I'm not necessarily
suggesting we do this any time soon, but it is a possibility if we make the proposed change.

> Multithreaded commitlog
> -----------------------
>                 Key: CASSANDRA-3578
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>            Priority: Minor
>              Labels: performance
>         Attachments: 0001-CASSANDRA-3578.patch,, Current-CL.png,
>                      Multi-Threded-CL.png,, latency.svg, oprate.svg, parallel_commit_log_2.patch
>
> Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog
> simultaneously (reserving space for each with a CAS first, the way we do in the SlabAllocator.Region.allocate)
> can improve performance, since you're not bottlenecking on a single thread to do all the copying
> and CRC computation.
> Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable.
> (moved from CASSANDRA-622, which was getting a bit muddled.)
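
The CAS-based space reservation described in the issue summary above can be sketched as follows. This is an illustrative sketch only, not the code from the attached patches or from SlabAllocator: the class SegmentSketch and its methods are hypothetical, and a heap ByteBuffer stands in for the mmap'd segment. Each writer reserves a disjoint region with a compare-and-set on a shared offset and then copies its record into that region without holding any lock.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of CAS-based space reservation in a fixed-size segment. */
final class SegmentSketch {
    private final ByteBuffer buffer;               // stand-in for an mmap'd segment
    private final AtomicInteger nextFree = new AtomicInteger(0);

    SegmentSketch(int capacity) {
        this.buffer = ByteBuffer.allocate(capacity);
    }

    /** Reserves size bytes; returns the start offset, or -1 if the segment is full. */
    int reserve(int size) {
        while (true) {
            int start = nextFree.get();
            if (size < 0 || size > buffer.capacity() - start) {
                return -1; // does not fit; a real log would switch to a new segment
            }
            // Only one thread can move the offset from start to start + size.
            if (nextFree.compareAndSet(start, start + size)) {
                return start;
            }
            // Lost the race: another writer reserved first, so retry.
        }
    }

    /** Copies the record into its own reserved region; safe to call concurrently. */
    boolean write(byte[] record) {
        int start = reserve(record.length);
        if (start < 0) {
            return false;
        }
        // duplicate() shares the bytes but has an independent position,
        // so concurrent writers do not disturb each other's cursor.
        ByteBuffer view = buffer.duplicate();
        view.position(start);
        view.put(record);
        return true;
    }

    /** Number of bytes reserved so far. */
    int used() {
        return nextFree.get();
    }
}
```

The only point of contention is the single compare-and-set; the copying (and, in a real commit log, the CRC computation) happens in parallel on the reserved regions.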
