cassandra-commits mailing list archives

From Michaël Figuière (Commented) (JIRA) <>
Subject [jira] [Commented] (CASSANDRA-3578) Multithreaded commitlog
Date Mon, 30 Jan 2012 01:15:10 GMT


Michaël Figuière commented on CASSANDRA-3578:

In this patch I propose a different approach from Piotr's. In this implementation, only one thread handles syncs; all the processing, that is serialization, CRC computation, and copying the RM into the mmap segment, is done directly in the writer threads. These threads exchange data with the syncer thread in a non-blocking way, so the ExecutorService abstraction has been replaced by a lighter structure.
Several components of the CL were challenging to implement in this manner:

*CL Segment switch*

Switching the CL segment when it's full isn't straightforward without locks. Here we use a boolean
mark that is atomically CASed by a writer thread, giving it the responsibility for performing
the switch. If the mark can't be grabbed, the thread waits on a condition that is later
reused; stamps are used to avoid any ABA problem.
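As a rough sketch of the idea (class and method names are hypothetical, not the patch's actual code), a writer that finds the segment full tries to win a CAS on a boolean mark; only the winner performs the switch, while losers wait for it to finish (a simple spin stands in for the stamped-condition wait described above):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: electing a single writer thread to perform the
// segment switch via a CAS on a boolean mark.
public class SegmentSwitchSketch {
    static final AtomicBoolean switching = new AtomicBoolean(false);
    static final AtomicReference<String> currentSegment = new AtomicReference<>("segment-1");

    // Called by a writer thread that found the current segment full.
    static String switchIfNeeded() {
        if (switching.compareAndSet(false, true)) {
            try {
                // This thread won the CAS: it alone allocates the new segment.
                currentSegment.set("segment-2");
            } finally {
                switching.set(false); // the real design signals stamped-condition waiters here
            }
        } else {
            // Losers wait; the patch uses a condition with stamps to avoid
            // the ABA problem, here we merely spin until the switch is done.
            while (switching.get()) Thread.onSpinWait();
        }
        return currentSegment.get();
    }

    public static void main(String[] args) {
        System.out.println(switchIfNeeded()); // prints segment-2
    }
}
```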

*Batch CL*

The Batch CL strategy is considered the safer mode for Cassandra, as it guarantees the client
that the RM is synced to disk before answering. When making the CL multithreaded, we must ensure
that we never acknowledge an RM that is synced to disk but preceded by an unsynced RM in the
CL segment, as that would make replaying the RM impossible. For this reason, we track the state
of each RM's processing and, when the sync() call is executed, mark as synced only the contiguous
run of fully written RMs.
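A minimal illustration of that rule (hypothetical names, not the patch's code): after a sync(), only the longest contiguous prefix of fully written entries may be acknowledged, even if a later entry is already on disk.

```java
// Hypothetical sketch: only the contiguous run of fully written entries
// starting at the segment head is safe to acknowledge after a sync().
public class ContiguousSyncSketch {
    // written[i] is true once writer i has fully copied its RM into the segment.
    static int syncedUpTo(boolean[] written) {
        int n = 0;
        while (n < written.length && written[n]) n++;
        return n; // entries [0, n) may be acknowledged
    }

    public static void main(String[] args) {
        // Entry 2 is still being written, so entry 3 cannot be acknowledged
        // even though its bytes already reached the disk.
        boolean[] written = { true, true, false, true };
        System.out.println(syncedUpTo(written)); // prints 2
    }
}
```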

Having avoided any blocking queue, we still need a way to put the writer threads on hold while the
sync is being performed. LockSupport.park()/unpark() provides a nice way to do this without relying
on any coarse-grained synchronization, while avoiding any condition reuse/renewal issue.
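The park/unpark handshake might look roughly like this (a simplified sketch with hypothetical names; the real patch parks writers until the sync covering their entry completes). Note the condition is re-checked in a loop, since park() may return spuriously:

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: a writer parks until the syncer thread unparks it
// once the sync covering its entry has reached the disk.
public class ParkSketch {
    static volatile boolean synced = false;
    static volatile boolean acknowledged = false;

    static boolean runOnce() throws InterruptedException {
        Thread writer = new Thread(() -> {
            // Re-check the condition: park() may return spuriously, and the
            // permit may have been granted before we parked.
            while (!synced) LockSupport.park();
            acknowledged = true; // safe to answer the client now
        });
        writer.start();

        Thread.sleep(50);        // stand-in for the batch sync window
        synced = true;           // syncer: the fsync has completed
        LockSupport.unpark(writer);
        writer.join();
        return acknowledged;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce()); // prints true
    }
}
```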

*Periodic CL*

The Periodic CL's challenge is mostly around throttling the writers since, here again, we don't
use any synchronized queue, in order to reduce contention. In fact we only need "half a blocking
queue", as nothing is really added or consumed. For this reason we just use an atomic counter
together with an empty/full pair of conditions. Here again, a pool of conditions and a stamp are
used to avoid the ABA problem.
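The "half a blocking queue" can be sketched as follows (hypothetical names; the stamped condition pool is elided, writers that hit the threshold are simply told to wait): writers bump a counter of unsynced entries, and the syncer resets it after each sync.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: throttling Periodic CL writers with a bare atomic
// counter and a fixed threshold of unsynced mutations.
public class ThrottleSketch {
    static final int THRESHOLD = 4;
    static final AtomicInteger unsynced = new AtomicInteger();

    // Writer side: returns true if the write may proceed immediately.
    static boolean tryAcquire() {
        while (true) {
            int n = unsynced.get();
            if (n >= THRESHOLD) return false; // caller waits on the "full" condition
            if (unsynced.compareAndSet(n, n + 1)) return true;
        }
    }

    // Syncer side: after a sync, the tracked entries are durable.
    static void onSync() {
        unsynced.set(0); // caller then signals the "full" condition waiters
    }

    public static void main(String[] args) {
        for (int i = 0; i < THRESHOLD; i++) tryAcquire();
        System.out.println(tryAcquire()); // prints false: throttled
        onSync();
        System.out.println(tryAcquire()); // prints true: admitted again
    }
}
```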

*End of Segment marker*

Another point is that this implementation doesn't use any End of Segment marker. As we now have
several concurrent writers, it's no longer possible to write a temporary marker after an entry.
That means the recently committed code that fixes CASSANDRA-3615 is obviously not included
in this patch.

Nevertheless, a mechanism to avoid unwanted replay of entries from recycled segments is still
required. I haven't included it in the patch, as I think it's a design choice that needs to
be debated, but it seems straightforward to implement. The options I can see are the following:
- Fill the CL segment file with zeros on recycling. This avoids any problem, but will typically
require a write of several seconds on recycling, leading to write latency hiccups.
- Include the segment id in every entry. This avoids any problem as well, but increases the entry
size by 8 bytes, which has a cost; it isn't dramatic and can be considered as spreading the
cost of the previous option over the entire CL write path.
- Salt the two checksums included in each entry with the segment id. Doing so lowers the
probability of an unwanted replay to a level that seems fairly acceptable. The advantage of
this solution is that its performance cost is nil.
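The third option can be sketched like this (hypothetical names, and the real patch would salt its own entry checksums rather than a bare CRC32): seeding the checksum with the segment id makes a stale entry left over in a recycled segment fail verification.

```java
import java.util.zip.CRC32;

// Hypothetical sketch: salting an entry checksum with the segment id so
// stale entries in a recycled segment are rejected on replay.
public class SaltedChecksumSketch {
    static long saltedCrc(long segmentId, byte[] entry) {
        CRC32 crc = new CRC32();
        // Salt: feed the 8-byte segment id into the checksum first.
        for (int i = 0; i < 8; i++) crc.update((int) (segmentId >>> (8 * i)) & 0xFF);
        crc.update(entry, 0, entry.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] entry = "rowmutation".getBytes();
        long inOldSegment = saltedCrc(41L, entry);
        long inNewSegment = saltedCrc(42L, entry);
        // Same bytes, different segment id: the stale entry's checksum no
        // longer matches, so it is not replayed.
        System.out.println(inOldSegment != inNewSegment); // prints true
    }
}
```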

Finally, here are some noteworthy observations:
* Here the writer thread WAITS for the processing to complete. Compared to a _push-on-queue-and-forget_
approach, this slightly increases write latency when using the Periodic CL (the Batch CL still
being synchronous), especially for large RMs. Nevertheless, on a highly loaded server, the
next writes waiting to be executed would have had to wait for their thread to be scheduled
anyway, so the latency cost might eventually be paid regardless. Increasing the number of writer
threads should help make small RMs less sensitive to large ones.
* If extensive benchmarks show that the previous point is an issue, there's some room
to make this Periodic CL asynchronous with respect to the writer threads.
* To reduce as much as possible the contention on the atomic states that each thread can modify
several times, some naughty packing of several states into a single AtomicLong is used, as it
decreases the likelihood of an extra spin compared to a more classical AtomicReference approach
to non-blocking synchronization. The downside is code complexity, so I think AtomicReference
remains an option to make the code more readable and maintainable.
* For now, to ensure the required throttling of incoming RMs, we use a constant function with
a fixed threshold of unsynced mutations. But we now have the tools to easily make this function
more complex, for instance making it non-constant or relating it to the size of the mutations.
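To illustrate the AtomicLong packing mentioned above (a sketch with hypothetical field choices, not the patch's actual layout), two 32-bit states, say an allocation offset and an active-writer count, can be updated together with a single CAS:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: packing two 32-bit states into one AtomicLong so
// both are updated atomically with a single CAS instead of two.
public class PackedStateSketch {
    static final AtomicLong state = new AtomicLong(pack(0, 0));

    static long pack(int offset, int writers) {
        return ((long) offset << 32) | (writers & 0xFFFFFFFFL);
    }
    static int offset(long s)  { return (int) (s >>> 32); }
    static int writers(long s) { return (int) s; }

    // Atomically claim `size` bytes and register one more active writer.
    static int allocate(int size) {
        while (true) {
            long s = state.get();
            long next = pack(offset(s) + size, writers(s) + 1);
            if (state.compareAndSet(s, next)) return offset(s);
        }
    }

    public static void main(String[] args) {
        int first = allocate(100);
        int second = allocate(50);
        System.out.println(first + " " + second); // prints 0 100
        long s = state.get();
        System.out.println(offset(s) + " " + writers(s)); // prints 150 2
    }
}
```

A lone CAS on the packed long either publishes both updates or neither, which is what removes the extra spin an AtomicReference indirection can cost.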

> Multithreaded commitlog
> -----------------------
>                 Key: CASSANDRA-3578
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Priority: Minor
>         Attachments: parallel_commit_log_2.patch
> Brian Aker pointed out a while ago that allowing multiple threads to modify the commitlog
> simultaneously (reserving space for each with a CAS first, the way we do in SlabAllocator.Region.allocate)
> can improve performance, since you're not bottlenecking on a single thread to do all the copying
> and CRC computation.
> Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes doable.
> (moved from CASSANDRA-622, which was getting a bit muddled.)

This message is automatically generated by JIRA.

