cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
Date Fri, 03 Jan 2014 20:40:52 GMT


Benedict commented on CASSANDRA-5549:

I have a patch available for this [here|]

I've been a little reluctant to post it, as it's a bit of a monster of a patch, but I think
I've now done my best to keep it well commented and mostly limit unnecessary changes. There
are some changes that may appear over-engineered for their current use, but I am using these
in a continuation of this patch for off-heap memtables. I'll describe some of these below,
but unpicking still-useful changes seemed wasteful. If they get in the way of review we can
revisit that decision.

There are several main areas of updates:

1) Removal of switchLock itself: The main work here is actually in the OpOrdering synchronisation
class. This class explains itself, so I won't go into detail here, but it provides an easy
mechanism for coordinating our updates to Memtables, so that we know what commit log (CL)
position they contain data up to, and when a memtable is safe to be written to disk. The actual
flushing of the memtable has also been refactored a little, to keep ordering guarantees.
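
As a rough illustration of the kind of coordination described in 1), here is a minimal sketch. It is not the OpOrdering class from the patch: the names OpOrderSketch, Group, start(), finish() and issueBarrier() are hypothetical, and it assumes only one barrier is outstanding at a time. Writers register against the current group; a memtable switch issues a barrier, which starts a new group and lets the flusher wait until every write that started before the switch has finished.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for the idea; not the class from the patch.
final class OpOrderSketch {

    /** One group of operations, e.g. all writes that started before a memtable switch. */
    static final class Group {
        private final AtomicInteger running = new AtomicInteger();
        private volatile boolean closed;

        /** Called by a writer once its operation is complete. */
        void finish() { running.decrementAndGet(); }

        /**
         * True once the group is closed to new operations and none are still running.
         * A writer that raced with the barrier may bump the count briefly before it
         * backs out, so callers should treat the first true result as final.
         */
        boolean isFinished() { return closed && running.get() == 0; }
    }

    private final AtomicReference<Group> current = new AtomicReference<>(new Group());

    /** Register a new operation against the current group. */
    Group start() {
        while (true) {
            Group g = current.get();
            g.running.incrementAndGet();
            if (!g.closed) return g;      // registered before the barrier closed the group
            g.running.decrementAndGet();  // lost the race with a barrier; retry on the new group
        }
    }

    /** Start a new group and return the old one; the caller polls old.isFinished(). */
    Group issueBarrier() {
        Group old = current.getAndSet(new Group());
        old.closed = true;
        return old;
    }

    public static void main(String[] args) {
        OpOrderSketch order = new OpOrderSketch();
        Group write = order.start();            // a write against the old memtable
        Group old = order.issueBarrier();       // memtable switch
        System.out.println(old.isFinished());   // false: the write is still running
        write.finish();
        System.out.println(old.isFinished());   // true: safe to flush
    }
}
```

Writers never block in this sketch; only the party that issues the barrier has to wait, which is the property that makes this style of coordination cheaper on the write path than a read/write lock.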

2) Allocators and Memory Management: by removing the switch lock, we get rid of our ability
to control heap growth by row mutations. To fix this, I've created the concept of a PoolAllocator,
with associated Pool that has fixed memory limits. Any allocation requires the pool to allot
room from its limit to the allocator (this is dealt with by MemoryTracker and MemoryOwner).
This required a lot of minor modifications all over the place, to make measurement of object
sizes at modification time cheap and accurate. Mostly I've achieved this by modifying jamm
(a new branch is [here|]) so that it will always give us a useful answer. Wherever we used
to be using ObjectSizes ad hoc in a class (generally incorrectly, it turns out, which is not
surprising as the API isn't obvious) I now *always* call measure() on an instance of the
object and store that in a static field, and use simpler methods for any dynamic space use.
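
To make the pool/allocator relationship concrete, here is a minimal sketch under stated assumptions. It is not the PoolAllocator, Pool, MemoryTracker or MemoryOwner code from the patch: PoolSketch, tryAllot(), release() and the nested Allocator are hypothetical names, and returning null on exhaustion is a simplification (a real implementation would presumably wait or trigger a flush).

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; names and signatures are illustrative only.
final class PoolSketch {
    private final long limitBytes;
    private final AtomicLong allocated = new AtomicLong();

    PoolSketch(long limitBytes) { this.limitBytes = limitBytes; }

    /** Try to reserve room from the pool; returns false rather than exceed the limit. */
    boolean tryAllot(long bytes) {
        while (true) {
            long cur = allocated.get();
            if (cur + bytes > limitBytes) return false;
            if (allocated.compareAndSet(cur, cur + bytes)) return true;
        }
    }

    /** Hand previously allotted room back to the pool. */
    void release(long bytes) { allocated.addAndGet(-bytes); }

    long used() { return allocated.get(); }

    Allocator newAllocator() { return new Allocator(); }

    /** An allocator tied to one owner (for example one memtable). */
    final class Allocator {
        private final AtomicLong owned = new AtomicLong();

        /** Allocate only if the pool can allot the room; null means the pool is full. */
        byte[] allocate(int size) {
            if (!tryAllot(size)) return null;
            owned.addAndGet(size);
            return new byte[size];
        }

        /** Return everything this owner holds, e.g. once its memtable has been flushed. */
        void releaseAll() { release(owned.getAndSet(0)); }
    }

    public static void main(String[] args) {
        PoolSketch pool = new PoolSketch(100);
        PoolSketch.Allocator memtable = pool.newAllocator();
        System.out.println(memtable.allocate(60) != null); // true
        System.out.println(memtable.allocate(60) != null); // false: would exceed the limit
        memtable.releaseAll();
        System.out.println(pool.used());                   // 0
    }
}
```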

Worth noting: I've renamed IMeasurableMemory.memorySize() to excessHeapSize(), and I've modified
(where applicable) its value to only count data we wouldn't otherwise be storing. This only
makes a difference in a few places, but I think it is an important distinction.

This change also makes any limit on flush queue size irrelevant, so the metric we use for
controlling flushing is instead the ratio of in-use memory to the memory limit, ignoring any
data that is already flushing. Once that ratio is breached, we trigger a flush of the largest CFS.
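
The flush trigger described above can be sketched roughly as follows. This is an illustration only: FlushTriggerSketch, pickFlushTarget() and the threshold parameter are hypothetical, and the map is assumed to hold only live bytes per CFS, with data that is already flushing excluded by the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the ratio-based flush trigger; not code from the patch.
final class FlushTriggerSketch {

    /**
     * liveBytesByStore: in-use bytes per CFS, excluding anything already flushing.
     * Returns the CFS to flush, or empty if the ratio has not been breached.
     */
    static Optional<String> pickFlushTarget(Map<String, Long> liveBytesByStore,
                                            long limitBytes,
                                            double threshold) {
        long inUse = 0;
        for (long bytes : liveBytesByStore.values()) inUse += bytes;

        if ((double) inUse / limitBytes <= threshold) return Optional.empty();

        // Ratio breached: flush the largest store.
        return liveBytesByStore.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Long> live = new HashMap<>();
        live.put("users", 10L);
        live.put("events", 50L);
        System.out.println(pickFlushTarget(live, 100, 0.5)); // Optional[events]
        System.out.println(pickFlushTarget(live, 100, 0.7)); // Optional.empty
    }
}
```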

3) Some concurrency primitives: NonBlockingQueue (and related classes) and WaitQueue. NonBlockingQueue
is used more extensively in the off-heap changes, but I leave it in here because it improves
WaitQueue a lot, and we rely on WaitQueue much more with the proliferation of the OpOrdering
operations. It also helps us move much closer to completely non-blocking read/write operations.
We also use it to get rid of the Thread.yield() in SlabAllocator. I've aimed to keep
NBQ as simple as possible.
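
For readers unfamiliar with the pattern, here is a minimal sketch of a wait queue built on a non-blocking queue. It is not the WaitQueue or NonBlockingQueue from the patch: WaitQueueSketch, awaitUntil() and signalAll() are hypothetical names, and the JDK's ConcurrentLinkedQueue stands in for NonBlockingQueue. The key step is that a waiter registers itself first and then re-checks the condition, so a signal arriving in between is not lost and no yield-based spinning is needed.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; ConcurrentLinkedQueue is a stand-in for NonBlockingQueue.
final class WaitQueueSketch {
    private final ConcurrentLinkedQueue<Thread> waiters = new ConcurrentLinkedQueue<>();

    /** Block until the condition holds. Callers must signalAll() after changing it. */
    void awaitUntil(BooleanSupplier condition) {
        Thread self = Thread.currentThread();
        while (!condition.getAsBoolean()) {
            waiters.add(self);               // register first...
            if (condition.getAsBoolean()) {  // ...then re-check, so no signal is missed
                waiters.remove(self);
                return;
            }
            LockSupport.park(this);          // may wake spuriously; the loop re-checks
            waiters.remove(self);            // no-op if signalAll() already removed us
        }
    }

    /** Wake every registered waiter; each one re-checks its own condition. */
    void signalAll() {
        Thread t;
        while ((t = waiters.poll()) != null) LockSupport.unpark(t);
    }
}
```

A thread that would previously have spun on Thread.yield() can call awaitUntil() with its own condition, and whoever changes the state calls signalAll() afterwards. Interrupt handling is deliberately left out of this sketch.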

4) CommitLog has been updated to use OpOrdering, and also includes a bug fix. I considered
splitting this into a separate ticket, but it's such a tiny proportion of the overall changes
I'm not sure it warrants it. The bug fix we may want to split out if this takes a while to
go through.

> Remove Table.switchLock
> -----------------------
>                 Key: CASSANDRA-5549
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1
>         Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png
> As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write path.
> ReentrantReadWriteLock is not lightweight, even if there is no contention per se between readers
> and writers of the lock (in Cassandra, memtable updates and switches).

