cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5549) Remove Table.switchLock
Date Mon, 02 Dec 2013 19:28:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836818#comment-13836818 ]

Benedict commented on CASSANDRA-5549:
-------------------------------------

Without the switch lock, nothing will prevent writes from coming through when memtables are
already using too much memory.

What I'd like to suggest is effectively a global Semaphore, with permits equal to the size
allocated for memtables; on KS.apply(RM) we estimate the size of the RM and take that many
permits. Once we've added the RM and know better how much it occupies, we adjust the Semaphore
to (more) accurately reflect the amount of memory in use. When we flush a memtable we release
permits equal to the *estimated size* of each RM.
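
A minimal Java sketch of the accounting described above, under stated assumptions: the class
and method names (MemtableMemoryLimiter, reserve, adjust, releaseOnFlush) are hypothetical and
not part of Cassandra, one permit stands for one byte, and the int permit count caps the budget
at about 2 GB. It uses java.util.concurrent.Semaphore and exposes the protected reducePermits
so that an under-estimate can be corrected without blocking a write that has already been applied.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch only: none of these names exist in Cassandra.
final class MemtableMemoryLimiter {

    // Semaphore.reducePermits is protected, so a small subclass exposes it.
    private static final class AdjustableSemaphore extends Semaphore {
        AdjustableSemaphore(int permits) { super(permits); }
        void reduce(int n) { super.reducePermits(n); }
    }

    private final AdjustableSemaphore permits;

    // Assumption: one permit per byte of the memtable budget.
    MemtableMemoryLimiter(int memtableBudgetBytes) {
        this.permits = new AdjustableSemaphore(memtableBudgetBytes);
    }

    // Called before applying a mutation: blocks until the estimate fits in the budget.
    void reserve(int estimatedBytes) throws InterruptedException {
        permits.acquire(estimatedBytes);
    }

    // Non-blocking variant: returns false if the estimate does not fit right now.
    boolean tryReserve(int estimatedBytes) {
        return permits.tryAcquire(estimatedBytes);
    }

    // Called after the mutation is applied, once its size is known more accurately.
    void adjust(int estimatedBytes, int measuredBytes) {
        int delta = measuredBytes - estimatedBytes;
        if (delta > 0) {
            permits.reduce(delta);      // under-estimated: take more, never blocks
        } else if (delta < 0) {
            permits.release(-delta);    // over-estimated: hand the surplus back
        }
    }

    // Called when a memtable is flushed: returns everything it was holding.
    void releaseOnFlush(int bytesHeldByMemtable) {
        permits.release(bytesHeldByMemtable);
    }

    int availableBytes() {
        return permits.availablePermits();
    }
}
```

In this sketch the writer calls reserve before the apply and adjust after it, and the flush path
calls releaseOnFlush with the total the memtable accumulated. Whether writers should block or be
rejected when the budget is exhausted is left open here.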

This may be pushing the boat out, but would probably result in not relying on memtable live
metering/scanning for size estimation, which we could retire. Either way we're estimating
the size, but with this approach we're keeping *tight* control over the (estimated) memory
allocated to memtables, whereas at the moment we have some tricks that we hope keep it there.
If we estimate space used cautiously, we should be able to better guarantee no OOM, at least
from this part of the code. 

I have a *reasonably* straightforward scheme for estimating size used by a RM that should
be as good as we currently have. Basic premise is to calculate average space used by an item
in ConcurrentSkipListMap using metering at startup with a map of size, say, 1M entries, rounded
up. If we depend on CASSANDRA-6271 we can easily calculate exact overhead for the BTrees,
or otherwise can do a similar metering approach for SnapTreeMap. So we have an overhead per
row and per value. Separately we track how much space we are using for a given memtable's
slab allocator. We use the RM's data size only for the initial estimation, to decide if we
have room, and ignore it once it's actually added, as it will be accounted for in the slab allocator.
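
The arithmetic of this scheme could be sketched in Java roughly as follows. Everything here is
an assumption made for illustration: the class MemtableSizeEstimator and its methods are
hypothetical, the metered byte count is taken as an input rather than measured, and "rounded up"
is read as applying to the per-entry average. The sketch only shows how the per-row and
per-value overheads combine with the RM's data size before the apply and with the slab
allocator's usage afterwards.

```java
// Hypothetical sketch only: none of these names exist in Cassandra.
final class MemtableSizeEstimator {
    private final long perRowOverhead;    // bytes of map/tree overhead per row
    private final long perValueOverhead;  // bytes of overhead per value (column)

    MemtableSizeEstimator(long perRowOverhead, long perValueOverhead) {
        this.perRowOverhead = perRowOverhead;
        this.perValueOverhead = perValueOverhead;
    }

    // Average overhead per entry from a metered sample map, rounded up.
    // meteredBytes would come from metering a sample map at startup.
    static long averageOverhead(long meteredBytes, long entries) {
        return (meteredBytes + entries - 1) / entries;
    }

    // Overhead of the container structures, excluding the data itself.
    long structuralOverhead(long rows, long values) {
        return rows * perRowOverhead + values * perValueOverhead;
    }

    // Initial estimate for one RM: overhead plus its data size.
    // Used only to decide whether there is room for the write.
    long estimateBeforeApply(long rows, long values, long dataSize) {
        return structuralOverhead(rows, values) + dataSize;
    }

    // Size accounted to a memtable once writes are applied: the data size is
    // dropped because the slab allocator's own usage already covers it.
    long accountedAfterApply(long totalRows, long totalValues, long slabBytes) {
        return structuralOverhead(totalRows, totalValues) + slabBytes;
    }
}
```

For example, if metering a 1,000,000-entry sample map reported 48,000,001 bytes, averageOverhead
would give 49 bytes per row in this sketch. The figures are made up purely to show the rounding.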



> Remove Table.switchLock
> -----------------------
>
>                 Key: CASSANDRA-5549
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 2.1
>
>         Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png
>
>
> As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write path.
> ReentrantReadWriteLock is not lightweight, even if there is no contention per se between readers
> and writers of the lock (in Cassandra, memtable updates and switches).



--
This message was sent by Atlassian JIRA
(v6.1#6144)
