cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-2252) off-heap memtables
Date Thu, 14 Jul 2011 05:52:00 GMT


Jonathan Ellis updated CASSANDRA-2252:

    Attachment: 2252-v3.txt

Rebased most of Stu's latest.  Changed getLiveSize to only add in waste from the allocator
instead of double-counting the rest.  Enabled MemoryMeter.omitSharedBufferOverhead, which
is super untested.
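
As a rough illustration of the getLiveSize change, here is a minimal sketch of one reading of it: if the measured size already covers the bytes handed out to columns, the allocator should contribute only its waste (reserved but unused region space). SlabSketch and its fields are hypothetical names, not the classes in the patch.

```java
import java.nio.ByteBuffer;

// Hypothetical bump-pointer slab allocator that tracks how much region space is wasted.
final class SlabSketch {
    private final int regionSize;
    private ByteBuffer current;   // region currently being carved up
    private long allocated = 0;   // total bytes reserved (whole regions plus oversized buffers)
    private long used = 0;        // bytes actually handed out to callers

    SlabSketch(int regionSize) {
        this.regionSize = regionSize;
    }

    ByteBuffer allocate(int size) {
        if (size > regionSize) {
            // Oversized request: give it a dedicated buffer, which wastes nothing.
            allocated += size;
            used += size;
            return ByteBuffer.allocate(size);
        }
        if (current == null || current.remaining() < size) {
            // Retire the current region; its unused tail becomes waste.
            current = ByteBuffer.allocate(regionSize);
            allocated += regionSize;
        }
        ByteBuffer view = current.duplicate();
        view.limit(view.position() + size);
        current.position(current.position() + size);
        used += size;
        return view.slice();
    }

    // Space reserved in regions but not (yet) handed out.
    long waste() {
        return allocated - used;
    }

    // Assumption: measuredBytes already includes every buffer handed out,
    // so adding total region size here would count those bytes twice.
    long liveSize(long measuredBytes) {
        return measuredBytes + waste();
    }
}
```

With a 100-byte region, two 60-byte allocations leave 80 bytes of waste: a 40-byte tail in the retired region and 40 bytes still free in the current one.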

CFS.getColumnFamily was getting passed an allocator but this doesn't actually do anything.
 (I removed the parameter.)  Was this supposed to be used during counter reconcile somehow?

Passing allocator throughout the CF+SC+[Super|Counter|Deleted|Expiring]Column hierarchy is
ugly and error-prone.  (I found and fixed one error while rebasing, where a method taking
an allocator parameter called the default addColumn, instead of the addColumn-with-allocator.)
 Perhaps moving allocator to AbstractColumnContainer could fix this?
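
To make the AbstractColumnContainer suggestion concrete, here is a minimal sketch of a container that owns its allocator, so there is no allocator-less addColumn overload to call by mistake. Allocator, CountingHeapAllocator and ColumnContainerSketch are hypothetical stand-ins, not the classes in the patch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the patch's allocator abstraction.
interface Allocator {
    ByteBuffer copy(ByteBuffer src);
}

// Copies into a fresh on-heap buffer and counts calls so the effect is observable.
final class CountingHeapAllocator implements Allocator {
    int calls = 0;

    @Override
    public ByteBuffer copy(ByteBuffer src) {
        calls++;
        ByteBuffer dst = ByteBuffer.allocate(src.remaining());
        dst.put(src.duplicate()); // duplicate() leaves the caller's position untouched
        dst.flip();
        return dst;
    }
}

// Hypothetical container: the allocator is fixed at construction, so every
// add goes through it and callers never pass it along.
final class ColumnContainerSketch {
    private final Allocator allocator;
    private final List<ByteBuffer> columns = new ArrayList<>();

    ColumnContainerSketch(Allocator allocator) {
        this.allocator = allocator;
    }

    void addColumn(ByteBuffer value) {
        columns.add(allocator.copy(value));
    }

    int size() {
        return columns.size();
    }

    ByteBuffer get(int i) {
        return columns.get(i).duplicate();
    }
}
```

The tradeoff in this sketch is that a container is tied to one allocator for its lifetime, which may or may not fit how the column classes are shared between memtables and reads.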

Not thrilled with the current alternatives for moving slabs off-heap.  Our options are to

- use allocateDirect with all the problems that relying on finalization brings (see: CASSANDRA-2521),
as well as requiring users to manually tune the JVM direct buffer ceiling (or face a flood
of System.gc() calls courtesy of allocateDirect when the ceiling is reached).
- use JNA + manual free, which will require doing reference counting for memtables the way
we do for sstables post-CASSANDRA-2521.  Otherwise if a thread that had the memtable in its
list of historical memtables to merge from tries to read, you segfault.  (This is NOT the
same as the JNA 179 segfaults, which are fixed in 3.3.0.)
- stick with on-heap slabs
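
For the JNA + manual free option, the reference counting could look roughly like the sketch below. It is a minimal illustration under assumptions: RefCountedRegion is a hypothetical name, a byte[] stands in for natively allocated memory, and the "free" only sets a flag where a real version would release the native allocation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted region. The count starts at 1 for the owner
// (the memtable); readers must take a reference before touching the data.
final class RefCountedRegion {
    private final AtomicInteger refs = new AtomicInteger(1);
    private final byte[] data; // stands in for off-heap memory
    private volatile boolean freed = false;

    RefCountedRegion(int size) {
        this.data = new byte[size];
    }

    // Returns false if the region has already been released; the reader must then skip it.
    boolean tryRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    // Drops one reference; whoever drops the last one frees the memory.
    void unref() {
        int n = refs.decrementAndGet();
        if (n == 0) {
            freed = true; // a real version would call the native free here
        } else if (n < 0) {
            throw new IllegalStateException("unbalanced unref");
        }
    }

    boolean isFreed() {
        return freed;
    }

    byte read(int offset) {
        if (freed) {
            throw new IllegalStateException("read after free");
        }
        return data[offset];
    }
}
```

In this sketch a reader holding a stale memtable in its merge list either gets a reference (and the memory stays valid until it calls unref) or gets false and skips the region, instead of reading freed memory.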

I'd say off-heap slabs don't matter that much, but they would make the promotion failure problems
you saw go away completely.

I'm also not a big fan of slabbing everything in sight.  Slabbing keys associated with memtables
makes sense (and is done in my rebase).  Row keys and column names during sstable build, I'm skeptical
of -- if your rows are small enough that they finish before new -> old promotion, then
it doesn't matter.  And if they are so large that they do not, then your rate of key allocation
is glacial and again it shouldn't matter.  But if we WERE to slab these, the right way to
do it would be per-sstable, not per IndexSummary.

There is no logical unit of slabbing for key cache; we shouldn't be doing that at all.

I have an alternative idea to reduce non-memtable fragmentation: adding region recycling post-flush.
 Once a slab has been promoted to old gen, it stays there, instead of being GC'd and replaced with
a slab in new gen again.

(This would also mitigate the main downside of allocateDirect.)

We'd still probably want some kind of delayed release of slabs so write load spikes don't
permanently chew up your entire heap.
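
Region recycling with delayed release might be sketched as follows. This is an illustration only: RegionPool, holdMillis and the explicit nowMillis parameters are hypothetical, and the caller is assumed to invoke trim periodically.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical pool: flushed memtables hand their regions back instead of
// dropping them, and regions idle longer than holdMillis are released to the GC.
final class RegionPool {
    private static final class Idle {
        final ByteBuffer region;
        final long since;

        Idle(ByteBuffer region, long since) {
            this.region = region;
            this.since = since;
        }
    }

    private final int regionSize;
    private final long holdMillis;
    private final ArrayDeque<Idle> idle = new ArrayDeque<>(); // oldest first

    RegionPool(int regionSize, long holdMillis) {
        this.regionSize = regionSize;
        this.holdMillis = holdMillis;
    }

    // Reuse the most recently recycled region if there is one, else allocate a new one.
    synchronized ByteBuffer acquire() {
        Idle reused = idle.pollLast();
        if (reused != null) {
            reused.region.clear();
            return reused.region;
        }
        return ByteBuffer.allocate(regionSize);
    }

    // Called post-flush, once nothing references the region's contents any more.
    synchronized void recycle(ByteBuffer region, long nowMillis) {
        idle.addLast(new Idle(region, nowMillis));
    }

    // Drops regions that have sat idle for at least holdMillis; returns how many were dropped.
    synchronized int trim(long nowMillis) {
        int dropped = 0;
        while (!idle.isEmpty() && nowMillis - idle.peekFirst().since >= holdMillis) {
            idle.pollFirst();
            dropped++;
        }
        return dropped;
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

Reusing the newest idle region first lets the older ones age out, so a write spike inflates the pool only until the hold time passes.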

> off-heap memtables
> ------------------
>                 Key: CASSANDRA-2252
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>         Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt,
2252-v3.txt, merged-2252.tgz
>   Original Estimate: 0.4h
>  Remaining Estimate: 0.4h
> The memtable design practically actively fights Java's GC design.  Todd Lipcon gave a
good explanation over on HBASE-3455.
