cassandra-commits mailing list archives

From "Pavel Yaskevich (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
Date Thu, 17 Apr 2014 09:28:20 GMT


Pavel Yaskevich commented on CASSANDRA-6694:

To address all of your comments: this is not intended for any kind of review yet. It is just
a demonstration of the idea, which is why I basically carried over all of the methods from the
original implementations and didn't rename or move anything. I'm also fine with methods in both
implementations returning constant values, like serializationFlags or isMarkedForDeleted. Apart
from that there is not much code duplication, and it will shrink further once hashCode and other
methods go away. That would probably leave us with only the dataSize and serializedSize
duplication, and I guess we can come up with something clever for native cells there too.
Regarding the point about updateDigest: it is meant as an illustration of the kind of thing we
can do if we have two different implementations of it; it is not optimized for performance yet.

bq. There shouldn't be one for the time being - we can never construct one.


bq. Same reason - it doesn't exist as either or, so I made a conscious decision to leave it
as a CounterUpdateCell: the fact that it extends BufferCell is kind of unimportant. Its purpose
is somewhat different, and I think it is better left named CounterUpdateCell, as that is its
purpose (to carry a counter update as far as the memtable, and no further).

It is constructed in ColumnFamily and ColumnSerializer. If there is supposed to be only one
implementation for now, let's name it appropriately and use it like all the other buffered cells.

bq. This brings in the namespace of the extended class' static methods, which is useful.

But why do we care, and what does it give us, given that those interfaces are called directly
and static methods don't override each other?
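
For reference, the Java rule behind that last remark can be shown with a minimal sketch. The
class names below (BaseCell, NativeCell) are hypothetical stand-ins, not the Cassandra classes
under discussion: a static method redeclared in a subclass hides the parent's version and is
resolved at compile time, while an instance method is dispatched on the runtime type.

```java
public class StaticHidingDemo {
    // Hypothetical stand-ins for illustration only; not Cassandra's cell classes.
    public static class BaseCell {
        public static String kind() { return "base"; }
        public String describe() { return "base instance"; }
    }

    public static class NativeCell extends BaseCell {
        // This hides BaseCell.kind(); static methods cannot be overridden.
        public static String kind() { return "native"; }

        @Override
        public String describe() { return "native instance"; }

        // Extending BaseCell brings its static members into this namespace,
        // but an unqualified call here resolves to the hiding method.
        public static String unqualifiedKind() { return kind(); }
    }

    // Instance methods are dispatched on the runtime type of the object.
    public static String describeThroughBaseReference() {
        BaseCell cell = new NativeCell();
        return cell.describe();
    }

    public static void main(String[] args) {
        System.out.println(BaseCell.kind());
        System.out.println(NativeCell.kind());
        System.out.println(NativeCell.unqualifiedKind());
        System.out.println(describeThroughBaseReference());
    }
}
```

A static call is bound to whichever type is named at the call site, so extending a class only
affects which names can be written unqualified, not which implementation runs.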

bq. Sure, but again: scope of ticket, and care needs to be taken when doing this (e.g. your
updateDigest modifications)

I don't really follow what you are implying with that. The scope is to introduce native
implementations that are as optimized as possible, so why would we miss out on such low-hanging fruit?

> Slightly More Off-Heap Memtables
> --------------------------------
>                 Key: CASSANDRA-6694
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1 beta2
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap
> overhead is still very large. It should not be tremendously difficult to extend these changes
> so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their
> associated overhead).
> The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 bytes per
> cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This
> translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the
> VM to allow us to address a reasonably large memory space, although this trick is unlikely
> to last us forever, at which point we will have to bite the bullet and accept a 24-byte per
> cell overhead), and 4-byte object reference for maintaining our internal list of allocations,
> which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph
> we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName instances so
> that they may be backed by native memory OR heap memory.
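
The overhead arithmetic in the issue description above, and the kind of alignment trick it
alludes to, can be sketched as follows. This is only an illustration under stated assumptions:
the class and method names are hypothetical, and the 8-byte alignment (a 3-bit shift, similar
in spirit to the JVM's compressed object pointers) is an assumed value that the ticket does not
specify.

```java
public class CellOverheadSketch {
    // Per-cell on-heap costs as listed in the issue description.
    static final int OBJECT_HEADER_BYTES = 8;   // object overhead
    static final int ADDRESS_BYTES = 4;         // compressed off-heap address
    static final int ALLOCATION_REF_BYTES = 4;  // reference in the allocation list

    // Assumption: allocations are 8-byte aligned, so the low 3 bits are always zero.
    static final int ALIGNMENT_SHIFT = 3;

    // Total per-cell overhead, given the average btree overhead (4-6 bytes in the ticket).
    public static int perCellOverhead(int btreeBytes) {
        return OBJECT_HEADER_BYTES + ADDRESS_BYTES + ALLOCATION_REF_BYTES + btreeBytes;
    }

    // Packs an aligned offset into 32 bits by dropping the always-zero low bits.
    public static int compress(long offset) {
        if (offset < 0 || (offset & ((1L << ALIGNMENT_SHIFT) - 1)) != 0)
            throw new IllegalArgumentException("offset must be non-negative and aligned");
        long shifted = offset >>> ALIGNMENT_SHIFT;
        if (shifted > 0xFFFFFFFFL)
            throw new IllegalArgumentException("offset outside the addressable range");
        return (int) shifted;
    }

    // Restores the full offset, treating the stored int as unsigned.
    public static long decompress(int compressed) {
        return (compressed & 0xFFFFFFFFL) << ALIGNMENT_SHIFT;
    }

    // Size of the space reachable with a 4-byte compressed address.
    public static long addressableBytes() {
        return (1L << 32) << ALIGNMENT_SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(perCellOverhead(4) + " to " + perCellOverhead(6) + " bytes per cell");
        System.out.println(addressableBytes() + " addressable bytes");
    }
}
```

Under the assumed 8-byte alignment a 4-byte address reaches 32 GiB. A coarser alignment would
extend that range at the cost of more padding per allocation.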

This message was sent by Atlassian JIRA