cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
Date Wed, 09 Apr 2014 08:14:18 GMT


Benedict commented on CASSANDRA-6694:

bq. less object overhead

There is no reduced overhead from the current patch.

bq. Also, as we consider Composite as a complete entity, storing components as contiguous
blocks would reduce container overhead + speeds up comparisons by exploiting spatial locality

You seem to be backtracking to the prior suggestion of only one implementation. I am potentially
ok with this, but see my prior comment for concerns and complications. The -1 was to having
what we have now except with an extra level of indirection (i.e. one packed Cell implementation,
and one componentised like we had before this patch). Also, I would prefer to avoid the extra
indirection and virtual method costs of having another inner object representation, within which
we need another offset.

The JVM instruction set is beside the point. The point is what HotSpot will do: with a single
implementor, or a static method with a small enough bytecode representation, it will be inlined.
Note I said "multiple implementation" virtual method. With the option you suggest we will
need an extra virtual invocation cost on every access to the underlying bytes, some extra
math to access the right location, and one extra object field reference to locate the position
we're offsetting from. These costs mount up rapidly.
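The per-access costs above can be sketched roughly as follows. This is an illustrative toy, not the patch's actual classes (CellData, PackedCell and valueAt are invented names): with a single loaded implementor HotSpot can devirtualize and inline valueAt(), but once a second implementor reaches the call site, every access pays a virtual dispatch plus the extra field load and offset arithmetic described above.

```java
// Hypothetical sketch; names are illustrative, not from the patch.
interface CellData {
    long valueAt(int i);
}

final class PackedCell implements CellData {
    private final long[] buffer; // contiguous storage
    private final int offset;    // extra field that must be loaded per access

    PackedCell(long[] buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
    }

    @Override
    public long valueAt(int i) {
        return buffer[offset + i]; // extra math on every access
    }
}

class Demo {
    // Monomorphic call site: inlinable while only PackedCell is loaded;
    // a second CellData implementor turns this into a virtual dispatch.
    static long sum(CellData cell, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += cell.valueAt(i);
        return s;
    }

    public static void main(String[] args) {
        CellData cell = new PackedCell(new long[] {0, 1, 2, 3}, 1);
        System.out.println(sum(cell, 3)); // 1 + 2 + 3 = 6
    }
}
```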

Hmm. No, I now note your "client" implementation: what exactly is this one? Please clarify,
as the thrift cell is going to need to be compared with the other implementations, and suddenly
much of any benefit will disappear. The best way to make comparisons cheap and easy is to
have both sides of the comparison share at least the same layout. If we have to either virtual
invoke or instanceof check for every comparison, with a different code path for comparing each
type of representation, there will be a performance impact. As such the main benefit
of this approach is eliminated, in my eyes. Also, how will this "client" implementation achieve
its various functions, and define its type? It seems you'll need a duplicate hierarchy still.
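The point about shared layouts can be sketched like this. The flat byte[] encoding and compareFlat are assumptions for illustration, not the patch's actual comparator: when both sides of a comparison use one flat layout, comparing is a single unsigned lexicographic byte scan with no instanceof checks or per-representation branches.

```java
// Hypothetical sketch; compareFlat is illustrative, not the real comparator.
class FlatCompare {
    static int compareFlat(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff); // unsigned byte compare
            if (cmp != 0)
                return cmp;
        }
        return a.length - b.length; // shorter sorts first on a shared prefix
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3};
        byte[] y = {1, 2, 4};
        System.out.println(compareFlat(x, y) < 0); // true
    }
}
```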

> Slightly More Off-Heap Memtables
> --------------------------------
>                 Key: CASSANDRA-6694
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1 beta2
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap
> overhead is still very large. It should not be tremendously difficult to extend these changes
> so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their
> associated overhead).
> The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 bytes per
> cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This
> translates to 8-byte object overhead, 4-byte address (we will do alignment tricks like the
> VM to allow us to address a reasonably large memory space, although this trick is unlikely
> to last us forever, at which point we will have to bite the bullet and accept a 24-byte per
> cell overhead), and 4-byte object reference for maintaining our internal list of allocations,
> which is unfortunately necessary since we cannot safely (and cheaply) walk the object graph
> we allocate otherwise, which is necessary for (allocation-) compaction and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName instances so
> that they may be backed by native memory OR heap memory.
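The 4-byte address in the quoted description relies on an alignment trick analogous to the JVM's compressed oops. A minimal sketch, assuming a 16-byte allocation alignment (the actual shift, and the encode/decode names, are not stated in the ticket and are illustrative only): a 32-bit index over 16-byte-aligned slots reaches 2^32 * 16 bytes = 64 GiB.

```java
// Hypothetical sketch of the compressed-oops-style alignment trick;
// ALIGNMENT_SHIFT, encode and decode are assumptions for illustration.
class CompressedAddress {
    static final int ALIGNMENT_SHIFT = 4; // 16-byte-aligned allocations (assumed)

    // Store a 32-bit slot index instead of a raw 64-bit pointer.
    static int encode(long address) {
        return (int) (address >>> ALIGNMENT_SHIFT);
    }

    static long decode(int compressed) {
        return (compressed & 0xFFFFFFFFL) << ALIGNMENT_SHIFT;
    }

    public static void main(String[] args) {
        // Reachable space: 2^32 slots * 16 bytes = 64 GiB.
        long maxReach = (1L << 32) << ALIGNMENT_SHIFT;
        System.out.println(maxReach / (1L << 30)); // 64 (GiB)

        long addr = 123_456L << ALIGNMENT_SHIFT; // some aligned address
        System.out.println(decode(encode(addr)) == addr); // true
    }
}
```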

This message was sent by Atlassian JIRA
