cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
Date Tue, 18 Feb 2014 22:25:21 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904708#comment-13904708 ]

Benedict commented on CASSANDRA-6694:
-------------------------------------

Initial patch for this is available [here|https://github.com/belliottsmith/cassandra/tree/offheap2]

The basic idea is that Cell and DecoratedKey are both now interfaces, each with "buffer" and
"native" implementations. The native implementations squash the CellName implementation into
the same object, so we avoid any allocation overhead and don't need to allocate a new object
every time we read the name. As a result we have had to go a little anti-OOP: DecoratedKey and
the *Cell interfaces are backed by static implementation "modules", whose methods are invoked
by each implementation with itself as the first parameter. This isn't super pretty, but it
isn't super ugly either. The ugliest thing here is that I flatten the logic from db.composites
into NativeCell, but it turns out this is not actually very hard; they behave _mostly_ the same.
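
To make the "module" pattern concrete, here is a rough sketch of its shape (names, fields and
method bodies are simplified and hypothetical, not taken from the actual patch):

{code:java}
import java.nio.ByteBuffer;

// Sketch only: the real Cell/DecoratedKey interfaces carry far more methods.
interface Cell
{
    ByteBuffer value();
    long timestamp();

    // Shared logic lives in a static "module"; implementations pass themselves in.
    static class Impl
    {
        static int dataSize(Cell cell)
        {
            return 8 + cell.value().remaining(); // timestamp + value bytes (simplified)
        }
    }
}

// On-heap implementation backed by ByteBuffers.
class BufferCell implements Cell
{
    private final ByteBuffer value;
    private final long timestamp;
    BufferCell(ByteBuffer value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    public ByteBuffer value() { return value; }
    public long timestamp() { return timestamp; }
    public int dataSize() { return Impl.dataSize(this); }
}

// Off-heap implementation: the name/value bytes live in native memory, and the
// CellName logic is flattened into this class rather than allocated separately.
class NativeCell implements Cell
{
    private final long peer; // address of the cell's native allocation
    private final long timestamp;
    NativeCell(long peer, long timestamp) { this.peer = peer; this.timestamp = timestamp; }
    public ByteBuffer value() { return readValueFromNative(peer); } // copies the native bytes out
    public long timestamp() { return timestamp; }
    public int dataSize() { return Impl.dataSize(this); }

    private static ByteBuffer readValueFromNative(long peer) { return ByteBuffer.allocate(0); } // placeholder
}
{code}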

I've also quite widely refactored the stuff introduced in CASSANDRA-5549 and CASSANDRA-6689:
the PoolAllocator in utils.memory now only defines methods for managing the memory use of the
pool; what it *means* to "allocate" is left to its descendants to define. We now split them
into two camps: ByteBufferPool and NativePool (renamed from OffHeapPool). The former's
allocators implement ByteBufferAllocator (formerly AbstractAllocator), whereas the
NativeAllocator allocates NativeAllocations. With me still? These NativeAllocations form the
basis for any objects stored off-heap.
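
A very rough sketch of how that split looks in shape (hypothetical names and bodies, inferred
from the description above rather than taken from the patch):

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the base allocator only accounts for the memory its pool owns;
// what "allocate" actually produces is left to subclasses.
abstract class PoolAllocator
{
    private final AtomicLong owns = new AtomicLong();

    protected void acquired(long bytes) { owns.addAndGet(bytes); }
    protected void released(long bytes) { owns.addAndGet(-bytes); }
    public long owns() { return owns.get(); }
}

// Allocators belonging to a ByteBufferPool hand out heap ByteBuffers.
class ByteBufferAllocator extends PoolAllocator
{
    public ByteBuffer allocate(int size)
    {
        acquired(size);
        return ByteBuffer.allocate(size);
    }
}

// A NativeAllocation is a handle onto a region of off-heap memory; it is the
// basis for anything stored off-heap.
final class NativeAllocation
{
    final long peer; // native address (the real pool carves this out of its regions)
    final int size;
    NativeAllocation(long peer, int size) { this.peer = peer; this.size = size; }
}

// Allocators belonging to a NativePool hand out NativeAllocations instead.
class NativeAllocator extends PoolAllocator
{
    private long nextAddress = 1; // placeholder bump allocator, for illustration only

    public NativeAllocation allocate(int size)
    {
        acquired(size);
        NativeAllocation allocation = new NativeAllocation(nextAddress, size);
        nextAddress += size;
        return allocation;
    }
}
{code}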

Anyway, these PoolAllocators are now utilised by *Data*Allocators in the db package tree;
these are comparatively simple, and I wanted to keep the guts of the memory management in
utils.memory. These DataAllocator instances simply know how to clone DecoratedKey and Cell
instances, and also how to tidy up any unused references.
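
In sketch form (reusing the hypothetical Cell/BufferCell and ByteBufferAllocator shapes above,
plus a placeholder DecoratedKey), a DataAllocator's job is roughly:

{code:java}
import java.nio.ByteBuffer;

// Placeholder for the DecoratedKey interface referred to above.
interface DecoratedKey
{
    ByteBuffer getKey();
}

// Sketch: a DataAllocator just knows how to copy keys/cells into memory its pool
// accounts for, and how to release copies that end up unused.
interface DataAllocator
{
    DecoratedKey clone(DecoratedKey key);
    Cell clone(Cell cell);
    void free(Cell cell); // tidy up a reference that never made it into the memtable
}

// A buffer-backed implementation that copies data onto the heap via its ByteBufferAllocator;
// a native-backed one would instead copy into NativeAllocations and return NativeCells.
class BufferDataAllocator implements DataAllocator
{
    private final ByteBufferAllocator allocator;
    BufferDataAllocator(ByteBufferAllocator allocator) { this.allocator = allocator; }

    public DecoratedKey clone(DecoratedKey key)
    {
        final ByteBuffer copy = copyOf(key.getKey());
        return new DecoratedKey() { public ByteBuffer getKey() { return copy; } };
    }

    public Cell clone(Cell cell)
    {
        return new BufferCell(copyOf(cell.value()), cell.timestamp());
    }

    public void free(Cell cell) { /* release the copy's memory back to the pool's accounting */ }

    private ByteBuffer copyOf(ByteBuffer source)
    {
        ByteBuffer copy = allocator.allocate(source.remaining());
        copy.put(source.duplicate());
        copy.flip();
        return copy;
    }
}
{code}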

Some notes:
- This (and CASSANDRA-6689) has negative implications for Thrift at the moment, as I have to
copy any data on-heap in order to return it to Thrift. Unfortunately this can only easily be
rectified by modifying Thrift so that we have method calls we can override in the worker tasks,
invoked when starting and finishing the servicing of a request.
- I've settled on a 24-byte object, as I really needed to keep some extra information on-heap.
We can definitely tighten this in the future, but I think it probably isn't worth doing at
this stage.
- As things stand, without CASSANDRA-6697 we allocate a lot of temporary ByteBuffers, i.e.
whenever we read the constituents of the name or the contents of the cell (sketched below).
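
To illustrate the last point, a hypothetical sketch (not the patch) of why reads from a native
cell currently generate temporary garbage without CASSANDRA-6697:

{code:java}
import java.nio.ByteBuffer;

// Sketch: every read of a native cell's value (or of a component of its name) has to
// materialise a fresh on-heap ByteBuffer, because that is what callers expect.
// The layout and accessors below are purely illustrative.
final class NativeReadExample
{
    // Pretend layout at the native address: [int valueLength][value bytes ...]
    static ByteBuffer readValue(long peer)
    {
        int length = getInt(peer);                     // read the length prefix off-heap
        ByteBuffer copy = ByteBuffer.allocate(length); // <-- temporary allocation on every read
        for (int i = 0; i < length; i++)
            copy.put(getByte(peer + 4 + i));           // copy the value bytes onto the heap
        copy.flip();
        return copy;
    }

    // Stand-ins for raw off-heap reads (real code would use Unsafe or a DirectByteBuffer).
    private static int getInt(long address) { return 0; }
    private static byte getByte(long address) { return 0; }
}
{code}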

It would be good to get some testing resources allocated to the first and last points, to see
if we should be trying to fix them. Preferably we should decide whether we want CASSANDRA-6697
before we go live with a final 2.1 release. What we really need is to run a number of tests
against schemas with a lot of composite columns and see what effect there is on garbage
collection and latency metrics.

> Slightly More Off-Heap Memtables
> --------------------------------
>
>                 Key: CASSANDRA-6694
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1
>
>
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as the on-heap
> overhead is still very large. It should not be tremendously difficult to extend these changes
> so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their
> associated overhead).
> The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6 bytes per
> cell on average for the btree overhead, for a total overhead of around 20-22 bytes). This
> translates to 8 bytes of object overhead, a 4-byte address (we will do alignment tricks like
> the VM to let us address a reasonably large memory space, although this trick is unlikely
> to last us forever, at which point we will have to bite the bullet and accept a 24-byte
> per-cell overhead), and a 4-byte object reference for maintaining our internal list of
> allocations. That reference is unfortunately necessary, since we cannot otherwise safely
> (and cheaply) walk the object graph we allocate, which we need for (allocation-)compaction
> and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName instances so
> that they may be backed by native memory OR heap memory.
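
For reference, the arithmetic behind the quoted numbers, spelled out using only the figures
given above:

{code:java}
// The per-cell on-heap budget described in the quoted text.
public class CellOverheadArithmetic
{
    static final int OBJECT_OVERHEAD = 8;     // JVM object overhead
    static final int NATIVE_ADDRESS = 4;      // address kept to 4 bytes via alignment tricks
    static final int ALLOCATION_LIST_REF = 4; // reference into the internal list of allocations

    public static void main(String[] args)
    {
        int perCell = OBJECT_OVERHEAD + NATIVE_ADDRESS + ALLOCATION_LIST_REF; // = 16 bytes
        System.out.println(perCell + " + 4..6 (btree) = "
                           + (perCell + 4) + ".." + (perCell + 6) + " bytes per cell");
    }
}
{code}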



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
