cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables
Date Wed, 05 Mar 2014 10:27:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920720#comment-13920720 ]

Benedict commented on CASSANDRA-6689:
-------------------------------------

bq.  sort of RCU (i'm looking at you OpOrder)

What do you mean here? If you mean read-copy-update, OpOrder is nothing like this.
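For readers following along, the distinction can be illustrated with a toy sketch. This is not Cassandra's actual OpOrder implementation; MiniOpOrder, Group, and all method names here are hypothetical. The idea it illustrates: operations enroll in the current group, and a reclaimer issues a barrier and waits for older groups to drain, paying a cost per reclamation rather than per item read (unlike RCU, there is no copy-then-swap of the data itself).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch of an OpOrder-style barrier (hypothetical, simplified).
final class MiniOpOrder {
    static final class Group {
        final AtomicInteger running = new AtomicInteger();
        volatile boolean closed;
        boolean isFinished() { return closed && running.get() == 0; }
    }

    private volatile Group current = new Group();

    // An operation registers with the current group before touching shared memory...
    Group start() {
        while (true) {
            Group g = current;
            g.running.incrementAndGet();
            if (!g.closed) return g;
            g.running.decrementAndGet();  // raced with a barrier; retry
        }
    }

    // ...and signals completion when done.
    void finish(Group g) { g.running.decrementAndGet(); }

    // The reclaimer starts a new group and waits for the old one to drain;
    // the cost is paid once per reclamation, not once per row read.
    void awaitBarrier() {
        Group old = current;
        current = new Group();
        old.closed = true;
        while (!old.isFinished()) Thread.yield();
    }
}
```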

bq. I'm not sure what is to retain here if we do that copy when we send to the wire

Ultimately, doing this copying before sending to the wire is something I would like to avoid.
In my fairly rough-and-ready benchmarks, using RefAction.allocateOnHeap() on top of this
copying drops Thrift wire-transfer speeds by about 10%, so copying clearly has a cost. Possibly
that cost comes from unavoidably copying data you don't necessarily want to serialise, but
it is there. Ultimately, if we want to get in-memory read operations to 10x their current
performance, we can't afford to cut any corners.

bq. introducing separate gc

I've stated clearly what this introduces as a benefit: overwrite workloads no longer cause
excessive flushes.

bq.  things but as we have a fixed number of threads it is going to work out the same way
as for buffering open files in the steady system state

Your next sentence states that this is a large cause of memory consumption, so surely we should
be putting that memory to other uses where possible (returning it to the buffer cache, or
using it internally for more caching)?

bq. Temporary memory allocated by readers is exactly what we should be managing at the first
place because they allocate the most and it always the biggest concern for us

I agree we should be moving to managing this as well; however, I disagree about how we should
be managing it. In the medium term we should bring the buffer cache in process, so that we
can answer some queries without handing off to the mutation stage (anything known to be
non-blocking and fast should be answered immediately by the thread that processed the connection).
At that point we benefit from shared use of the memory pool, concrete control over how much
memory readers are using, and zero-copy reads from the buffer cache. I hope we may be able
to do this for 3.0.
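The dispatch idea above can be sketched in miniature. This is purely illustrative, not Cassandra code; InlineDispatch, cacheInProcess, and loadFromDisk are hypothetical names, and the assumption is simply that a cache hit is known to be fast and non-blocking.

```java
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical sketch: answer a read inline on the connection thread when it
// can be served from an in-process cache; otherwise hand off to a read stage.
final class InlineDispatch {
    private final Map<String, String> pageCache = new ConcurrentHashMap<>();
    private final ExecutorService readStage = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    void cacheInProcess(String key, String value) { pageCache.put(key, value); }

    String read(String key) {
        String cached = pageCache.get(key);
        if (cached != null)
            return cached;                            // fast path: answered inline
        try {
            return readStage.submit(() -> loadFromDisk(key)).get();  // slow path: handed off
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    private String loadFromDisk(String key) { return "disk:" + key; }
}
```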

bq. do a simple memcpy test and see how much mb/s can you get from copying from one pre-allocated
pool to another

Are you performing a full object-tree copy, and doing so on a running system to see how it
affects the performance of other system components? If not, it doesn't seem to be a useful
comparison. Note that this will still create a tremendous amount of heap churn, as most of
the memory used by objects right now is on-heap. So copying the records is almost certainly
no better for young-gen pressure than what we currently do - in fact, *it probably makes the
situation worse*.
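To make the distinction concrete, here is a minimal sketch (Cell, copyTree, and copyFlat are hypothetical names): a raw memcpy between pre-allocated pools is one bulk transfer, whereas a full object-tree copy performs fresh allocations per record, which is exactly the young-gen churn described above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: object-tree copy vs. flat buffer copy.
final class CopyChurn {
    static final class Cell {                 // stand-in for an on-heap record
        final byte[] value;
        Cell(byte[] v) { value = v; }
        Cell deepCopy() { return new Cell(value.clone()); } // two allocations per cell
    }

    // Object-tree copy: allocations scale with the number of records copied.
    static List<Cell> copyTree(List<Cell> src) {
        List<Cell> dst = new ArrayList<>(src.size());
        for (Cell c : src) dst.add(c.deepCopy());
        return dst;
    }

    // Flat copy: one bulk transfer between pre-allocated regions, no per-record
    // allocation (this is what a bare memcpy benchmark measures).
    static void copyFlat(ByteBuffer src, ByteBuffer dst) {
        dst.clear();
        dst.put(src.duplicate());
    }
}
```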

bq. it's not the memtable which creates the most of the noise and memory presure in the system
(even tho it uses big chunk of heap) 

It may not be causing the young-gen pressure you're seeing, but it certainly offers some benefit
here: by keeping more rows in memory, recent queries are more likely to be answered with zero
allocation, which reduces young-gen pressure. It is also a foundation for improving the row
cache and for introducing a shared page cache, which could bring us closer to zero-allocation
reads.

It's also not clear to me how you would manage reclamation of the off-heap allocations without
OpOrder. Or do you mean to use off-heap buffers only for readers, or to ref-count any memory
as you read it? Not using off-heap memory for the memtables would negate the main original
point of this ticket: supporting larger memtables, and thus reducing write amplification.
Ref-counting incurs overhead linear in the size of the result set, much like copying, and
is also fiddly to get right (I'm not convinced it's cleaner or neater), whereas OpOrder incurs
overhead proportional to the number of times you reclaim. So if you're using OpOrder, all
you're really talking about is a new RefAction: copyToAllocator() or something. That doesn't
notably reduce complexity; it just reduces the quality of the end result.
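The overhead asymmetry argued above can be sketched as follows (ReclaimCost and both method names are hypothetical; counters stand in for the real pinning machinery): ref-counting pays two atomic operations per row in the result set, while an OpOrder-style guard pays two per read operation, regardless of how many rows it touches.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cost sketch: per-row ref-counting vs. per-operation guarding.
final class ReclaimCost {
    // Ref-counting: pin and unpin every row -> O(result size) atomic ops.
    static int readWithRefCounts(AtomicInteger[] rowRefs) {
        int rows = 0;
        for (AtomicInteger ref : rowRefs) {
            ref.incrementAndGet();   // pin the row
            rows++;                  // ... read the row ...
            ref.decrementAndGet();   // unpin
        }
        return rows;
    }

    // OpOrder-style: enter and leave the guarded region once per operation;
    // the reclaimer waits for inFlight to drain -> O(1) atomic ops per read.
    static int readWithEpoch(AtomicInteger inFlight, int rowCount) {
        inFlight.incrementAndGet();  // enter once
        int rows = 0;
        for (int i = 0; i < rowCount; i++) rows++;  // read rows freely
        inFlight.decrementAndGet();  // leave once
        return rows;
    }
}
```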


> Partially Off Heap Memtables
> ----------------------------
>
>                 Key: CASSANDRA-6689
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1 beta2
>
>         Attachments: CASSANDRA-6689-small-changes.patch
>
>
> Move the contents of ByteBuffers off-heap for records written to a memtable.
> (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
