cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6689) Partially Off Heap Memtables
Date Thu, 06 Mar 2014 21:07:44 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923029#comment-13923029
] 

Benedict edited comment on CASSANDRA-6689 at 3/6/14 9:07 PM:
-------------------------------------------------------------

bq. Is there a branch/patch to see all of the changes involved?

Yes, [offheap2c+6781|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781] and [offheap2c|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781]

The former includes performance enhancements for DirectByteBuffer use, from a separate ticket.

bq.  I'm just saying that all of the read/write requests are short lived enough to stay inside
young generation region
Well, yes and no. We have to wait until any client finishes processing the data, so there's
no absolute guarantee they'll not survive. But either way, ParNew pauses are almost as bad
as full GC, only they happen much more often. 300ms pauses are not a good thing, and if we
can then reduce the size of YG so that when these pauses do happen they're shorter (or, say,
maybe even use G1GC) then that's even better.

bq. Sorry but I still don't get it, do you mean lock-free/non-blocking or that it does no
syscalls or something similar? But that doesn't matter for pauses as much as allocation throughput
and fragmentation of Java GC.

I mean no worker threads have to stop in order for memory to be reclaimed. There is no STW
for reclaiming memory. This is unrelated to the lock freedom.

bq. Java is the worst of cache locality with it's object placement anyway 

Well, exactly. That's my point here :-)

bq.  it adds another complications not related to this very ticket.

I have no intention of doing it in this ticket, just indicating it as a very useful improvement.

bq. which is how we usually do things for Cassandra project

It looks like you really have two steps, if I boil it down: 1) implement copying approach;
2) if slow, implement this approach? 

The issue with this, really, though, is that if we simply allocate ByteBuffers off-heap like
we have in this ticket, we get none of the benefit of increased memtable capacity, since the
on-heap overheads are still huge. Since that's one of the main goals here, it seems problematic
to lose out on it - we don't need to test to find out what the result would be. This ticket
was supposed to only be a stepping stone. 

Possibly we could scale CASSANDRA-6694 back somewhat, removing any support for RefAction.refer(),
and always performing a copy onto heap from the NativeCell implementations, and spending some
time ripping out any of the GC or any of the code at Memtable discard for coping with RefAction.refer().
But honestly this seems like a waste of effort to me, as the majority of the code would remain,
we'll just not have as good a final solution. But it could be done if that is the community
preference. We could probably split it up into further commits, but each commit adds the potential
for more errors in my book, when we have a good solution that is ready to go.

Maximally separated timeline for separated commits of 6694 would be:
# introduce concurrency primitives
# introduce .memory and .data refactor, but only for ByteBuffer allocators
# introduce all .data.Native* implementations and a cut down native allocator, using OpOrder
to guard copying like we currently do for referencing (we need to use something, and it is
simpler than ref counting or anything else)
# introduce RefAction
# introduce GC

I would rather not split it up as, as I say, each new patch is an opportunity to mess up,
but it could be done. We can do performance testing to our hearts content at each stage, although
personally I think such testing would not be sufficient to demonstrate no benefit to the current
approach, as even with little benefit seen it presupposes no future performance benefit is
capped by the slower solution. So I would push for the final patch anyway. That said, I would
be surprised if we did not see any improvement by comparison.


was (Author: benedict):
bq. Is there a branch/patch to see all of the changes involved?

Yes, [offheap2c+6781|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781] and [offheap2c|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781]

The former includes performance enhancements for DirectByteBuffer use, from a separate ticket.

bq.  I'm just saying that all of the read/write requests are short lived enough to stay inside
young generation region
Well, yes and no. We have to wait until any client finishes processing the data, so there's
no absolute guarantee they'll not survive. But either way, ParNew pauses are almost as bad
as full GC, only they happen much more often. 300ms pauses are not a good thing, and if we
can then reduce the size of YG so that when these pauses do happen they're shorter (or, say,
maybe even use G1GC) then that's even better.

bq. Sorry but I still don't get it, do you mean lock-free/non-blocking or that it does no
syscalls or something similar? But that doesn't matter for pauses as much as allocation throughput
and fragmentation of Java GC.

I mean no worker threads have to stop in order for memory to be reclaimed. There is no STW
for reclaiming memory. This is unrelated to the lock freedom.

bq. Java is the worst of cache locality with it's object placement anyway 

Well, exactly. That's my point here :-)

bq.  it adds another complications not related to this very ticket.

I have no intention of doing it in this ticket, just indicating it as a very useful improvement.

bq. which is how we usually do things for Cassandra project

It looks like you really have two steps, if I boil it down: 1) implement copying approach;
2) if slow, implement this approach? 

The issue with this, really, though, is that if we simply allocate ByteBuffers off-heap like
we have in this ticket, we get none of the benefit of increased memtable capacity, since the
on-heap overheads are still huge. Since that's one of the main goals here, it seems problematic
to lose out on it - we don't need to test to find out what the result would be. This ticket
was supposed to only be a stepping stone. 

Possibly we could scale CASSANDRA-6694 back somewhat, removing any support for RefAction.refer(),
and always performing a copy onto heap from the NativeCell implementations, and spending some
time ripping out any of the GC or any of the code at Memtable discard for coping with RefAction.refer().
But honestly this seems like a waste of effort to me, as the majority of the code would remain,
we'll just not have as good a final solution. But it could be done if that is the community
preference. We could probably split it up into further commits, but each commit adds the potential
for more errors in my book, when we have a good solution that is ready to go.

Maximally separated timeline for separated commits of 6694 would be:
# introduce concurrency primitives
# introduce .memory and .data refactor, but only for ByteBuffer allocators, and RefAction,
but only allocateOnHeap
# introduce all .data.Native* implementations and a cut down native allocator, using OpOrder
to guard copying like we currently do for referencing (we need to use something, and it is
simpler than ref counting or anything else)
# introduce RefAction.refer(), GC, etc. (i.e. final patch)

I would rather not split it up as, as I say, each new patch is an opportunity to mess up,
but it could be done. We can do performance testing to our hearts content at each stage, although
personally I think such testing would not be sufficient to demonstrate no benefit to the current
approach, as even with little benefit seen it presupposes no future performance benefit is
capped by the slower solution. So I would push for the final patch anyway. That said, I would
be surprised if we did not see any improvement by comparison.

> Partially Off Heap Memtables
> ----------------------------
>
>                 Key: CASSANDRA-6689
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1 beta2
>
>         Attachments: CASSANDRA-6689-small-changes.patch
>
>
> Move the contents of ByteBuffers off-heap for records written to a memtable.
> (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message