cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
Date Fri, 11 Sep 2015 16:49:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741107#comment-14741107 ]

Ariel Weisberg edited comment on CASSANDRA-9738 at 9/11/15 4:49 PM:
--------------------------------------------------------------------

When deserializing an old format IndexedEntry I think you have to rewrite it to generate the
offsets. Otherwise the cache will never be populated with an entry where the offsets are calculated.
That will make it slower than the older version. I also think this will be faster than generating
it incrementally since it's a nice tight loop doing a scan of memory. If you are doing a binary
search you will end up doing most of that work anyways and if it's a scan you will also end
up doing most of it.
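
A minimal sketch of generating all offsets in one pass at deserialization time. The layout here
is hypothetical (each entry is a 4-byte length followed by that many payload bytes) and is not
the actual RowIndexEntry serialization; LegacyOffsets and computeOffsets are illustrative names.

```java
import java.nio.ByteBuffer;

final class LegacyOffsets {
    /**
     * Hypothetical legacy layout: each entry is a 4-byte length followed by
     * that many payload bytes. Returns the start position of every entry.
     */
    static int[] computeOffsets(ByteBuffer buf, int entryCount) {
        int[] offsets = new int[entryCount];
        int pos = buf.position();
        // One tight loop over the serialized bytes; each entry is visited once.
        for (int i = 0; i < entryCount; i++) {
            offsets[i] = pos;
            int len = buf.getInt(pos);   // absolute read, does not move the buffer position
            pos += Integer.BYTES + len;  // skip the length prefix and the payload
        }
        return offsets;
    }
}
```

With the offsets materialized once, later binary searches and scans can share the same code
path as the new format.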

RowIndexEntry.java line 423, legacyIndexInfoSearch. It's still doing a loop from the beginning
of the offsets to the last offset it calculated. There is no need for that; we should know the
last calculated offset and skip to it. For a scan this operation becomes n^2 with that
loop. I think it should go away completely. Just rewrite the IndexedEntry during deserialization
since you are making a copy anyways when you bring it out of the field.
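
To illustrate why the loop makes a scan quadratic, here is a small step-counting sketch. It
models only the number of entries walked, not the real legacyIndexInfoSearch code; ScanCost and
both method names are made up for the illustration.

```java
final class ScanCost {
    /** Every lookup walks from entry 0 up to the target: 1 + 2 + ... + n steps for a full scan. */
    static long stepsWhenRestarting(int n) {
        long steps = 0;
        for (int target = 0; target < n; target++)
            for (int i = 0; i <= target; i++)
                steps++;
        return steps;
    }

    /** Remember how many offsets are already computed and continue from there: n steps in total. */
    static long stepsWhenResuming(int n) {
        long steps = 0;
        int computed = 0; // number of offsets already known
        for (int target = 0; target < n; target++) {
            while (computed <= target) {
                computed++;
                steps++;
            }
        }
        return steps;
    }
}
```

For n = 1000 the first method counts n(n+1)/2 = 500500 steps and the second counts 1000.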

Jonathan told me the expectation is that people run upgradesstables so we don't need to be
heroic. Let's go for the simplest possible solution, which is making the old and new formats
match after deserialization. Hopefully this means we can remove a bunch of paths based on
which format we are looking at.

For cache hits we have to copy the entire IndexedEntry onto the heap into unpooled memory.
That turns an operation that was lg N into one that is linear in the size of the IndexedEntry.
In terms of raw speed the on heap cache is going to be better off using the new serialization,
but it will really poke the garbage collector in the eye. At least with the OHC cache the
garbage is short lived.
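
A rough sketch of the cost difference, assuming a hypothetical layout of sorted 8-byte keys
stored back to back in a buffer (not the real IndexedEntry format). Searching in place reads
about lg N keys, while copying onto the heap first reads all N and allocates an array per hit.
InPlaceSearch and its methods are illustrative names only.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class InPlaceSearch {
    /** Binary search directly on the buffer: about lg N absolute reads, no allocation. */
    static int searchInPlace(ByteBuffer buf, int count, long key) {
        int lo = 0, hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long v = buf.getLong(mid * Long.BYTES);
            if (v < key) lo = mid + 1;
            else if (v > key) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1); // same "not found" convention as Arrays.binarySearch
    }

    /** Copy everything onto the heap first: N reads plus a fresh array on every call. */
    static int copyThenSearch(ByteBuffer buf, int count, long key) {
        long[] onHeap = new long[count];
        for (int i = 0; i < count; i++)
            onHeap[i] = buf.getLong(i * Long.BYTES);
        return Arrays.binarySearch(onHeap, key);
    }
}
```

Both methods return the same index; only the amount of memory touched and garbage produced
differs.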

I don't like to give people options they have to choose from, but I am more afraid of making
the product unworkable for some use case. Maybe we should allow the key cache to be selectable
for 3.0? Alternatively could you make RowIndexEntry closable and go with ref counting? I feel
like these are the two options that get us to 3.0 while minimizing regret post release.

We don't have to go all in on refcounting either. Copying it in the scan case is fine. If
we could just refcount the binary search case I think we would be OK. So the cache could provide
an accessor that is refcounted and we can use that for just the key cache in just the binary
search case.
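
One possible shape for such a refcounted accessor, as a sketch only. RefCountedEntry, tryRef and
the onFree callback are hypothetical names; this is not the existing Cassandra or OHC API.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountedEntry implements AutoCloseable {
    // Starts at 1: the cache itself holds one reference until it evicts the entry.
    private final AtomicInteger refs = new AtomicInteger(1);
    private final Runnable onFree; // hypothetical hook that releases the off-heap memory

    RefCountedEntry(Runnable onFree) {
        this.onFree = onFree;
    }

    /** Takes an extra reference, or returns null if the entry has already been freed. */
    RefCountedEntry tryRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return null;
            if (refs.compareAndSet(n, n + 1)) return this;
        }
    }

    /** Drops one reference; the last release runs the free hook. */
    @Override
    public void close() {
        int n = refs.decrementAndGet();
        if (n == 0) onFree.run();
        else if (n < 0) throw new IllegalStateException("released more often than referenced");
    }
}
```

A binary search could then hold the entry in a try-with-resources block and read it in place,
while the scan path keeps copying as before.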



> Migrate key-cache to be fully off-heap
> --------------------------------------
>
>                 Key: CASSANDRA-9738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>             Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable
> now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure off-heap counter
> cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
