cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
Date Fri, 11 Sep 2015 19:48:48 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741107#comment-14741107 ]

Ariel Weisberg edited comment on CASSANDRA-9738 at 9/11/15 7:48 PM:
--------------------------------------------------------------------

The additions to VIntCoding are now unused.

When deserializing an old-format IndexedEntry I think you have to rewrite it to generate the
offsets. Otherwise the cache will never be populated with an entry where the offsets are
calculated, and that will make it slower than the older version. I also think this will be
faster than generating them incrementally, since it's a nice tight loop doing a scan of memory.
If you are doing a binary search you will end up doing most of that work anyway, and if it's a
scan you will also end up doing most of it.
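
A minimal sketch of what "generate the offsets in one tight loop" could look like. The layout
here is an assumption made up for illustration (a 4-byte length followed by the payload, no
offsets table); it is not the actual IndexedEntry format, and LegacyOffsets, computeOffsets and
encode are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical layout, not Cassandra's actual format: each entry is a 4-byte
// length followed by that many payload bytes, with no offsets table.
final class LegacyOffsets {
    private LegacyOffsets() {}

    /** Scans the buffer once and returns the start position of each entry. */
    static int[] computeOffsets(ByteBuffer legacy, int entryCount) {
        int[] offsets = new int[entryCount];
        int pos = legacy.position();
        for (int i = 0; i < entryCount; i++) {
            offsets[i] = pos;
            int length = legacy.getInt(pos);   // absolute read, buffer position untouched
            pos += Integer.BYTES + length;     // skip the length prefix and the payload
        }
        return offsets;
    }

    /** Builds a legacy-style buffer from payloads, for demonstration only. */
    static ByteBuffer encode(byte[]... payloads) {
        int size = 0;
        for (byte[] p : payloads) size += Integer.BYTES + p.length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] p : payloads) buf.putInt(p.length).put(p);
        buf.flip();
        return buf;
    }
}
```

With the offsets array built once at deserialization time, both a binary search and a scan can
jump straight to entry i instead of walking the preceding entries.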

RowIndexEntry.java line 423, legacyIndexInfoSearch: it's still looping from the beginning
of the offsets to the last offset it calculated. There is no need for that; we should know the
last calculated offset and skip straight to it. For a scan this operation becomes n^2 with that
loop. I think it should go away completely. Just rewrite the IndexedEntry during deserialization,
since you are making a copy anyway when you bring it out of the field.
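
To make the n^2 point concrete, here is a rough model that only counts loop steps. It is not
the code in legacyIndexInfoSearch; LazyOffsetScan and both method names are hypothetical, and
each step stands in for one memory read while walking the offsets.

```java
// Hypothetical model of lazily computed offsets; step counts stand in for memory reads.
final class LazyOffsetScan {
    private LazyOffsetScan() {}

    /** Each lookup walks from index 0 up to the requested entry. */
    static long stepsRestartingFromStart(int entries) {
        long steps = 0;
        for (int wanted = 0; wanted < entries; wanted++) {
            for (int i = 0; i <= wanted; i++) steps++;
        }
        return steps;
    }

    /** Each lookup resumes from the last offset already calculated. */
    static long stepsResumingFromLast(int entries) {
        long steps = 0;
        int lastCalculated = -1;
        for (int wanted = 0; wanted < entries; wanted++) {
            for (int i = lastCalculated + 1; i <= wanted; i++) steps++;
            lastCalculated = wanted;
        }
        return steps;
    }
}
```

Scanning n entries in order costs n(n+1)/2 steps when every lookup restarts from the beginning
and n steps when it resumes from the last calculated offset. Computing all offsets up front
during deserialization gives the same linear total without keeping any resume state around.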

Jonathan told me the expectation is that people run upgradesstables, so we don't need to be
heroic. Let's go for the simplest possible solution, which is making the old and new formats
match after deserialization. Hopefully this means we can remove a bunch of paths based on
which format we are looking at.

For cache hits we have to copy the entire IndexedEntry onto the heap into unpooled memory.
That turns an operation that was lg N into one that is linear in the size of the IndexedEntry.
In terms of raw speed the on-heap cache is going to be better off using the new serialization,
but it will really poke the garbage collector in the eye. At least with the OHC cache the
garbage is short-lived.
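
A small sketch of the lg N versus linear difference, under heavy assumptions: the "entry" is
just a run of sorted 8-byte keys in a ByteBuffer, nothing like the real serialized
IndexedEntry, and CacheHitCost and its methods are hypothetical names. It does not use the OHC
API at all.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an index entry: sorted 8-byte keys in off-heap memory.
final class CacheHitCost {
    private CacheHitCost() {}

    /** Binary search reading keys in place; returns the index or -1. Touches O(lg N) keys. */
    static int searchInPlace(ByteBuffer keys, long wanted) {
        int lo = 0;
        int hi = keys.remaining() / Long.BYTES - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long k = keys.getLong(keys.position() + mid * Long.BYTES);
            if (k < wanted) lo = mid + 1;
            else if (k > wanted) hi = mid - 1;
            else return mid;
        }
        return -1;
    }

    /** Copies the whole entry on-heap first, so the cost is linear in its size. */
    static int searchAfterCopy(ByteBuffer keys, long wanted) {
        ByteBuffer heapCopy = ByteBuffer.allocate(keys.remaining()); // short-lived garbage
        heapCopy.put(keys.duplicate());   // duplicate() leaves the source position alone
        heapCopy.flip();
        return searchInPlace(heapCopy, wanted);
    }
}
```

Both methods return the same answer; the second one pays for a full copy and a fresh heap
allocation on every hit, which is the cost described above.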

I don't like to give people options they have to choose from, but I am more afraid of making
the product unworkable for some use case. Maybe we should allow the key cache to be selectable
for 3.0? I feel like these are the two options that get us to 3.0 while minimizing regret
post release.

We have one week, so it's "less is more" time.


> Migrate key-cache to be fully off-heap
> --------------------------------------
>
>                 Key: CASSANDRA-9738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>             Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable
> now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure off-heap counter
> cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
