cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
Date Thu, 10 Sep 2015 20:56:51 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739576#comment-14739576 ]

Ariel Weisberg commented on CASSANDRA-9738:
-------------------------------------------

Thanks to Benedict for providing some clarity on this.

There are two general access paths to an indexed RIE (RowIndexEntry): a scan, and a binary
search to support random access. For the scan it is fine to materialize an entire IndexInfo.
For the binary search we don't want to materialize IndexInfo objects, as that would hurt
performance compared to the current POJO implementation.

The current code has cases where it accesses individual fields of an IndexInfo by index. I
would like to get away from that and instead return the POJO and read the fields from it. As
far as we know, there is no degenerate case that pulls fields from many different indexes
interleaved, which is the access pattern that would make returning a POJO expensive.
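To make the two access paths concrete, here is a minimal sketch, not the patch's actual code. It assumes a hypothetical serialized index with fixed-width entries and uses a `long` as a stand-in for the clustering prefix. `SerializedIndex`, `indexFor` and `get` are illustrative names, and the `IndexInfo` class below is only a hypothetical POJO named after the real one. Binary search reads the leading key in place and allocates nothing; the scan path materializes one POJO per entry and callers read its fields.

```java
import java.nio.ByteBuffer;

// Hypothetical POJO for the scan path; long keys stand in for clustering prefixes.
final class IndexInfo {
    final long firstKey, lastKey, offset, width;

    IndexInfo(long firstKey, long lastKey, long offset, long width) {
        this.firstKey = firstKey;
        this.lastKey = lastKey;
        this.offset = offset;
        this.width = width;
    }
}

// Hypothetical serialized index: fixed-width entries, key field first.
final class SerializedIndex {
    static final int ENTRY_SIZE = 4 * Long.BYTES; // firstKey, lastKey, offset, width

    private final ByteBuffer buf;

    SerializedIndex(ByteBuffer buf) { this.buf = buf; }

    static SerializedIndex of(IndexInfo... infos) {
        ByteBuffer b = ByteBuffer.allocate(infos.length * ENTRY_SIZE);
        for (IndexInfo i : infos)
            b.putLong(i.firstKey).putLong(i.lastKey).putLong(i.offset).putLong(i.width);
        b.flip();
        return new SerializedIndex(b);
    }

    int size() { return buf.limit() / ENTRY_SIZE; }

    // Binary-search path: reads only the leading key of each probed entry, in place.
    // Returns the index of the last entry whose firstKey <= key, or -1 if there is none.
    int indexFor(long key) {
        int lo = 0, hi = size() - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long first = buf.getLong(mid * ENTRY_SIZE); // absolute read, no allocation
            if (first <= key) { result = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return result;
    }

    // Scan path: materialize the whole entry once; callers read fields from the POJO.
    IndexInfo get(int i) {
        int base = i * ENTRY_SIZE;
        return new IndexInfo(buf.getLong(base), buf.getLong(base + 8),
                             buf.getLong(base + 16), buf.getLong(base + 24));
    }
}
```

Because the key is the first field, each probe costs a single absolute read. The scan path pays for one small allocation per entry, which the comment considers acceptable.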

We are getting a big win from this compared to the POJO implementation, simply by reducing
the cost of loading/unloading an IndexedEntry to a memory copy, and by reducing the cost of
building an IndexedEntry by serializing it up front instead of building a list of POJOs and
then serializing that list afterwards.
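A minimal sketch of both savings, under stated assumptions. It uses a direct `ByteBuffer` as a stand-in for the off-heap cache memory, and the same hypothetical fixed-width entry layout of four `long` fields. `IndexedEntryBuilder` and `OffHeapCopy` are hypothetical names. Entries are serialized as they are added, and moving the result in or out of off-heap memory is a single bulk copy with no per-entry work.

```java
import java.nio.ByteBuffer;

// Hypothetical builder: each entry is serialized as it is produced, instead of
// collecting a List of POJOs and serializing them in a second pass.
final class IndexedEntryBuilder {
    private ByteBuffer buf = ByteBuffer.allocate(64);

    IndexedEntryBuilder add(long firstKey, long lastKey, long offset, long width) {
        ensureCapacity(4 * Long.BYTES);
        buf.putLong(firstKey).putLong(lastKey).putLong(offset).putLong(width);
        return this;
    }

    // Grow the heap buffer geometrically when the next entry does not fit.
    private void ensureCapacity(int extra) {
        if (buf.remaining() >= extra) return;
        ByteBuffer bigger = ByteBuffer.allocate(Math.max(buf.capacity() * 2, buf.position() + extra));
        buf.flip();
        bigger.put(buf);
        buf = bigger;
    }

    // Read-only view of exactly the bytes written so far.
    ByteBuffer build() {
        ByteBuffer out = buf.duplicate();
        out.flip();
        return out.asReadOnlyBuffer();
    }
}

final class OffHeapCopy {
    // "Unloading" into the cache: one bulk copy into direct (off-heap) memory.
    static ByteBuffer toOffHeap(ByteBuffer serialized) {
        ByteBuffer direct = ByteBuffer.allocateDirect(serialized.remaining());
        direct.put(serialized.duplicate());
        direct.flip();
        return direct;
    }

    // "Loading" from the cache: the reverse bulk copy; nothing is deserialized here.
    static ByteBuffer toHeap(ByteBuffer offHeap) {
        ByteBuffer heap = ByteBuffer.allocate(offHeap.remaining());
        heap.put(offHeap.duplicate());
        heap.flip();
        return heap;
    }
}
```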

We should be able to preserve the use of vints. We should optimize the layout of an IndexInfo
by making the clustering prefix the first field, so that binary search doesn't have to do
extra decoding. During a scan, the cost of materializing and extra decoding (which we can
avoid later if we want) is small compared to the total cost of the operation for each entry
materialized.
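Why the field order matters with vints: a hedged sketch of one possible key-first layout. It uses a generic LEB128-style unsigned vint (7 bits per byte, high bit meaning "more bytes follow"). Cassandra's own VIntCoding uses a different byte format, so this only illustrates the idea. The fixed-width `long` key stands in for the clustering prefix, and `VInt` and `KeyFirstEntry` are hypothetical names. Because the key comes first, a binary-search probe reads it directly. Had it followed the vint fields, every probe would first have to decode them just to find where the key starts.

```java
import java.nio.ByteBuffer;

// Generic unsigned LEB128-style vint; NOT Cassandra's VIntCoding byte format.
final class VInt {
    static void writeUnsigned(ByteBuffer out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.put((byte) ((v & 0x7F) | 0x80)); // low 7 bits + continuation bit
            v >>>= 7;
        }
        out.put((byte) v);
    }

    static long readUnsigned(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}

// Hypothetical entry layout: fixed-width key FIRST, then vint-encoded offset and width.
final class KeyFirstEntry {
    static void write(ByteBuffer out, long key, long offset, long width) {
        out.putLong(key);
        VInt.writeUnsigned(out, offset);
        VInt.writeUnsigned(out, width);
    }

    // Binary-search path: the key is at the entry start, so no vint is decoded.
    static long keyAt(ByteBuffer buf, int entryStart) {
        return buf.getLong(entryStart);
    }

    // Scan path: decode the variable-length fields that follow; returns {offset, width}.
    static long[] fieldsAt(ByteBuffer buf, int entryStart) {
        ByteBuffer in = buf.duplicate();
        in.position(entryStart + Long.BYTES);
        return new long[] { VInt.readUnsigned(in), VInt.readUnsigned(in) };
    }
}
```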

Another optimization in addition to vints (Benedict, we didn't talk about this in the hangout)
was dropping the offset field from IndexInfo.
[This was then recalculated in AbstractSSTableIterator.IndexState|https://github.com/apache/cassandra/compare/trunk...pcmanus:10232#diff-fb1874f891c1a014fb57f8b4e42b5247R431].
I don't see a conflict between 9738 and this choice, but now I am questioning it on the
grounds that it requires walking the entire partition index and doing work even for random
access. I didn't pick up on that in my review of 10232. We have also agreed that because
IndexedEntries are sampled (not per row or per partition), they are not as size-constrained,
so keeping the field seems like the right choice.
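A tiny sketch of the cost argument. It assumes, as an illustration only, that index blocks are contiguous and that an offset can be rebuilt by accumulating the widths of the preceding blocks. This sketch does not claim that `IndexState` is written this way. `OffsetRecovery` and both methods are hypothetical.

```java
// Hypothetical: offsets are relative to the partition start and blocks are contiguous.
final class OffsetRecovery {
    // No stored offset: reaching entry i means summing every earlier width,
    // which is O(i) work even when the caller only wants one entry.
    static long offsetFromWidths(long[] widths, int i) {
        long offset = 0;
        for (int j = 0; j < i; j++)
            offset += widths[j];
        return offset;
    }

    // Stored offset: random access is a single O(1) read.
    static long offsetStored(long[] offsets, int i) {
        return offsets[i];
    }
}
```

Since the sampled entries are not tightly size-constrained, spending a few bytes per entry on the stored offset buys O(1) random access.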

The last optimization from 10232 I want to consider is using the 64k WIDTH_BASE to reduce
the size of the offset field. I don't see why we can't preserve that.
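The base-relative trick in general form, as a hedged sketch. It stores `value - BASE` as a zigzag-encoded vint, so values clustered around a known base (64 KiB here, mirroring the `WIDTH_BASE` name) take fewer bytes. Zigzag plus an unsigned 7-bits-per-byte vint is one common way to do this, assumed here rather than taken from the patch. `BaseRelative` and its methods are hypothetical.

```java
// Sketch of base-relative encoding: values within about 8 KiB of BASE (either side)
// need at most two vint bytes, where the raw value would need three.
final class BaseRelative {
    static final long BASE = 64 * 1024; // hypothetical stand-in for WIDTH_BASE

    // Zigzag maps small negative and positive deltas to small unsigned values.
    static long zigzag(long v)   { return (v << 1) ^ (v >> 63); }
    static long unzigzag(long z) { return (z >>> 1) ^ -(z & 1); }

    // Bytes a 7-bits-per-byte unsigned vint needs for u.
    static int vintSize(long u) {
        int n = 1;
        while ((u & ~0x7FL) != 0) { u >>>= 7; n++; }
        return n;
    }

    static long encode(long value)     { return zigzag(value - BASE); }
    static long decode(long stored)    { return unzigzag(stored) + BASE; }
    static int encodedSize(long value) { return vintSize(encode(value)); }
}
```

For example, a value of exactly 64 KiB encodes to a single byte, while writing 65536 raw would take three.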

We also want to keep the [reduction in serializer allocations|https://github.com/apache/cassandra/commit/4dfbba680620fef985cb2b3f00456ee8155404e0].
I checked, and it looks like that has been preserved.





> Migrate key-cache to be fully off-heap
> --------------------------------------
>
>                 Key: CASSANDRA-9738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>             Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
