cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
Date Thu, 06 Aug 2015 07:27:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659619#comment-14659619 ]

Robert Stupp commented on CASSANDRA-9738:
-----------------------------------------

I’d like to propose this patch for inclusion in 3.0. I hope the cstar tests are sufficient;
otherwise I can deliver more with different workloads.

h2. cstar tests

All cstar tests mentioned below perform three operations: write-only, mixed and read-only.
Unfortunately, cassandra-stress seems to limit the achievable write throughput for workloads
with clustering keys.

All tests on this patch show reduced GC pressure (for reads, of course).
This gives G1 more headroom to operate and often yields a read performance improvement of
about 10-15%, depending on the hardware (in this case bdplab vs. blade_11_b); bdplab (spinning
disks, less RAM) shows the bigger improvement.

h3. one big clustering key

user, native, cql3, user [profile|https://gist.github.com/snazy/b6c160c65001eb074784]

[blade_11_b|http://cstar.datastax.com/tests/id/7f7265a2-3aee-11e5-b022-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/af344e3e-3af0-11e5-b379-42010af0688f]

h3. big clustering over two clustering columns

user, native, cql3, user [profile|https://gist.github.com/snazy/351156424929d868baf3]

[blade_11_b|http://cstar.datastax.com/tests/id/e919725a-3b68-11e5-b590-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/36f1d0ee-3b8c-11e5-9c9e-42010af0688f]

h3. big clustering over two clustering columns, reduced threads for pure-write and mixed operations

user, native, cql3, user [profile|https://gist.github.com/snazy/e4579499f61911802fcd]

[blade_11_b|http://cstar.datastax.com/tests/id/36f1d0ee-3b8c-11e5-9c9e-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/07754e44-3b8d-11e5-9c9e-42010af0688f]

h3. stress _write_, _mixed_, _read_

[blade_11_b|http://cstar.datastax.com/tests/id/def04c20-3b8d-11e5-9c9e-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/f3f5c172-3b8d-11e5-9c9e-42010af0688f]

h2. Git branch + cassci

[git branch|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc]
[unit tests|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-9738-key-cache-ohc-testall/]
[dtests|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-9738-key-cache-ohc-dtest/]

I didn’t see any failed tests related to this patch.

There is another branch on github as well which contains [optimizations not purely related
to key-cache|https://github.com/snazy/cassandra/tree/9738-key-cache-ref]. {{9738-key-cache-ohc}}
is based on that branch and contains:
* “singletons” for key-cache {{o.a.c.db.SerializationHeader}} instances (dynamically extended,
if required)
* “singletons” for {{IndexInfo.Serializer}} in {{o.a.c.db.Serializers}} (dynamically extended,
if required)
* “singletons” for {{BigVersion}} instances for {{ma}}, {{la}}, {{ka}}, {{jb}} - other
versions get temporary objects (some tests use older sstable versions)
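
The two flavours of “singletons” listed above could be sketched roughly as follows. This is only a minimal illustration, not the code in the branch: {{VersionCacheSketch}}, {{Version}}, {{fixed}} and {{extended}} are hypothetical names, and a plain string stands in for the real version and serializer objects.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these names exist in the patch.
final class VersionCacheSketch {
    // Stand-in for a version descriptor such as BigVersion.
    static final class Version {
        final String name;
        Version(String name) { this.name = name; }
    }

    // Fixed set: shared instances exist only for the listed versions.
    private static final Map<String, Version> KNOWN = new ConcurrentHashMap<>();
    static {
        for (String v : new String[] {"ma", "la", "ka", "jb"})
            KNOWN.put(v, new Version(v));
    }

    // Returns the shared instance for a known version, a temporary object otherwise.
    static Version fixed(String name) {
        Version v = KNOWN.get(name);
        return v != null ? v : new Version(name);
    }

    // Dynamically extended: the first request for a name creates the shared instance.
    private static final Map<String, Version> EXTENDED = new ConcurrentHashMap<>();

    static Version extended(String name) {
        return EXTENDED.computeIfAbsent(name, Version::new);
    }
}
```

With shared instances, a cache entry only needs to carry a small identifier and can resolve the object on deserialisation instead of allocating a new one per read.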

h2. Further optimisations

There are some things that can be optimised in the future:
* Currently we need to serialise keyspace and cf names _and_ cfId. This is necessary since
the cfId of secondary indexes is inherited from the base table. If all tables and all secondary
indexes have unique IDs, we can omit KS and CF name serialisation (and its weird {{cfName.contains('.')}}
2i detection). Can be built with or after 2i API redesign.
* The full directory path is serialised. This appears to be less expensive than iterating
over the whole {{List}} of sstables and identifying an sstable by its generation.
* As [~benedict] suggested, we can switch to very tiny key-cache entries and also omit serialisation
of {{IndexInfo}}.
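
To make the first point concrete, here is a rough sketch of what serialising names plus cfId costs compared to an ID alone. It is an assumption-laden illustration, not the layout used in the patch: {{KeyCacheKeySketch}}, its methods and the byte layout are all hypothetical, and the dot check mirrors the {{cfName.contains('.')}} detection mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical illustration only; not the serialisation format of the patch.
final class KeyCacheKeySketch {
    // Assumes a secondary-index name contains a dot while a base table name does not.
    static boolean isSecondaryIndex(String cfName) {
        return cfName.indexOf('.') >= 0;
    }

    // Assumed layout: cfId (16 bytes), then length-prefixed UTF-8 keyspace and cf names.
    static ByteBuffer serialize(UUID cfId, String ksName, String cfName) {
        byte[] ks = ksName.getBytes(StandardCharsets.UTF_8);
        byte[] cf = cfName.getBytes(StandardCharsets.UTF_8);
        // allocateDirect places the bytes outside the Java heap.
        ByteBuffer out = ByteBuffer.allocateDirect(16 + 2 + ks.length + 2 + cf.length);
        out.putLong(cfId.getMostSignificantBits());
        out.putLong(cfId.getLeastSignificantBits());
        out.putShort((short) ks.length).put(ks);
        out.putShort((short) cf.length).put(cf);
        out.flip();
        return out;
    }

    // Size of the table identifier if unique IDs made the names redundant.
    static int idOnlySize() {
        return 16;
    }
}
```

In this layout every entry pays for both names in addition to the 16-byte ID, which is the overhead that unique IDs for secondary indexes would remove.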


> Migrate key-cache to be fully off-heap
> --------------------------------------
>
>                 Key: CASSANDRA-9738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>             Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
