cassandra-commits mailing list archives

From "Pavel Yaskevich (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
Date Mon, 17 Oct 2016 19:35:59 GMT


Pavel Yaskevich commented on CASSANDRA-9754:

[~mkjellman] This looks great! Could you also post the SSTable sizes and their estimated
key counts? AFAIR there is another problem with how indexes are currently stored: if a key
is not in the key cache, there is no way to jump to it directly in the index file, so the
index reader has to scan through the index segment to find the requested key. I'm wondering
what happens when each SSTable holds many small-to-medium sized keys, e.g. 64-128 MB each
(let's say the SSTable size is set to 1G or 2G), and stress readers are trying to read
random keys. What would the difference be between current index read performance and
index + birch tree?
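
To make the lookup path described above concrete, here is a minimal Java sketch. It is not Cassandra's actual reader code: the class and method names (IndexScanSketch, Entry, Lookup, find) are hypothetical, and the sampling interval of 128 is an assumption used for illustration. It models a sorted index where only every N-th key is sampled, so a key-cache miss costs a binary search over the samples followed by a linear scan of one index segment.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexScanSketch {
    // Hypothetical index entry: partition key plus its offset in the data file.
    public static final class Entry {
        final String key;
        final long dataOffset;
        public Entry(String key, long dataOffset) { this.key = key; this.dataOffset = dataOffset; }
    }

    // Lookup result: data offset (-1 if absent) and how many index entries were scanned.
    public static final class Lookup {
        final long dataOffset;
        final int entriesScanned;
        Lookup(long dataOffset, int entriesScanned) { this.dataOffset = dataOffset; this.entriesScanned = entriesScanned; }
    }

    // 'index' must be sorted by key; every 'interval'-th entry is treated as a sample.
    public static Lookup find(List<Entry> index, int interval, String key) {
        if (index.isEmpty()) return new Lookup(-1, 0);

        // Step 1: binary search over the samples for the last sampled key <= key.
        int lo = 0, hi = (index.size() - 1) / interval, start = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid * interval).key.compareTo(key) <= 0) {
                start = mid * interval;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (start < 0) return new Lookup(-1, 0); // key sorts before the first entry

        // Step 2: linear scan of the segment; this is the cost paid on a key-cache miss.
        int scanned = 0;
        int end = Math.min(start + interval, index.size());
        for (int i = start; i < end; i++) {
            scanned++;
            int cmp = index.get(i).key.compareTo(key);
            if (cmp == 0) return new Lookup(index.get(i).dataOffset, scanned);
            if (cmp > 0) break; // passed the slot where the key would be
        }
        return new Lookup(-1, scanned);
    }

    public static void main(String[] args) {
        List<Entry> index = new ArrayList<>();
        for (int i = 0; i < 300; i++) index.add(new Entry(String.format("k%03d", i), i * 100L));
        Lookup r = find(index, 128, "k200");
        System.out.println("offset=" + r.dataOffset + " scanned=" + r.entriesScanned);
    }
}
```

In this model the scan cost is bounded by the sampling interval rather than by the number of keys in the SSTable, which is why the key counts per SSTable matter for the comparison being asked about.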

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>                 Key: CASSANDRA-9754
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>         Attachments: gc_collection_times_with_birch.png, gc_collection_times_without_birch.png,
gc_counts_with_birch.png, gc_counts_without_birch.png, perf_cluster_1_with_birch_read_latency_and_counts.png,
perf_cluster_1_with_birch_write_latency_and_counts.png, perf_cluster_2_with_birch_read_latency_and_counts.png,
perf_cluster_2_with_birch_write_latency_and_counts.png, perf_cluster_3_without_birch_read_latency_and_counts.png,
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo
and its ByteBuffers. This is especially bad on endpoints with large CQL partitions. If a CQL
partition is, say, 6.4 GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This
creates a lot of churn for GC. Can this be improved by not creating so many objects?
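
The object counts quoted above can be reproduced with a short calculation. This is a sketch under stated assumptions, not taken from the issue itself: it assumes one IndexInfo per 64 KB of partition data (the default column index block size) and two ByteBuffers per IndexInfo (first and last name in the block). Decimal units are used so the result matches the round figures; the class and method names are hypothetical.

```java
public class IndexInfoEstimate {
    // Assumption: one IndexInfo entry per index block, rounded up.
    public static long indexInfoCount(long partitionBytes, long indexBlockBytes) {
        return (partitionBytes + indexBlockBytes - 1) / indexBlockBytes;
    }

    // Assumption: each IndexInfo holds two ByteBuffers (first and last name).
    public static long byteBufferCount(long indexInfoCount) {
        return 2 * indexInfoCount;
    }

    public static void main(String[] args) {
        long partitionBytes = 6_400_000_000L; // 6.4 GB, decimal
        long indexBlockBytes = 64_000L;       // 64 KB, decimal
        long infos = indexInfoCount(partitionBytes, indexBlockBytes);
        System.out.println(infos + " IndexInfo objects, " + byteBufferCount(infos) + " ByteBuffers");
    }
}
```

With binary units (64 * 1024 bytes per block) the counts come out slightly higher, but the order of magnitude is the same.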

This message was sent by Atlassian JIRA
