cassandra-commits mailing list archives

From "Michael Kjellman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
Date Mon, 17 Oct 2016 19:51:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583250#comment-15583250 ]

Michael Kjellman commented on CASSANDRA-9754:
---------------------------------------------

In regards to your second point: I'm actually only using the key cache in the current implementation
if a) it's a legacy index that hasn't been upgraded yet (to keep performance for indexed rows
the same during upgrades), or b) the row is "non-indexed", i.e. < 64kb, so the entry is just the
starting offset.

Birch-indexed rows always come from the Birch impl on disk and don't get stored in the key
cache at all. Ideally I think it would be great if we could get rid of the key cache
altogether! There was some chat about this in the ticket earlier...

There is also the index summary, which has an offset for keys as they are sampled during compaction;
it lets you skip to a given starting file offset inside the index for a key, which reduces the
problem you're talking about (sketched roughly below). I don't think the performance of the
small-to-medium sized case should be any different with the Birch implementation than with the
current one, and I'm trying to test that with the workload going on for the test_keyspace.largeuuid1
table.
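
A simplified, purely illustrative sketch of that summary-based skipping (the real IndexSummary works on serialized keys and memory-mapped buffers; these names are made up):

{code:java}
import java.util.Arrays;

// Simplified illustration of index-summary skipping: keys sampled during
// compaction each record the file offset of their entry in the primary index,
// so a lookup only scans forward from the nearest preceding sample.
public final class IndexSummarySketch
{
    final long[] sampledTokens;     // sorted tokens of the sampled keys
    final long[] indexFileOffsets;  // offset into the -Index component for each sample

    IndexSummarySketch(long[] sampledTokens, long[] indexFileOffsets)
    {
        this.sampledTokens = sampledTokens;
        this.indexFileOffsets = indexFileOffsets;
    }

    // Offset in the index file to start scanning from for the given token.
    long startingOffset(long token)
    {
        int i = Arrays.binarySearch(sampledTokens, token);
        if (i < 0)
            i = -i - 2;             // insertion point - 1 == last sample <= token
        return i < 0 ? 0 : indexFileOffsets[i];
    }

    public static void main(String[] args)
    {
        IndexSummarySketch summary = new IndexSummarySketch(
                new long[] { 100, 200, 300 }, new long[] { 0, 4096, 9000 });
        System.out.println(summary.startingOffset(250)); // 4096: scan from the sample at token 200
    }
}
{code}
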
The issue with the Birch implementation vs. the current one, though, is going to be the size of
the index file on disk, due to the segments being aligned on 4kb boundaries. I've talked a bunch
about this and thrown some ideas around with people, and I think the best option might be to check
whether the previously added row was a non-indexed segment (so just a long for the start of the
partition in the index, with no tree being built) and then not align the file to a boundary in
those cases. The real issue is that I don't know the length ahead of time, so I can't just encode
the aligned segments at the end, starting at some starting offset, and encode relative offsets
iteratively during compaction. Any thoughts on this would be really appreciated though...
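
For what it's worth, a toy illustration of the "only pad when the segment actually carries a tree" idea (again my own sketch, not code from the branch; names are made up):

{code:java}
// Toy illustration: pad to a 4KB boundary only for segments that contain a
// Birch tree; non-indexed segments (just a long for the partition's starting
// offset) are written back-to-back, avoiding the padding that inflates the
// index file on disk.
public final class AlignedSegmentWriterSketch
{
    static final int ALIGNMENT = 4096; // 4KB boundary for tree segments

    long position = 0; // current write position in the index file

    // Appends a segment and returns the offset it was written at.
    long append(int segmentLength, boolean hasBirchTree)
    {
        if (hasBirchTree && position % ALIGNMENT != 0)
            position += ALIGNMENT - (position % ALIGNMENT); // pad up to the next boundary
        long start = position;
        position += segmentLength;
        return start;
    }

    public static void main(String[] args)
    {
        AlignedSegmentWriterSketch writer = new AlignedSegmentWriterSketch();
        System.out.println(writer.append(8, false));     // 0    : non-indexed, no padding
        System.out.println(writer.append(8, false));     // 8    : still back-to-back
        System.out.println(writer.append(10000, true));  // 4096 : tree segment, aligned
    }
}
{code}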

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: gc_collection_times_with_birch.png, gc_collection_times_without_birch.png,
> gc_counts_with_birch.png, gc_counts_without_birch.png, perf_cluster_1_with_birch_read_latency_and_counts.png,
> perf_cluster_1_with_birch_write_latency_and_counts.png, perf_cluster_2_with_birch_read_latency_and_counts.png,
> perf_cluster_2_with_birch_write_latency_and_counts.png, perf_cluster_3_without_birch_read_latency_and_counts.png,
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo
> and its ByteBuffers. This is especially bad in endpoints with large CQL partitions. If a CQL
> partition is, say, 6.4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will
> create a lot of churn for GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
