cassandra-commits mailing list archives

From "Michael Kjellman (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
Date Fri, 11 Mar 2016 09:43:06 GMT


Michael Kjellman commented on CASSANDRA-9754:

I have the new FileSegment-friendly implementation working for the following conditions:

1) straight search for key -> get value
2) iterate efficiently both forwards and in reverse through all elements in the tree
3) binary search for a given key and then iterate through all remaining keys from the found offset
4) overflow page for handling variable length tree elements that exceed the max size for a
given individual page (up to 2GB)
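
As a rough illustration of access patterns 1-3 above, here is a minimal Java sketch. It is not the FileSegment-backed implementation from this ticket: the class SortedIndexSketch and its methods are hypothetical names, and a sorted in-memory array stands in for the on-disk tree pages. Overflow pages (item 4) are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for a sorted index; not the on-disk format from the ticket.
final class SortedIndexSketch {
    private final String[] keys;  // sorted ascending
    private final long[] values;  // values[i] belongs to keys[i]

    SortedIndexSketch(String[] sortedKeys, long[] values) {
        this.keys = sortedKeys.clone();
        this.values = values.clone();
    }

    // 1) straight search: key -> value, or null when the key is absent
    Long get(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }

    // 2) iterate over all keys, forwards or reversed
    List<String> scan(boolean reversed) {
        List<String> out = new ArrayList<>(keys.length);
        if (reversed) {
            for (int i = keys.length - 1; i >= 0; i--) out.add(keys[i]);
        } else {
            for (int i = 0; i < keys.length; i++) out.add(keys[i]);
        }
        return out;
    }

    // 3) binary search for a key, then iterate all remaining keys from the found offset
    List<String> scanFrom(String key) {
        int i = Arrays.binarySearch(keys, key);
        int start = i >= 0 ? i : -(i + 1); // insertion point when the key is absent
        return new ArrayList<>(Arrays.asList(keys).subList(start, keys.length));
    }
}
```

When the key is missing, scanFrom starts at the insertion point, so the scan begins at the first key greater than the one requested. A paged tree would presumably apply the same idea per page rather than over one flat array.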

I have also successfully run some new unit tests I wrote that now do 5000 consecutive iterations
with randomly generated data (to "fuzz" the tree for edge conditions) for building and validating
trees that contain between 300,000 and 500,000 elements. I've also spent a good amount of time
writing some pretty reasonable documentation of the binary format itself.

Tomorrow, I'm planning on testing a 4.5GB individual tree against the new implementation and
doing some profiling to see the exact memory impact now that it's basically completed on both
the serialization and deserialization paths. Will update with those findings tomorrow!

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>                 Key: CASSANDRA-9754
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo
and its ByteBuffers. This is especially bad on endpoints with large CQL partitions. If a CQL
partition is, say, 6.4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will
create a lot of churn for GC. Can this be improved by not creating so many objects?
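
The object counts quoted above can be reproduced with a back-of-the-envelope calculation. This sketch rests on assumptions not stated in the description: one IndexInfo per column_index_size_in_kb of partition data at the default of 64, two ByteBuffers (first and last name) per IndexInfo, and decimal units chosen to match the round numbers. The class and method names are hypothetical.

```java
// Hypothetical estimate of index object counts for one large partition.
final class IndexInfoEstimate {
    // Assumed: one IndexInfo entry per index block of partition data.
    static long indexInfoCount(long partitionBytes, long indexBlockBytes) {
        return (partitionBytes + indexBlockBytes - 1) / indexBlockBytes; // ceiling division
    }

    // Assumed: each IndexInfo holds two ByteBuffers (first and last name).
    static long byteBufferCount(long indexInfoCount) {
        return 2 * indexInfoCount;
    }

    public static void main(String[] args) {
        long partitionBytes = 6_400_000_000L; // 6.4GB, decimal units
        long blockBytes = 64_000L;            // 64KB index block, decimal units
        long infos = indexInfoCount(partitionBytes, blockBytes);
        System.out.println(infos + " IndexInfo objects, " + byteBufferCount(infos) + " ByteBuffers");
    }
}
```

With binary units (64 KiB blocks) the counts come out slightly lower, but the order of magnitude is the same.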

This message was sent by Atlassian JIRA
