cassandra-commits mailing list archives

From "Michael Kjellman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
Date Thu, 01 Sep 2016 06:54:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454525#comment-15454525 ]

Michael Kjellman commented on CASSANDRA-9754:
---------------------------------------------

I've discovered a performance regression caused by the original logic in PageAlignedReader.
I always knew the original design wasn't ideal, but I felt that the performance improvement
wasn't worth the additional code complexity. Now that the code has stabilized and I've
moved on to performance validation (and not just bug fixing and implementation), I found
the original logic was horribly inefficient.

https://github.com/mkjellman/cassandra/commit/33d35272ae50803bac626ab60d5ecd3a36f5b283

I've updated the documentation in PageAlignedWriter to cover the new PageAligned file format.
The new implementation allows lazy deserialization of segment metadata as required, and enables
binary search across segments via the fixed-length starting offsets. This means the segments
no longer need to be deserialized ahead of time; segment metadata is deserialized only when
it is required to return a result.
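
To illustrate the idea (this is not the code from the commit above), here is a minimal Java
sketch of binary search over a table of fixed-length starting offsets, with segment metadata
decoded only for the segment that is actually hit. The class name LazySegmentIndex, the byte
layout, and the metadata fields (length, entryCount) are all assumptions made up for this
example; the real PageAligned format is the one documented in PageAlignedWriter.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical layout, invented for illustration only:
 *   [int segmentCount]
 *   [segmentCount x (long startOffset, int metadataPosition)]   fixed-width table
 *   [segmentCount x (int length, int entryCount)]               metadata records
 */
final class LazySegmentIndex {
    // Every table entry has the same width, so entry i can be addressed directly.
    private static final int ENTRY_SIZE = Long.BYTES + Integer.BYTES;

    private final ByteBuffer buf;
    private final int segmentCount;
    int metadataReads = 0; // counts lazy deserializations, for demonstration only

    /** Decoded metadata for one segment (fields are assumptions for this sketch). */
    static final class SegmentMetadata {
        final long startOffset;
        final int length;
        final int entryCount;

        SegmentMetadata(long startOffset, int length, int entryCount) {
            this.startOffset = startOffset;
            this.length = length;
            this.entryCount = entryCount;
        }
    }

    LazySegmentIndex(ByteBuffer buf) {
        this.buf = buf;
        this.segmentCount = buf.getInt(0); // only the count is read eagerly
    }

    private int entryPosition(int i) {
        return Integer.BYTES + i * ENTRY_SIZE;
    }

    /**
     * Returns metadata for the segment with the greatest start offset <= offset,
     * or null if offset precedes the first segment.
     */
    SegmentMetadata find(long offset) {
        int lo = 0, hi = segmentCount - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            // Reads a single long from the table; no metadata is decoded here.
            if (buf.getLong(entryPosition(mid)) <= offset) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? null : readMetadata(found);
    }

    // Decodes the metadata record of one segment, on demand.
    private SegmentMetadata readMetadata(int i) {
        metadataReads++;
        int entryPos = entryPosition(i);
        int metaPos = buf.getInt(entryPos + Long.BYTES);
        return new SegmentMetadata(buf.getLong(entryPos),
                                   buf.getInt(metaPos),
                                   buf.getInt(metaPos + Integer.BYTES));
    }

    /** Writes segments (sorted by start offset) into the hypothetical layout. */
    static ByteBuffer serialize(long[] starts, int[] lengths, int[] entryCounts) {
        int n = starts.length;
        int tableEnd = Integer.BYTES + n * ENTRY_SIZE;
        ByteBuffer b = ByteBuffer.allocate(tableEnd + n * 2 * Integer.BYTES);
        b.putInt(0, n);
        for (int i = 0; i < n; i++) {
            int entryPos = Integer.BYTES + i * ENTRY_SIZE;
            int metaPos = tableEnd + i * 2 * Integer.BYTES;
            b.putLong(entryPos, starts[i]);
            b.putInt(entryPos + Long.BYTES, metaPos);
            b.putInt(metaPos, lengths[i]);
            b.putInt(metaPos + Integer.BYTES, entryCounts[i]);
        }
        return b;
    }
}
```

Because the table entries are fixed width, a lookup touches O(log n) start offsets and decodes
exactly one metadata record, instead of materializing an object per segment up front.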

Initial benchmarking and profiling make me a pretty happy guy. I think the new design is
a massive improvement over the old one, and it looks pretty good so far.

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo
> and its ByteBuffers. This is especially bad in endpoints with large CQL partitions. If a CQL
> partition is, say, 6.4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will
> create a lot of churn for GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
