cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
Date Mon, 03 Oct 2016 17:27:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542919#comment-15542919 ]

Branimir Lambov commented on CASSANDRA-9754:
--------------------------------------------

bq. if we mmap a few times we'll still incur the very high and unpredictable costs from mmap

The {{MmappedRegions}} usage is to map the regions at sstable load, i.e. effectively only
once in the table's lifecycle, which should completely avoid any mmap costs at read time.
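
As a rough illustration of the map-once pattern, here is a minimal sketch. {{MapOnceFile}} and {{readLong}} are hypothetical names, not Cassandra's {{MmappedRegions}} API, and the sketch assumes the file fits in a single mapping (under 2 GiB):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; this is not Cassandra's MmappedRegions. */
final class MapOnceFile {
    private final MappedByteBuffer region;

    MapOnceFile(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // The only mmap call: issued once, when the file is opened.
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            region = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        // The mapping stays valid after the channel is closed.
    }

    /** Reads use absolute positions, so no mmap call happens here. */
    long readLong(int position) {
        return region.getLong(position);
    }
}
```

Every read after construction goes through the existing mapping, so the mapping cost is paid at load time rather than on the read path.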

bq. I'm wondering though if mmap'ing things even makes sense

Depends on whether we want to squeeze out the last bit of performance or not. Memory-mapped data (assuming
it is already mapped as above) that resides in the page cache can be accessed with no system call or copying,
while reading it through a RAF or a channel still needs a system call plus some copying. The difference
is felt most on workloads that fit entirely in the page cache.
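
To make the two access paths concrete, here is a small sketch using plain {{java.nio}}. The class and method names ({{ReadPaths}}, {{readViaChannel}}, {{readViaMapping}}) are made up for illustration and do not correspond to anything in the patch:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper contrasting the two read paths. */
final class ReadPaths {
    /** Channel path: each call issues a positional read that copies bytes into a buffer. */
    static long readViaChannel(FileChannel ch, long position) throws IOException {
        ByteBuffer scratch = ByteBuffer.allocate(Long.BYTES);
        while (scratch.hasRemaining()) {
            // Continue from where the previous (possibly partial) read stopped.
            int n = ch.read(scratch, position + scratch.position());
            if (n < 0) throw new EOFException("fewer than 8 bytes at position " + position);
        }
        scratch.flip();
        return scratch.getLong();
    }

    /** Mapped path: a load from an already established mapping, no read call. */
    static long readViaMapping(MappedByteBuffer region, int position) {
        return region.getLong(position);
    }
}
```

Both methods return the same value for the same offset. The mapped variant can still take a page fault if the page is not resident, which is why the benefit shows up mainly when the data is already in the page cache.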

If you don't feel like this is helpful, you can leave this out of the 2.1 version and rely
on {{Rebufferer}} (or {{RandomAccessReader}}) to do memmapping or caching for you in trunk.

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo
and its ByteBuffers. This is especially bad in endpoints with large CQL partitions. If a CQL
partition is, say, 6.4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will
create a lot of churn for GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
