cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
Date Sat, 29 Aug 2015 01:40:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720893#comment-14720893 ]

Jonathan Ellis commented on CASSANDRA-9754:
-------------------------------------------

1. Learning time for us would be compaction.
2. ISTM this was not core to the algorithm, but it's been a while since I read the details.
3. We could store the offset in the ARF leaves; this was definitely not core.
4, 5. Yes, this is a key point. Like our existing index, ARF is designed to be memory-resident.
As partitions grow larger, the ARF would degrade in accuracy rather than spilling to disk (like
a B-tree) or getting obscenely large (like our existing index).

I would add:

6. Because of (5), ARF gives you BF-like behavior for range queries and can quickly optimize
away scans of sstables that don't contain the data in question.  (A very good fit for DTCS;
a smaller benefit for LCS.)

So maybe we really want both: ARF for the quick reject, and an (on-disk) B+ tree for "where
do I start scanning."
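
The two-stage idea above can be sketched roughly as follows. This is only an illustration under loud assumptions: `CoarseRangeFilter` is a fixed-bucket stand-in and not a real ARF (an actual ARF adapts its resolution as it learns), `BlockIndex` is an in-memory sorted list standing in for an on-disk B+ tree, and every name, key, and offset here is hypothetical.

```python
import bisect


class CoarseRangeFilter:
    """Hypothetical stand-in for an ARF: fixed-width buckets over an
    integer key domain [0, domain_size). It can report false positives
    but never false negatives."""

    def __init__(self, keys, domain_size, num_buckets):
        # Ceiling division so every key in the domain maps to a bucket.
        self.width = max(1, -(-domain_size // num_buckets))
        self.occupied = [False] * num_buckets
        for k in keys:
            self.occupied[k // self.width] = True

    def may_contain_range(self, lo, hi):
        # True if any bucket overlapping [lo, hi] holds at least one key.
        first, last = lo // self.width, hi // self.width
        return any(self.occupied[first:last + 1])


class BlockIndex:
    """Hypothetical stand-in for the on-disk B+ tree: maps the first key
    of each block to that block's file offset."""

    def __init__(self, first_keys, offsets):
        self.first_keys = first_keys  # sorted ascending
        self.offsets = offsets

    def start_offset(self, lo):
        # Last block whose first key is <= lo (or the first block).
        i = bisect.bisect_right(self.first_keys, lo) - 1
        return self.offsets[max(i, 0)]


def plan_range_read(rfilter, index, lo, hi):
    """Return the offset to start scanning at, or None if the sstable
    can be skipped entirely."""
    if not rfilter.may_contain_range(lo, hi):
        return None  # quick reject: no index lookup, no scan
    return index.start_offset(lo)
```

With keys `[10, 20, 250, 260, 900]` and ten buckets over `[0, 1000)`, a query for `[300, 799]` is rejected by the filter alone, while a query for `[255, 270]` passes the filter and the index then supplies the starting offset.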

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo
and its ByteBuffers. This is especially bad in endpoints with large CQL partitions. If a CQL
partition is, say, 6.4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will
create a lot of churn for GC. Can this be improved by not creating so many objects?
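
The figures in the description can be reproduced with a back-of-the-envelope calculation. This sketch assumes one IndexInfo entry per 64KB of partition data (the usual default index granularity) and two ByteBuffers per entry, which is inferred from the 100K/200K ratio quoted above. The function name and constants are illustrative only.

```python
INDEX_BLOCK_BYTES = 64 * 1024  # assumed index granularity (64KB)
BUFFERS_PER_ENTRY = 2          # assumed, from the 100K vs. 200K ratio


def index_object_counts(partition_bytes, block_bytes=INDEX_BLOCK_BYTES):
    """Estimate (IndexInfo count, ByteBuffer count) for one partition."""
    entries = partition_bytes // block_bytes
    return entries, entries * BUFFERS_PER_ENTRY


entries, buffers = index_object_counts(6_400_000_000)  # a 6.4GB partition
print(entries, buffers)  # 97656 195312, i.e. roughly 100K and 200K
```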



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
