Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Mon, 11 Apr 2016 14:15:25 +0000 (UTC)
From: "Jack Krupansky (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12843338.1436309454000.192846.1460384125813@Atlassian.JIRA>
In-Reply-To: <JIRA.12843338.1436309454000@Atlassian.JIRA>
References: <JIRA.12843338.1436309454000@Atlassian.JIRA>
 <JIRA.12843338.1436309454727@arcas>
Subject: [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly
 for large CQL partitions
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235141#comment-15235141 ] 

Jack Krupansky commented on CASSANDRA-9754:
-------------------------------------------

Any idea how a new wide partition will perform relative to the same amount of data and the same number of clustering rows divided into bucketed partitions? For example, a single 1 GB wide partition vs. ten 100 MB partitions (same partition key plus a 0-9 bucket number) vs. a hundred 10 MB partitions (0-99 bucket number), for two access patterns: 1) random access to a single row or short slice, and 2) a full bulk read of the 1 GB of data, one moderate slice at a time.

Or maybe the question is equivalent to asking what the cost is to access the last row of the 1 GB partition vs. the last row of the tenth or hundredth bucket of the bucketed equivalent.

No precision required. Just inquiring whether we can get rid of bucketing as a preferred data modeling strategy, at least for the common use cases where the sum of the buckets is roughly 2 GB or less.

The bucketing approach does have the side effect of distributing the buckets around the cluster, which could be a good thing, or maybe not.
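
To make the bucketed layout concrete, here is a minimal sketch of what I have in mind. It assumes rows carry an integer sequence number as the clustering key and that buckets are assigned by range with a fixed number of rows per bucket; the names bucket_for, bucketed_partition_key, and buckets_for_slice are hypothetical, not anything in Cassandra or a driver.

```python
def bucket_for(seq: int, rows_per_bucket: int) -> int:
    """Map a row's sequence number to a bucket number (assumed range-based bucketing)."""
    if seq < 0 or rows_per_bucket <= 0:
        raise ValueError("seq must be >= 0 and rows_per_bucket must be > 0")
    return seq // rows_per_bucket


def bucketed_partition_key(partition_key: str, seq: int, rows_per_bucket: int) -> tuple:
    """Composite partition key: the original key plus the bucket number."""
    return (partition_key, bucket_for(seq, rows_per_bucket))


def buckets_for_slice(start_seq: int, end_seq: int, rows_per_bucket: int) -> list:
    """Buckets a client must query to read rows start_seq..end_seq (inclusive)."""
    first = bucket_for(start_seq, rows_per_bucket)
    last = bucket_for(end_seq, rows_per_bucket)
    return list(range(first, last + 1))


# Random access touches exactly one bucket:
key = bucketed_partition_key("sensor-42", 999, rows_per_bucket=100)

# A short slice that straddles a bucket boundary touches two:
slice_buckets = buckets_for_slice(95, 105, rows_per_bucket=100)

# A full bulk read of 1,000 rows visits every bucket in turn:
all_buckets = buckets_for_slice(0, 999, rows_per_bucket=100)
```

With this scheme, the application has to compute the bucket for every read and fan out across buckets for anything larger than one bucket, which is the modeling overhead a cheap wide partition would remove.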

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo and its ByteBuffers. This is especially bad in endpoints with large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for GC. Can this be improved by not creating so many objects?
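
The object counts in the description can be roughly reproduced with a back-of-the-envelope calculation. This is only a sketch under two assumptions that the description does not state: one IndexInfo entry per 64 KB of partition data (the default column_index_size_in_kb), and two ByteBuffers per IndexInfo (first and last name). The function name index_object_counts is hypothetical.

```python
def index_object_counts(partition_bytes: int,
                        index_interval_bytes: int = 64 * 1024,
                        buffers_per_entry: int = 2) -> tuple:
    """Estimate (IndexInfo objects, ByteBuffers) for one partition.

    Assumes one index entry per index_interval_bytes of data (rounded up)
    and buffers_per_entry ByteBuffers held by each entry.
    """
    if partition_bytes < 0 or index_interval_bytes <= 0:
        raise ValueError("sizes must be non-negative and the interval positive")
    # Ceiling division: a partial trailing block still gets an index entry.
    entries = -(-partition_bytes // index_interval_bytes)
    return entries, entries * buffers_per_entry


# A 6.4 GB partition, as in the description (binary units assumed here).
partition_bytes = int(6.4 * 1024 ** 3)
index_infos, byte_buffers = index_object_counts(partition_bytes)
print(index_infos, byte_buffers)
```

Under these assumptions the estimate lands on the order of 100K IndexInfo objects and 200K ByteBuffers, consistent with the figures quoted above.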

