Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 21F01200B74 for ; Thu, 1 Sep 2016 08:54:43 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 0DF96160AC9; Thu, 1 Sep 2016 06:54:33 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 55AF4160AAE for ; Thu, 1 Sep 2016 08:54:32 +0200 (CEST) Received: (qmail 57059 invoked by uid 500); 1 Sep 2016 06:54:21 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 57038 invoked by uid 99); 1 Sep 2016 06:54:21 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 01 Sep 2016 06:54:21 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id B952E2C1B93 for ; Thu, 1 Sep 2016 06:54:20 +0000 (UTC) Date: Thu, 1 Sep 2016 06:54:20 +0000 (UTC) From: "Michael Kjellman (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 01 Sep 2016 06:54:43 -0000 [ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454525#comment-15454525 ] Michael Kjellman commented on CASSANDRA-9754: --------------------------------------------- I've discovered a performance regression caused by the original logic in PageAlignedReader. I always knew the original design wasn't ideal, however, I felt that the additional code complexity wasn't worth the performance improvements. However, now that the code is stabilized and I've moved on to performance validation (and not just bugs and implementation) I found it was horribly inefficient. https://github.com/mkjellman/cassandra/commit/33d35272ae50803bac626ab60d5ecd3a36f5b283 I've updated the documentation in PageAlignedWriter to cover the new PageAligned file format. The new implementation allows lazy deserialization of segment metadata as required, and enables binary search across segments via the fixed length starting offsets. This means deserialization of the segments are no longer required ahead of time -- deserialization of the segment metadata only occurs when required to return a result. Initial benchmarking and profiling makes me a pretty happy guy. I think the new design is a massive improvement over the old one and looks pretty good so far. > Make index info heap friendly for large CQL partitions > ------------------------------------------------------ > > Key: CASSANDRA-9754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9754 > Project: Cassandra > Issue Type: Improvement > Reporter: sankalp kohli > Assignee: Michael Kjellman > Priority: Minor > Fix For: 4.x > > Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff > > > Looking at a heap dump of 2.0 cluster, I found that majority of the objects are IndexInfo and its ByteBuffers. This is specially bad in endpoints with large CQL partitions. If a CQL partition is say 6,4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for GC. Can this be improved by not creating so many objects? -- This message was sent by Atlassian JIRA (v6.3.4#6332)