Date: Mon, 3 Oct 2016 17:27:20 +0000 (UTC)
From: "Branimir Lambov (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
archived-at: Mon, 03 Oct 2016 17:27:22 -0000

    [ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542919#comment-15542919 ]

Branimir Lambov commented on CASSANDRA-9754:
--------------------------------------------

bq.
if we mmap a few times we'll still incur the very high and unpredictable costs from mmap

The {{MmappedRegions}} usage maps the regions at sstable load, i.e. effectively only once in the table's lifecycle, which should completely avoid any mmap costs at read time.

bq. I'm wondering though if mmap'ing things even makes sense

It depends on whether we want to squeeze out the last bit of performance. Memory-mapped data (assuming it is already mapped as above) that resides in the page cache can be accessed with no system call or copy, while reading it through a RAF or a channel still needs a system call plus some copying. The difference is felt most on workloads that fit entirely in the page cache.

If you don't feel this is helpful, you can leave it out of the 2.1 version and rely on {{Rebufferer}} (or {{RandomAccessReader}}) to do the memory mapping or caching for you in trunk.

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo and its ByteBuffers. This is especially bad in endpoints with large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for GC. Can this be improved by not creating so many objects?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
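
For readers of the archive, the "map once at load, then read without a system call" point in the comment above can be sketched roughly as follows. This is a minimal illustration using only the standard java.nio API; the class and method names ({{MmapOnceSketch}}, {{mapOnce}}, {{readMapped}}, {{readViaRaf}}) are hypothetical and are not Cassandra's {{MmappedRegions}}, {{Rebufferer}} or {{RandomAccessReader}}. It assumes a small file that fits in a single mapping.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not Cassandra code.
final class MmapOnceSketch {

    // Map the whole file a single time, e.g. when the file is first loaded.
    // The mapping remains valid after the channel is closed.
    static MappedByteBuffer mapOnce(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Read path A: absolute get on the existing mapping. No mmap call and no
    // read system call happens here; if the page is not resident, the OS
    // handles a page fault instead.
    static long readMapped(MappedByteBuffer region, int offset) {
        return region.getLong(offset);
    }

    // Read path B: seek and read through RandomAccessFile. Each call goes
    // through the OS and copies the bytes into the Java heap.
    static long readViaRaf(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset);
        return raf.readLong();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("index-sketch", ".bin");
        file.toFile().deleteOnExit();

        // Write 1024 big-endian longs as stand-in "index" data.
        ByteBuffer data = ByteBuffer.allocate(1024 * Long.BYTES);
        for (int i = 0; i < 1024; i++) {
            data.putLong(i * 31L);
        }
        Files.write(file, data.array());

        MappedByteBuffer region = mapOnce(file); // done once, up front
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            for (int i = 0; i < 1024; i++) {
                int offset = i * Long.BYTES;
                if (readMapped(region, offset) != readViaRaf(raf, offset)) {
                    throw new AssertionError("mismatch at offset " + offset);
                }
            }
        }
        System.out.println("both read paths returned the same values");
    }
}
```

The sketch only shows that the two read paths return the same data and where the mapping cost is paid; it makes no claim about the size of the performance difference, which depends on the workload and on how much of the file is resident in the page cache.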