Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 2CA9E200C34 for ; Mon, 27 Feb 2017 23:06:53 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 2B328160B5B; Mon, 27 Feb 2017 22:06:53 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 7679E160B60 for ; Mon, 27 Feb 2017 23:06:52 +0100 (CET) Received: (qmail 6668 invoked by uid 500); 27 Feb 2017 22:06:51 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 6657 invoked by uid 99); 27 Feb 2017 22:06:51 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 27 Feb 2017 22:06:51 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 117DF18E84A for ; Mon, 27 Feb 2017 22:06:51 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -1.547 X-Spam-Level: X-Spam-Status: No, score=-1.547 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-2.999, SPF_NEUTRAL=0.652] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id LSvin6ksfoG8 for ; Mon, 27 Feb 2017 22:06:50 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id E68DE5F58E for ; Mon, 27 Feb 2017 22:06:49 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 0F220E0A17 for ; Mon, 27 Feb 2017 22:06:46 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 6B93E24140 for ; Mon, 27 Feb 2017 22:06:45 +0000 (UTC) Date: Mon, 27 Feb 2017 22:06:45 +0000 (UTC) From: "Jeff Jirsa (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 27 Feb 2017 22:06:53 -0000 [ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886621#comment-15886621 ] Jeff Jirsa commented on CASSANDRA-13241: ---------------------------------------- {quote} Can the increased offheap requirements be expressed in a formula? {quote} [8 bytes per compression chunk|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/compress/CompressionMetadata.java#L172-L205]. 4x as many chunks = 4x as much offheap required to store them. AFAIK, it's a naive encoding. [~aweisberg] suggested in IRC that a more efficient encoding may make such a tradeoff much more tolerable. > Lower default chunk_length_in_kb from 64kb to 4kb > ------------------------------------------------- > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core > Reporter: Benjamin Roth > > Having a too low chunk size may result in some wasted disk space. A too high chunk size may lead to massive overreads and may have a critical impact on overall system performance. > In my case, the default chunk size lead to peak read IOs of up to 1GB/s and avg reads of 200MB/s. After lowering chunksize (of course aligned with read ahead), the avg read IO went below 20 MB/s, rather 10-15MB/s. > The risk of (physical) overreads is increasing with lower (page cache size) / (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads pre request but if the model consists rather of small rows or small resultsets, the read overhead with 64kb chunk size is insanely high. This applies for example for (small) skinny rows. > Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insights what a difference it can make (460GB data, 128GB RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows, that the request distribution remained the same, so no "dynamic snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v6.3.15#6346)