Date: Thu, 1 Sep 2016 17:49:20 +0000 (UTC)
From: "Wei Deng (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-12591) Re-evaluate the default 160MB sstable_size_in_mb choice in LCS

     [ https://issues.apache.org/jira/browse/CASSANDRA-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Deng updated CASSANDRA-12591:
---------------------------------
    Description: 
CASSANDRA-5727 put some effort into benchmarking and evaluating the best max_sstable_size for LeveledCompactionStrategy, and the conclusion derived from that effort was to use 160MB as the optimal size for both throughput (i.e. the time spent on compaction, the smaller the better) and the amount of bytes compacted (to avoid write amplification, the less the better).

However, when I read further into that test report (the short [comment|https://issues.apache.org/jira/browse/CASSANDRA-5727?focusedCommentId=13722571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13722571] describing the tests), I realized it was conducted on hardware with the following configuration: "a single rackspace node with 2GB of ram." I'm not sure whether this was an acceptable hardware configuration for a production Cassandra deployment at that time (mid-2013), but it is definitely far below today's hardware standard.

Given that we now have compaction-stress, which can generate SSTables from a user-defined stress profile with a user-defined table schema and compaction parameters (compatible with cassandra-stress), it would be a useful effort to revisit this number on a more realistic hardware configuration and see whether 160MB is still the optimal choice. It might also affect our perceived "practical" node density with LCS nodes: if a bigger max_sstable_size actually works better, it will allow fewer SSTables (and hence fewer levels and less write amplification) per node at higher density.
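To make the last point concrete, here is a rough back-of-envelope sketch (my own illustration, not a result from CASSANDRA-5727). It assumes the usual LCS sizing model: level N holds roughly fanout^N SSTables of sstable_size_in_mb each with a fanout of 10, L0 is ignored, and write amplification grows with the number of levels a record has to pass through. The 2TB-per-node density is likewise just an illustrative number.

{code:python}
def lcs_levels(node_density_gb, sstable_size_mb, fanout=10):
    """Rough LCS model: level N (N >= 1) holds ~fanout**N SSTables of
    sstable_size_mb each; L0 is ignored. Returns the smallest number of
    levels whose combined capacity covers the node's data."""
    data_mb = node_density_gb * 1024
    levels, capacity_mb = 0, 0
    while capacity_mb < data_mb:
        levels += 1
        capacity_mb += sstable_size_mb * fanout ** levels
    return levels

# Compare the current 160MB default against a few larger candidates for a
# hypothetical 2TB-per-node density (all numbers here are illustrative).
for size_mb in (160, 512, 1024, 2048):
    levels = lcs_levels(node_density_gb=2048, sstable_size_mb=size_mb)
    print(f"sstable_size_in_mb={size_mb:>4}: data spread over ~{levels} levels, "
          f"so each record is compacted through roughly that many levels")
{code}

Under this simple model the 160MB default ends up with one or two more levels than the larger candidates at that density, which is exactly the kind of write-amplification difference a new benchmark on realistic hardware should quantify.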
  was:
CASSANDRA-5727 put some effort into benchmarking and evaluating the best max_sstable_size for LeveledCompactionStrategy, and the conclusion derived from that effort was to use 160MB as the optimal size for both throughput (i.e. the time spent on compaction, the smaller the better) and the amount of bytes compacted (to avoid write amplification, the less the better).

However, when I read further into that test report, I realized it was conducted on hardware with the following configuration: "a single rackspace node with 2GB of ram." I'm not sure whether this was an acceptable hardware configuration for a production Cassandra deployment at that time (mid-2013), but it is definitely far below today's hardware standard.

Given that we now have compaction-stress, which can generate SSTables from a user-defined stress profile with a user-defined table schema and compaction parameters (compatible with cassandra-stress), it would be a useful effort to revisit this number on a more realistic hardware configuration and see whether 160MB is still the optimal choice. It might also affect our perceived "practical" node density with LCS nodes: if a bigger max_sstable_size actually works better, it will allow fewer SSTables (and hence fewer levels and less write amplification) per node at higher density.


> Re-evaluate the default 160MB sstable_size_in_mb choice in LCS
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-12591
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12591
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Wei Deng
>              Labels: lcs
>
> CASSANDRA-5727 put some effort into benchmarking and evaluating the best max_sstable_size for LeveledCompactionStrategy, and the conclusion derived from that effort was to use 160MB as the optimal size for both throughput (i.e. the time spent on compaction, the smaller the better) and the amount of bytes compacted (to avoid write amplification, the less the better).
> However, when I read further into that test report (the short [comment|https://issues.apache.org/jira/browse/CASSANDRA-5727?focusedCommentId=13722571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13722571] describing the tests), I realized it was conducted on hardware with the following configuration: "a single rackspace node with 2GB of ram." I'm not sure whether this was an acceptable hardware configuration for a production Cassandra deployment at that time (mid-2013), but it is definitely far below today's hardware standard.
> Given that we now have compaction-stress, which can generate SSTables from a user-defined stress profile with a user-defined table schema and compaction parameters (compatible with cassandra-stress), it would be a useful effort to revisit this number on a more realistic hardware configuration and see whether 160MB is still the optimal choice. It might also affect our perceived "practical" node density with LCS nodes: if a bigger max_sstable_size actually works better, it will allow fewer SSTables (and hence fewer levels and less write amplification) per node at higher density.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)