From: Keith Wright <kwright@nanigans.com>
To: user@cassandra.apache.org
Date: Thu, 16 May 2013 09:14:04 -0500
Subject: SSTable size versus read performance

Hi all,

    I currently have 2 clusters, one running on 1.1.10 using CQL2 and one running on 1.2.4 using CQL3 and vnodes. The machines in the 1.2.4 cluster are expected to have better IO performance, as we are going from 1 SSD data disk per node in the 1.1 cluster to 3 SSD data disks per node in the 1.2 cluster, with higher-end drives (commit logs are on their own disk, shared with the OS). I am doing some stress testing on the 1.2 cluster and have found that although the reads/sec as seen from iostat are approximately the same (3K/sec) in both clusters, the MB/s read in the new cluster is MUCH higher (7 MB/s in 1.1 compared to 30-50 MB/s in 1.2).
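(Back of the envelope, assuming iostat counts the same way on both clusters: 30-50 MB/s at ~3,000 reads/sec works out to roughly 10-17 KB read per operation on 1.2, versus 7 MB/s / 3,000 ≈ 2.4 KB per operation on 1.1, so each read appears to be pulling 4-7x more data off disk.)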
As a result, I am seeing excessive iowait in the 1.2 cluster, causing high average read times of 30 ms under the same load (the 1.1 cluster sees around 5 ms). They are both using leveled compaction, but one thing I did change in the new cluster was to increase the SSTable size from the OOTB setting to 32 MB. Note that my reads are by definition highly random, as we are running memcached in front for various reasons. Does Cassandra need to read the entire SSTable when fetching a row, or only the relevant chunk (I have the OOTB chunk size and BF settings)? I just decreased the SSTable size to 5 MB and am waiting for compactions to complete to see if that makes a difference.

Thanks!

Relevant table definition, if helpful (note that I also changed to the LZ4 compressor expecting better read performance, and I decreased the crc_check_chance, again to minimize read latency):

CREATE TABLE global_user (
    user_id BIGINT,
    app_id INT,
    type TEXT,
    name TEXT,
    last TIMESTAMP,
    paid BOOLEAN,
    values map<TIMESTAMP,FLOAT>,
    sku_time map<TEXT,TIMESTAMP>,
    extra_param map<TEXT,TEXT>,
    PRIMARY KEY (user_id, app_id, type, name)
) WITH compression = {'crc_check_chance': 0.1, 'sstable_compression': 'LZ4Compressor'}
  AND compaction = {'class': 'LeveledCompactionStrategy'}
  AND compaction_strategy_options = {'sstable_size_in_mb': 5}
  AND gc_grace_seconds = 86400;
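For reference, the statement I ran to drop the SSTable size was along these lines (same option syntax as the CREATE above; I'm on 1.2.4 CQL3, so exact option names may differ in other versions):

ALTER TABLE global_user
  WITH compaction = {'class': 'LeveledCompactionStrategy'}
  AND compaction_strategy_options = {'sstable_size_in_mb': 5};

As far as I understand, only newly written SSTables pick up the new target size, and existing ones shrink as they get rewritten by compaction, hence my waiting on compactions before re-measuring.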
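If the answer is that only the relevant compressed chunk is read per row, the next thing I may try is shrinking the chunk size from the 64 KB default, since with fully random single-row reads a large chunk means reading and decompressing far more data than the row itself. Something like this (untested sketch; chunk_length_kb is the 1.2 option name as I understand it):

ALTER TABLE global_user
  WITH compression = {'sstable_compression': 'LZ4Compressor',
                      'chunk_length_kb': 8,
                      'crc_check_chance': 0.1};

I gather existing SSTables keep their old chunk size until rewritten (e.g. via nodetool upgradesstables), so this would also only take effect as tables are rewritten.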