From: "Hiller, Dean"
To: "user@cassandra.apache.org"
Date: Wed, 18 Sep 2013 13:15:55 -0600
Subject: Re: What is the ideal value for sstable_size_in_mb when using LeveledCompactionStrategy?

1. With Cassandra on Linux, always raise your file descriptor limits. Even in 0.7 that was the recommendation, so that Cassandra can open lots of files.
2. We use 50 MB for our LCS (sstable_size_in_mb) with no performance issues.
   We previously used 10 MB, also with no issues, but of course with a huge number of files given our 300T per node.

Dean

From: Jayadev Jayaraman
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, September 18, 2013 1:02 PM
To: "user@cassandra.apache.org"
Subject: What is the ideal value for sstable_size_in_mb when using LeveledCompactionStrategy?

We have set up a 24-node Cassandra cluster (m1.xlarge nodes, 1.7 TB per node) on Amazon EC2:

version = 1.2.9
replication factor = 2
snitch = EC2Snitch
placement_strategy = NetworkTopologyStrategy (with 12 nodes in each of 2 availability zones)

Background on our use case:

We plan on using Hadoop with sstableloader to load 10 GB+ of analytics data per day (100 million+ row keys, 5 or so columns per day on average). We have chosen LeveledCompactionStrategy in the hope that it constrains the number of SSTables that must be read to retrieve a sliced predicate for a row. We don't want too many file sockets (> 1000) open to SSTables by the Cassandra JVM, as this has caused us network/unreachability issues before. We faced this when we were on Cassandra 0.8.9 using SizeTieredCompactionStrategy; to mitigate it, we ran minor compaction daily and major compaction semi-regularly to keep as few SSTable files as possible on disk.

If we use LeveledCompactionStrategy with a small value for sstable_size_in_mb (default = 5 MB), wouldn't that result in a very large number of SSTable files on disk? How does that affect the number of file sockets open? (Reading the docs, I get the impression that the number of SSTable seeks per query is reduced by a large margin.) But if we use a larger value for sstable_size_in_mb, say around 200 MB, there will be 800 MB of small uncompacted SSTables on disk per column family, to which there will inevitably be file sockets open.
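To put rough numbers on the trade-off raised above, here is a minimal Python sketch that estimates, for the 1.7 TB per node mentioned in the question, how many SSTables each sstable_size_in_mb value would produce and how many LCS levels would be needed to hold that data. It assumes every SSTable is exactly sstable_size_in_mb, ignores the several component files each SSTable has on disk, and assumes level n holds up to 10^n SSTables; the function names are hypothetical.

```python
import math

# Figure taken from the question: ~1.7 TB of data per node.
DATA_PER_NODE_MB = 1.7 * 1024 * 1024  # 1.7 TB expressed in MB (binary units)


def sstable_count(data_mb: float, sstable_size_mb: int) -> int:
    """Rough number of SSTables if every SSTable were exactly sstable_size_mb."""
    return math.ceil(data_mb / sstable_size_mb)


def levels_needed(data_mb: float, sstable_size_mb: int) -> int:
    """Smallest number of levels L1..Ln whose combined capacity holds data_mb.

    Assumption: level n holds up to 10**n SSTables of sstable_size_mb each.
    """
    capacity = 0.0
    level = 0
    while capacity < data_mb:
        level += 1
        capacity += (10 ** level) * sstable_size_mb
    return level


for size_mb in (5, 10, 50, 200):
    print(f"sstable_size_in_mb={size_mb:>3}: "
          f"~{sstable_count(DATA_PER_NODE_MB, size_mb):,} SSTables, "
          f"{levels_needed(DATA_PER_NODE_MB, size_mb)} levels")
```

Under these assumptions the default 5 MB gives roughly 356,000 SSTables per node and 6 levels, while 200 MB gives under 9,000 SSTables and 4 levels. Because SSTables within one LCS level do not overlap, a given row key can live in at most one SSTable per level (plus any still sitting in L0), so the level count, rather than the total file count, is what bounds the SSTables touched by a single read.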
All in all, can someone help us figure out what we should set sstable_size_in_mb to? I figure it's not a very good idea to set it to a larger value, but I don't know how things perform if we set it to a small value. Do we have to run major compaction regularly in this case too?

Thanks,
Jayadev
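Dean's first point, raising the open-file limit, can be checked before starting Cassandra. The following is a minimal sketch assuming a Unix-like host and Python's standard resource module. It reports the limits of the Python process itself, which usually match what a process started from the same shell and user would get; it does not inspect an already-running Cassandra JVM (on Linux, those limits appear in /proc/<pid>/limits). TARGET_NOFILE is a hypothetical threshold, not a value from this thread.

```python
import resource

# Hypothetical target: choose a value comfortably above the expected number of
# open SSTable files plus network sockets on the node.
TARGET_NOFILE = 100_000


def nofile_limits() -> tuple[int, int]:
    """Return the (soft, hard) RLIMIT_NOFILE values for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_sufficient(limit: int, target: int) -> bool:
    """True if limit meets target; RLIM_INFINITY counts as unlimited."""
    return limit == resource.RLIM_INFINITY or limit >= target


soft, hard = nofile_limits()
print(f"open-file limit: soft={soft}, hard={hard}")
if not is_sufficient(soft, TARGET_NOFILE):
    print(f"soft limit is below {TARGET_NOFILE}; consider raising it "
          "(e.g. in /etc/security/limits.conf) for the user running Cassandra")
```

Checking the soft limit matters because it is the one enforced when a process opens files; the hard limit only caps how far an unprivileged process may raise its own soft limit.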