From: Sylvain Lebresne
To: user@cassandra.apache.org
Cc: Karl Hiramoto
Date: Mon, 14 Mar 2011 15:33:51 +0100
Subject: Re: reducing disk usage advice

On Sun, Mar 13, 2011 at 7:10 PM, Karl Hiramoto wrote:
>
> Hi,
>
> I'm looking for advice on reducing disk usage. I've run out of disk space two days in a row while running a nightly scheduled nodetool repair && nodetool compact cronjob.
>
> I have 6 nodes, RF=3, each with 300 GB drives at a hosting company. GCGraceSeconds = 260000 (about 3 days).
>
> Every column in the database has a TTL of 86400 (24 hours) to handle deletion of stale data. 50% of the time the data is only written once, read zero or more times, then expires. The other 50% of the time it's written multiple times, resetting the TTL to 24 hours each time.

As it turns out, the compaction algorithm is pretty much the worst possible for this use case. Because we compact files that have a similar size, the older a column gets, the less often it is compacted. If you always set a fixed TTL for all columns, you would want to do some compaction of recent sstables, for the sake of not having too many sstables, but you also want to compact old sstables, which are guaranteed to simply go away. And for those, it's actually fine to compact them alone (only for the sake of purging). But given the way compaction works, you will end up with big sstables full of expired data, and you may not even be able to compact them, simply because compaction "thinks" it doesn't have enough room.

But I do think that your use case (having a CF where all columns have the same TTL and you rely only on it for deletion) is a very useful one, and we should handle it better. In particular, CASSANDRA-1610 could be an easy way to get this. CASSANDRA-1537 is probably also a partial but possibly sufficient solution. It's also probably easier than CASSANDRA-1610, and I'll try to give it a shot asap; it has been on my todo list for way too long.

> One question: since I use a TTL, is it safe to set GCGraceSeconds to 0? I never delete manually; I just rely on the TTL for deletion, so are forgotten deletes an issue?

The rule is this.
Say you think that m is a reasonable value for GCGraceSeconds. That is, you make sure that you'll always bring failed nodes back up and run repair within m seconds. Then, if you always use a TTL of n (in your case 24 hours), the actual GCGraceSeconds that you should set is m - n. So setting GCGraceSeconds to 0 in your case would be roughly equivalent to setting it to 24h on a "normal" CF. That's probably a bit low.

--
Sylvain

>
> cfstats:
>         Read Count: 32052
>         Read Latency: 3.1280378135529765 ms.
>         Write Count: 9704525
>         Write Latency: 0.009527474760485443 ms.
>         Pending Tasks: 0
>                 Column Family: Offer
>                 SSTable count: 12
>                 Space used (live): 59865089091
>                 Space used (total): 76111577830
>                 Memtable Columns Count: 39355
>                 Memtable Data Size: 14726313
>                 Memtable Switch Count: 414
>                 Read Count: 32052
>                 Read Latency: 3.128 ms.
>                 Write Count: 9704525
>                 Write Latency: 0.010 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 1000
>                 Key cache size: 1000
>                 Key cache hit rate: 2.4805931214280473E-4
>                 Row cache: disabled
>                 Compacted row minimum size: 36
>                 Compacted row maximum size: 1597
>                 Compacted row mean size: 1319
>
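The GCGraceSeconds rule in the reply (if failed nodes are always brought back and repaired within m seconds, and every column carries the same TTL of n seconds, set GCGraceSeconds to m - n) is plain arithmetic. Here is a minimal Python sketch of it. The helper names are hypothetical and not part of Cassandra or nodetool, the 10-day repair window is only an illustrative value, and clamping the result at 0 is an assumption of this sketch rather than something stated in the reply.

```python
# A minimal sketch of the m - n rule for GCGraceSeconds on a column family
# where every column is written with the same TTL and never deleted manually.
# The helper names are hypothetical; they are not part of Cassandra.

SECONDS_PER_DAY = 86400


def gc_grace_for_fixed_ttl(repair_window_s: int, ttl_s: int) -> int:
    """GCGraceSeconds suggested by the rule: m - n.

    repair_window_s (m): time within which failed nodes are always brought
    back up and repaired. ttl_s (n): the TTL every column is written with.
    Clamping at 0 is an assumption of this sketch.
    """
    if repair_window_s < 0 or ttl_s < 0:
        raise ValueError("durations must be non-negative")
    return max(repair_window_s - ttl_s, 0)


def equivalent_plain_gc_grace(gc_grace_s: int, ttl_s: int) -> int:
    """Reverse view: GCGraceSeconds g on a fixed-TTL CF behaves roughly like
    g + n on a "normal" CF that relies on explicit deletes."""
    if gc_grace_s < 0 or ttl_s < 0:
        raise ValueError("durations must be non-negative")
    return gc_grace_s + ttl_s


if __name__ == "__main__":
    ttl = SECONDS_PER_DAY  # the 24 h TTL from the question (86400 s)

    # GCGraceSeconds = 0 is roughly a 24 h grace period on a normal CF.
    print(equivalent_plain_gc_grace(0, ttl) / 3600)  # prints 24.0

    # If repair is reliably done within 10 days (an illustrative value),
    # the rule suggests 10 days - 1 day = 9 days, in seconds.
    print(gc_grace_for_fixed_ttl(10 * SECONDS_PER_DAY, ttl))  # prints 777600
```

With the question's current setting of 260000 seconds and a 24-hour TTL, the same reverse view gives 346400 seconds, roughly four days on a "normal" CF.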