From: Teijo Holzer <tholzer@wetafx.co.nz>
Date: Tue, 16 Aug 2011 10:32:37 +1200
To: user@cassandra.apache.org
CC: Philippe
Subject: Re: Scalability question

Hi,

we have come across this as well. We continuously run rolling repairs,
followed by major compactions, followed by a gc() (or node restart), to get
rid of all these SSTable files (a sketch of that cycle is at the end of this
message). Combined with aggressive TTLs on most inserts, the cluster stays
nice and lean. You don't want your working set to grow indefinitely.

Cheers,

	T.

On 16/08/11 08:08, Philippe wrote:
> Forgot to mention that stopping & restarting the server brought the data
> directory down to 283GB in less than 1 minute.
>
> Philippe
>
> 2011/8/15 Philippe
>
> It's another reason to avoid major / manual compactions which create a
> single big SSTable.
> Minor compactions keep things in buckets, which means newer SSTables can be
> compacted without needing to read the bigger, older tables.
>
> I've never run a major/manual compaction on this ring.
> In my case, running repair on a "big" keyspace results in SSTables piling
> up. My problematic node just filled up 483GB (yes, GB) of SSTables. Here
> are the biggest:
>
> ls -laSrh
> (...)
> -rw-r--r-- 1 cassandra cassandra 2.7G 2011-08-15 14:13 PUBLIC_MONTHLY_20-g-4581-Data.db
> -rw-r--r-- 1 cassandra cassandra 2.7G 2011-08-15 14:52 PUBLIC_MONTHLY_20-g-4641-Data.db
> -rw-r--r-- 1 cassandra cassandra 2.8G 2011-08-15 14:39 PUBLIC_MONTHLY_20-tmp-g-4878-Data.db
> -rw-r--r-- 1 cassandra cassandra 2.9G 2011-08-15 15:00 PUBLIC_MONTHLY_20-g-4656-Data.db
> -rw-r--r-- 1 cassandra cassandra 3.0G 2011-08-15 14:17 PUBLIC_MONTHLY_20-g-4599-Data.db
> -rw-r--r-- 1 cassandra cassandra 3.0G 2011-08-15 15:11 PUBLIC_MONTHLY_20-g-4675-Data.db
> -rw-r--r-- 3 cassandra cassandra 3.1G 2011-08-13 10:34 PUBLIC_MONTHLY_18-g-3861-Data.db
> -rw-r--r-- 1 cassandra cassandra 3.2G 2011-08-15 14:41 PUBLIC_MONTHLY_20-tmp-g-4884-Data.db
> -rw-r--r-- 1 cassandra cassandra 3.6G 2011-08-15 14:44 PUBLIC_MONTHLY_20-tmp-g-4894-Data.db
> -rw-r--r-- 1 cassandra cassandra 3.8G 2011-08-15 14:56 PUBLIC_MONTHLY_20-tmp-g-4934-Data.db
> -rw-r--r-- 1 cassandra cassandra 3.8G 2011-08-15 14:46 PUBLIC_MONTHLY_20-tmp-g-4905-Data.db
> -rw-r--r-- 1 cassandra cassandra 4.0G 2011-08-15 14:57 PUBLIC_MONTHLY_20-tmp-g-4935-Data.db
> -rw-r--r-- 3 cassandra cassandra 5.9G 2011-08-13 12:53 PUBLIC_MONTHLY_19-g-4219-Data.db
> -rw-r--r-- 3 cassandra cassandra 6.0G 2011-08-13 13:57 PUBLIC_MONTHLY_20-g-4538-Data.db
> -rw-r--r-- 3 cassandra cassandra  12G 2011-08-13 09:27 PUBLIC_MONTHLY_20-g-4501-Data.db
>
> On the other nodes the same directory is around 69GB. Why are there so many
> fewer large files there, and so many big ones on the repairing node?
> -rw-r--r-- 1 cassandra cassandra 434M 2011-08-15 16:02 PUBLIC_MONTHLY_17-g-3525-Data.db
> -rw-r--r-- 1 cassandra cassandra 456M 2011-08-15 15:50 PUBLIC_MONTHLY_19-g-4253-Data.db
> -rw-r--r-- 1 cassandra cassandra 485M 2011-08-15 14:30 PUBLIC_MONTHLY_20-g-5280-Data.db
> -rw-r--r-- 1 cassandra cassandra 572M 2011-08-15 15:15 PUBLIC_MONTHLY_18-g-3774-Data.db
> -rw-r--r-- 2 cassandra cassandra 664M 2011-08-09 15:39 PUBLIC_MONTHLY_20-g-4893-Index.db
> -rw-r--r-- 2 cassandra cassandra 811M 2011-08-11 21:27 PUBLIC_MONTHLY_16-g-2597-Data.db
> -rw-r--r-- 2 cassandra cassandra 915M 2011-08-13 04:00 PUBLIC_MONTHLY_18-g-3695-Data.db
> -rw-r--r-- 1 cassandra cassandra 925M 2011-08-15 03:39 PUBLIC_MONTHLY_17-g-3454-Data.db
> -rw-r--r-- 1 cassandra cassandra 1.3G 2011-08-15 13:46 PUBLIC_MONTHLY_19-g-4199-Data.db
> -rw-r--r-- 2 cassandra cassandra 1.5G 2011-08-10 15:37 PUBLIC_MONTHLY_17-g-3218-Data.db
> -rw-r--r-- 1 cassandra cassandra 1.9G 2011-08-15 14:35 PUBLIC_MONTHLY_20-g-5281-Data.db
> -rw-r--r-- 2 cassandra cassandra 2.1G 2011-08-10 16:33 PUBLIC_MONTHLY_19-g-3946-Data.db
> -rw-r--r-- 2 cassandra cassandra 3.1G 2011-08-10 22:23 PUBLIC_MONTHLY_18-g-3509-Data.db
> -rw-r--r-- 2 cassandra cassandra 4.0G 2011-08-10 18:18 PUBLIC_MONTHLY_20-g-5024-Data.db
> -rw------- 2 cassandra cassandra 5.1G 2011-08-09 15:23 PUBLIC_MONTHLY_19-g-3847-Data.db
> -rw-r--r-- 2 cassandra cassandra 9.6G 2011-08-09 15:39 PUBLIC_MONTHLY_20-g-4893-Data.db
>
> This whole compaction thing is getting me worried: how are sites in
> production dealing with SSTables becoming larger and larger, and thus
> taking longer and longer to compact? Adding nodes every couple of weeks?
>
> Philippe
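
P.S. Here is a rough sketch of the rolling cycle described above. The host
list, keyspace name, restart command and sleep interval are placeholders for
your own environment, not a drop-in script; the point is simply to repair and
compact one node at a time and then bounce it so obsolete SSTables are
actually removed.

#!/bin/sh
# Rolling repair + major compaction + restart, one node at a time.
# HOSTS, KEYSPACE and the restart command below are placeholders.
HOSTS="node1 node2 node3"
KEYSPACE="MyKeyspace"

for h in $HOSTS; do
    # Anti-entropy repair on this node only.
    nodetool -h "$h" repair "$KEYSPACE"

    # Major compaction merges the SSTables that repair left behind.
    nodetool -h "$h" compact "$KEYSPACE"

    # Compacted-away SSTables are only deleted once the JVM garbage-collects
    # the references to them; a restart (or a full GC via JMX) forces that,
    # which is why the data directory shrinks afterwards -- the same effect
    # Philippe saw when restarting his node.
    ssh "$h" "sudo /etc/init.d/cassandra restart"

    # Let the node come back up and the ring settle before moving on.
    sleep 300
done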