From: Mina Naguib
To: user@cassandra.apache.org
Subject: Re: leveled compaction and tombstoned data
Date: Fri, 9 Nov 2012 09:31:09 -0500

On 2012-11-08, at 1:12 PM, B. Todd Burruss wrote:

> we are having the problem where we have huge SSTABLEs with tombstoned data in them that is not being compacted soon enough (because size tiered compaction requires, by default, 4 like sized SSTABLEs). this is using more disk space than we anticipated.
>
> we are very write heavy compared to reads, and we delete the data after N number of days (depends on the column family, but N is around 7 days)
>
> my question is would leveled compaction help to get rid of the tombstoned data faster than size tiered, and therefore reduce the disk space usage

From my experience, levelled compaction makes space reclamation after deletes even less predictable than size-tiered.

The reason is that deletes, like all mutations, are just recorded into sstables. They enter level0 and are slowly, over time, promoted upwards to levelN.

Depending on your *total* mutation volume vs. your data set size, this may be quite a slow process.
This is made even worse if the data you're deleting is large (say, an entire row worth several hundred kilobytes) and is to be deleted by a small row-level tombstone. If the row is sitting in level4, the tombstone won't impact it until enough data has pushed through all the existing data in level0, level1, level2 and level3.

Finally, to guard against the tombstone missing any data, the tombstone itself is not a candidate for removal (I believe even after gc_grace has passed) unless it has reached the highest populated level in levelled compaction. This means that if you have 4 levels and issue a ton of deletes (even deletes that will never impact existing data), these tombstones are deadweight that cannot be purged until they hit level4.

For a write-heavy workload, I recommend you stick with size-tiered. You have several options at your disposal (compaction min/max thresholds, gc_grace) to move things along. If that doesn't help, I've heard of some fairly reputable people doing some fairly blasphemous things (major compactions every night).
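
The purge rule described above can be sketched as a toy Python model. This is not Cassandra code. The names `Tombstone`, `is_purgeable`, `level_capacity_mb` and `data_below_level_mb` are hypothetical, the 10x growth per level and the 5 MB sstable size are assumptions passed in as parameters, and the purge condition is only the rule as stated in this message (which itself hedges it with "I believe").

```python
from dataclasses import dataclass

GC_GRACE_SECONDS = 864000  # 10 days; assumed to match the column family's gc_grace


@dataclass
class Tombstone:
    level: int        # level of the sstable that currently holds the tombstone
    age_seconds: int  # time elapsed since the delete was written


def is_purgeable(tombstone, highest_populated_level, gc_grace_seconds=GC_GRACE_SECONDS):
    """Purge rule as described in the message: past gc_grace AND at the top level."""
    past_grace = tombstone.age_seconds > gc_grace_seconds
    at_top = tombstone.level >= highest_populated_level
    return past_grace and at_top


def level_capacity_mb(level, sstable_size_mb=5, fanout=10):
    """Assumed target size of level N (N >= 1): fanout**N sstables of sstable_size_mb each."""
    return sstable_size_mb * fanout ** level


def data_below_level_mb(level, sstable_size_mb=5, fanout=10):
    """Combined assumed capacity of levels 1 .. level-1, which sit between level0 and `level`."""
    return sum(level_capacity_mb(n, sstable_size_mb, fanout) for n in range(1, level))


if __name__ == "__main__":
    old = 30 * 24 * 3600  # a 30-day-old delete, well past gc_grace
    print(is_purgeable(Tombstone(level=2, age_seconds=old), highest_populated_level=4))
    print(is_purgeable(Tombstone(level=4, age_seconds=old), highest_populated_level=4))
    print(data_below_level_mb(4))
```

Under these assumptions the script prints False, True and 5550. A 30-day-old tombstone stuck in level2 is still deadweight, and roughly 5.5 GB of level capacity sits between level0 and a row in level4. Treat the figure as a back-of-envelope number only, since real promotion also depends on which sstables compaction happens to pick.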