Message-ID: <50E9FB48.5090301@yahoo.com>
Date: Sun, 06 Jan 2013 17:31:36 -0500
From: Mike
To: user@cassandra.apache.org
CC: aaron morton
Subject: Re: Column Family migration/tombstones

Thanks Aaron, I appreciate it.
It is my understanding that major compactions are not recommended because they essentially create one massive SSTable that will not compact with any new SSTables for some time. I can see how this might be a performance concern in the general case, because any read operation would then require multiple disk reads across multiple SSTables. In addition, information in the new table will not be purged by subsequent tombstones until that table can be compacted, which might then require regular major compactions to clear that data. Are there other performance considerations I need to keep in mind?

However, this might not be as much of an issue in our use case. It just so happens that the data in this column family changes very infrequently, except for deletes (which started recently and will now occur over time). In this case, I don't believe having data spread across SSTables will be an issue: either the data will have a tombstone (which causes Cassandra to stop looking at other SSTables), or the data will be in one SSTable. So I do not believe I/O will end up being an issue here.

What may be an issue is cleaning out old data in the SSTable that will exist after a major compaction. However, this might not require major compactions to happen nearly as frequently as I've seen recommended (once every gc_grace period), or at all. With the new design, data will be deleted from this table after a number of days. Deletes against the data remaining after a major compaction might not get processed until the next major compaction, but deletes against new data should be handled normally through minor compactions. In addition, the data remaining after we complete the migration should be fairly small (about 500,000 skinny rows per node, including replicas).

Any other thoughts on this? (I've put a rough sketch of the JMX user defined compaction call I have in mind at the bottom of this message.)

-Mike

On 1/6/2013 3:49 PM, aaron morton wrote:
>> When these rows are deleted, tombstones will be created and stored in more recent sstables. Upon compaction of sstables, and after gc_grace_period, I presume Cassandra will have removed all traces of that row from disk.
> Yes.
> When using Size Tiered compaction (the default), tombstones are purged when all fragments of a row are included in a compaction. So if you have rows which are written to for A Very Long Time(tm), it can take a while for everything to get purged.
>
> In the normal case though it's not a concern.
>
>> However, after deleting such a large amount of information, there is no guarantee that Cassandra will compact these two tables together, causing the data to be deleted (right?). Therefore, even after gc_grace_period, a large amount of space may still be used.
> In the normal case this is not really an issue.
>
> In your case things sound a little non normal. If you will have only a few hundred MBs, or a few GBs, of data left in the CF I would consider running a major compaction on it.
>
> Major compaction will work on all SSTables and create one big SSTable; this will ensure all deleted data is purged. We normally caution against this as the one new file is often very big and will not get compacted for a while. However, if you are deleting lots-o-data it may work. (There is also an anti compaction script around that may be of use.)
>
> Another alternative is to compact some of the older sstables with newer ones via User Defined Compaction with JMX.
>
>
>> Is there a way, other than a major compaction, to clean up all this old data? I assume a nodetool scrub will clean up old tombstones only if that row is not in another sstable?
> I don't think scrub (or upgradesstables) removes tombstones.
>
>> Do tombstones take up bloom filter space after gc_grace_period?
> Any row, regardless of the liveness of its columns, takes up bloom filter space (in -Filter.db).
> Once the row is removed it will no longer take up space.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/01/2013, at 6:44 AM, Mike wrote:
>
>> A couple more questions.
>>
>> When these rows are deleted, tombstones will be created and stored in more recent sstables. Upon compaction of sstables, and after gc_grace_period, I presume Cassandra will have removed all traces of that row from disk.
>>
>> However, after deleting such a large amount of information, there is no guarantee that Cassandra will compact these two tables together, causing the data to be deleted (right?). Therefore, even after gc_grace_period, a large amount of space may still be used.
>>
>> Is there a way, other than a major compaction, to clean up all this old data? I assume a nodetool scrub will clean up old tombstones only if that row is not in another sstable?
>>
>> Do tombstones take up bloom filter space after gc_grace_period?
>>
>> -Mike
>>
>> On 1/2/2013 6:41 PM, aaron morton wrote:
>>>> 1) As one can imagine, the index and bloom filter for this column family are large. Am I correct to assume that bloom filter and index space will not be reduced until after gc_grace_period?
>>> Yes.
>>>
>>>> 2) If I were to manually run repair across the cluster, is there a process I can use to safely remove these tombstones before gc_grace_period to free this memory sooner?
>>> There is nothing to specifically purge tombstones.
>>>
>>> You can temporarily reduce gc_grace_seconds and then trigger compaction, either by reducing min_compaction_threshold to 2 and doing a flush, or by kicking off a user defined compaction using the JMX interface.
>>>
>>>> 3) Any words of warning when undergoing this?
>>> Make sure you have a good breakfast.
>>> (It's more general advice than Cassandra specific.)
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 30/12/2012, at 8:51 AM, Mike wrote:
>>>
>>>> Hello,
>>>>
>>>> We are undergoing a change to our internal data model that will result in the eventual deletion of over a hundred million rows from a Cassandra column family. From what I understand, this will result in the generation of tombstones, which will be cleaned up during compaction, after the gc_grace_period time (default: 10 days).
>>>>
>>>> A couple of questions:
>>>>
>>>> 1) As one can imagine, the index and bloom filter for this column family are large. Am I correct to assume that bloom filter and index space will not be reduced until after gc_grace_period?
>>>>
>>>> 2) If I were to manually run repair across the cluster, is there a process I can use to safely remove these tombstones before gc_grace_period to free this memory sooner?
>>>>
>>>> 3) Any words of warning when undergoing this?
>>>>
>>>> We are running Cassandra 1.1.2 on a 6 node cluster with a Replication Factor of 3. We use LOCAL_QUORUM consistency for all operations.
>>>>
>>>> Thanks!
>>>> -Mike
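
P.S. For reference, below is roughly what I have in mind for the user defined compaction route Aaron mentioned. This is only an untested sketch against a stock 1.1.x node (JMX on localhost:7199, no authentication); the keyspace, column family, and -Data.db file names are made up and would need to be replaced with the actual older SSTable files from the data directory. My understanding is that the CompactionManager MBean exposes forceUserDefinedCompaction(keyspace, dataFiles) in 1.1, but please correct me if the signature is different in this version.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ForceUserDefinedCompaction {
        public static void main(String[] args) throws Exception {
            // Assumption: default JMX port 7199, no auth, run against one node at a time.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName compactionManager =
                        new ObjectName("org.apache.cassandra.db:type=CompactionManager");
                // Keyspace and SSTable file names below are placeholders; pass the
                // older files you want folded together, comma separated.
                mbs.invoke(compactionManager,
                           "forceUserDefinedCompaction",
                           new Object[] {
                               "MyKeyspace",
                               "MyKeyspace-MyCF-hd-100-Data.db,MyKeyspace-MyCF-hd-250-Data.db"
                           },
                           new String[] { "java.lang.String", "java.lang.String" });
            } finally {
                connector.close();
            }
        }
    }

If we instead decide a full major compaction is acceptable, I assume the simpler route is just "nodetool -h <host> compact <Keyspace> <ColumnFamily>" on each node, watching progress with "nodetool compactionstats".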