Subject: Re: Mass deletion -- slowing down
From: Peter Schuller <scode@scode.org>
To: user@cassandra.apache.org
Date: Sun, 13 Nov 2011 18:44:54 -0800

> I'm not sure I entirely follow. By the oldest data, do you mean the
> primary key corresponding to the limit of the time horizon? Unfortunately,
> unique IDs and the timestamps do not correlate, in the sense that
> chronologically "newer" entries might have a smaller sequential ID. That's
> because the timestamp corresponds to the last update, which is stochastic
> in the sense that jobs can take from seconds to days to complete. As I
> said, I'm not sure I understood you correctly.

I was hoping there would be a "wave of deletions" that matched the order of
the index (whatever is being read that is subject to the tombstones). If
not, then my suggestion doesn't apply. Are you using Cassandra secondary
indexes or maintaining your own index, by the way?

> Theoretically -- would compaction or cleanup help?

Not directly. The only way to eliminate tombstones is for them to (1)
expire according to gc grace seconds (again, see
http://wiki.apache.org/cassandra/DistributedDeletes) and then (2) be
removed by compaction. So while decreasing the gc grace period might
mitigate the problem somewhat, I would advise against going that route,
since it doesn't solve the fundamental problem and it can be dangerous: gc
grace has the usual implications for how often anti-entropy/repair must be
run, and a cluster that is super-sensitive to a small grace time becomes a
lot more volatile if, for example, you have repair problems and must
temporarily increase gc grace. It seems better to figure out some way of
structuring the data so that the reads in question do not suffer from this
problem.
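For intuition, here is a toy model of that delete/compaction interaction
(plain Python, *not* Cassandra's actual implementation; ToyColumnFamily
and its methods are made up for illustration): a delete only records a
tombstone, and compaction may only purge tombstones older than gc grace.

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # Cassandra's historical default: 10 days

class ToyColumnFamily:
    """Hypothetical sketch of tombstone bookkeeping, not real Cassandra code."""

    def __init__(self):
        self.columns = {}     # column name -> value
        self.tombstones = {}  # column name -> deletion timestamp (seconds)

    def delete(self, name, now):
        # A delete does not reclaim space; it adds a tombstone that
        # subsequent reads must scan past until compaction removes it.
        self.columns.pop(name, None)
        self.tombstones[name] = now

    def compact(self, now):
        # Compaction may only drop tombstones older than gc grace, so
        # that every replica has had a chance to learn of the delete.
        self.tombstones = {
            n: ts for n, ts in self.tombstones.items()
            if now - ts < GC_GRACE_SECONDS
        }

cf = ToyColumnFamily()
cf.columns = {"job-%d" % i: "done" for i in range(5)}
for name in list(cf.columns):
    cf.delete(name, now=0)

cf.compact(now=1)                       # too soon: all 5 tombstones remain
print(len(cf.tombstones))               # -> 5
cf.compact(now=GC_GRACE_SECONDS + 1)    # past gc grace: tombstones purged
print(len(cf.tombstones))               # -> 0
```

This is why shrinking gc grace "works" in the toy model but is risky in
practice: the grace window exists for replica convergence, not performance.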
Note that reading individual columns should still scale well despite
tombstones, as should slicing, as long as the slices you're reading are
reasonably dense (in terms of the data-to-tombstone ratio) even if the
surrounding data is sparse. How many entries are you reading per query? I
have been presuming it's the index read that is causing the timeout,
rather than the reading of the individual matching columns, since the
maximum "per-column" penalty when reading individual columns is finite
regardless of the sparsity of the data.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)