Subject: Re: Mass deletion -- slowing down
From: Peter Schuller
To: user@cassandra.apache.org, potekhin@bnl.gov
Date: Sun, 13 Nov 2011 16:22:13 -0800

Deletions in Cassandra are implemented with tombstones (see
http://wiki.apache.org/cassandra/DistributedDeletes), and under some
circumstances reads can become O(n) with respect to the number of
deleted columns. It sounds like this is what you're seeing.

For example, suppose you insert a range of columns into a row, delete
it, and then insert another, non-overlapping, subsequent range. Repeat
that a bunch of times. In terms of what's stored in Cassandra for the
row, you now have:

  tomb tomb tomb tomb .... actual data

If you then do something like a slice on that row, with end-points
chosen such that the slice includes all the tombstones, Cassandra
essentially has to read through and process all of those tombstones
before it reaches any live data. (For the PostgreSQL-aware: this is
similar to the effect you can get when implementing e.g. a FIFO queue,
where MIN(pos) turns O(n) with respect to the number of entries
deleted since the last vacuum; this is improved in modern versions.)

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
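
To make the insert/delete/insert pattern described above concrete, here is a toy model in Python. It is only a sketch and not Cassandra code: the names Row, slice and run_queue are made up for illustration, and it ignores SSTables, compaction and everything else a real node does. It assumes only that a delete leaves a tombstone in place and that a slice has to step over tombstones to find live columns.

```python
from bisect import bisect_left

TOMBSTONE = object()  # marker standing in for a deleted column


class Row:
    """Toy row: sorted column names, each mapped to a value or a tombstone."""

    def __init__(self):
        self.names = []  # column names, kept sorted
        self.cells = {}  # name -> value or TOMBSTONE

    def insert(self, name, value):
        if name not in self.cells:
            self.names.insert(bisect_left(self.names, name), name)
        self.cells[name] = value

    def delete(self, name):
        # A delete does not remove the column; it writes a tombstone over it.
        self.insert(name, TOMBSTONE)

    def slice(self, start, count):
        """Return (live columns, cells scanned) for up to `count` live
        columns whose name is >= `start`."""
        live, scanned = [], 0
        i = bisect_left(self.names, start)
        while i < len(self.names) and len(live) < count:
            name = self.names[i]
            scanned += 1
            if self.cells[name] is not TOMBSTONE:
                live.append((name, self.cells[name]))
            i += 1
        return live, scanned


def run_queue(batches, batch_size):
    """Insert a range, delete it, move on to the next range; repeat.
    Finally insert one live column after all the deleted ranges."""
    row = Row()
    pos = 0
    for _ in range(batches):
        for p in range(pos, pos + batch_size):
            row.insert(p, f"item-{p}")
        for p in range(pos, pos + batch_size):
            row.delete(p)
        pos += batch_size
    row.insert(pos, f"item-{pos}")
    return row


if __name__ == "__main__":
    row = run_queue(batches=100, batch_size=10)
    # Slice from the start of the row: every tombstone is scanned first.
    print(row.slice(0, 1))
    # Slice starting at the live column: no tombstones are touched.
    print(row.slice(1000, 1))
```

In this model, a slice that starts at the beginning of the row scans all 1000 tombstones before it returns the single live column, while a slice whose start point is already past the deleted ranges scans only one cell. That is the reason for picking slice end-points that exclude the tombstoned region where the application can do so.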