From: Maxim Potekhin
Reply-To: potekhin@bnl.gov
Organization: Brookhaven National Laboratory
Date: Thu, 10 Nov 2011 20:30:32 -0500
To: user@cassandra.apache.org
Subject: Mass deletion -- slowing down

Hello,

My data load comes in batches, each representing one day in the life of a large computing facility. I index the data by the day it was produced so that I can quickly pull data for a specific day within the last year or two. There are 6 other indexes.

When it comes to retiring the data, I intend to delete the data for the oldest date and then add a fresh batch, so that I control the disk space.

Therein lies a problem (it may be Pycassa related, so I also filed an issue on GitHub): when I select by 'DATE=blah' and then do a batch remove, it works fine for a while, and then after a few thousand deletions (done in batches of 1000) it grinds to a halt, i.e. I can no longer iterate over the result, which manifests as a timeout error.

Is this behavior that has been seen before? Cassandra version is 0.8.6, Pycassa 1.3.0.

TIA,
Maxim
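
For readers following the thread, the delete loop described in this message can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the poster's actual script: `select_keys`, `remove_batch`, `chunked`, and `delete_by_date` are hypothetical names, and an in-memory dict stands in for the column family so the example runs without a cluster. With Pycassa, `select_keys` would presumably wrap an index query such as `ColumnFamily.get_indexed_slices`, and `remove_batch` a batch mutator obtained from `ColumnFamily.batch()`.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items taken from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def delete_by_date(select_keys, remove_batch, date, batch_size=1000):
    """Remove every row indexed under `date`, `batch_size` keys at a time.

    select_keys(date) -> iterable of row keys   (hypothetical callable)
    remove_batch(keys) -> None                  (hypothetical callable)
    Returns the number of row keys handed to remove_batch.
    """
    removed = 0
    for keys in chunked(select_keys(date), batch_size):
        remove_batch(keys)
        removed += len(keys)
    return removed


# In-memory stand-in for the column family (assumption: one DATE column per row).
store = {
    "row%d" % i: {"DATE": "20111109" if i % 2 else "20111110"}
    for i in range(2500)
}


def select_keys(date):
    # Materialize the key list first so rows can be deleted while batching.
    return [key for key, row in store.items() if row["DATE"] == date]


batch_sizes = []


def remove_batch(keys):
    # Record the batch size, then delete the rows from the stand-in store.
    batch_sizes.append(len(keys))
    for key in keys:
        del store[key]


removed = delete_by_date(select_keys, remove_batch, "20111109")
print(removed, batch_sizes, len(store))
```

Passing the query and the removal in as callables keeps the batching logic separate from the client library, so the same loop can be timed against a real cluster to see at which batch the iteration starts to stall.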