Message-ID: <4EC0596A.4050308@bnl.gov>
Date: Sun, 13 Nov 2011 18:57:30 -0500
From: Maxim Potekhin
To: user@cassandra.apache.org
Subject: Re: Mass deletion -- slowing down
In-Reply-To: <4EBC7AB8.7030105@bnl.gov>

I've done more experimentation and the behavior persists. I start with a
normal dataset which is searchable by a secondary index. I select by that
index the entries that match a certain criterion, then delete those. I
tried two methods of deletion: individual cf.remove() calls as well as
batch removal in Pycassa.

What happens after that is as follows: attempts to read the same CF using
the same index values start to time out in the Pycassa client (there is a
Thrift message about a timeout). Entries not touched by the attempted
deletion can still be read just fine.

Has anyone seen such behavior?

Thanks,

Maxim

On 11/10/2011 8:30 PM, Maxim Potekhin wrote:
> Hello,
>
> My data load comes in batches representing one day in the life of a
> large computing facility. I index the data by the day it was produced,
> to be able to quickly pull data for a specific day within the last
> year or two. There are 6 other indexes.
>
> When it comes to retiring the data, I intend to delete it for the
> oldest date and after that add a fresh batch of data, so I control
> the disk space.
> Therein lies a problem -- and it may be Pycassa related, so I also
> filed an issue on GitHub -- when I select by 'DATE=blah' and then do
> a batch remove, it works fine for a while, and then after a few
> thousand deletions (done in batches of 1000) it grinds to a halt,
> i.e. I can no longer iterate the result, which manifests in a
> timeout error.
>
> Is that a behavior seen before? Cassandra version is 0.8.6, Pycassa
> 1.3.0.
>
> TIA,
>
> Maxim
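
For reference, the select-then-delete workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that was actually run: `delete_day` and `chunked` are hypothetical helper names, and `cf` is assumed to behave like a Pycassa ColumnFamily, i.e. to offer get_indexed_slices() and batch(). The sketch collects the matching keys before deleting anything, so that removal does not run while the index query result is still being iterated; whether that avoids the timeouts is untested.

```python
from itertools import islice


def chunked(keys, size):
    """Yield successive lists holding at most `size` keys each."""
    it = iter(keys)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def delete_day(cf, index_clause, batch_size=1000):
    """Remove every row matched by `index_clause`, `batch_size` rows at a time.

    Assumption: `cf` behaves like a Pycassa ColumnFamily, so that
    get_indexed_slices() yields (key, columns) pairs and batch() returns
    a mutator with remove() and send().
    """
    # Materialize the keys first, so no deletion happens while the
    # index query result is still being consumed.
    keys = [key for key, _columns in cf.get_indexed_slices(index_clause)]

    deleted = 0
    for chunk in chunked(keys, batch_size):
        batch = cf.batch(queue_size=batch_size)
        for key in chunk:
            batch.remove(key)
        batch.send()  # flush whatever is still queued for this chunk
        deleted += len(chunk)
    return deleted
```

With Pycassa, the index clause would typically be built with pycassa.index.create_index_expression() and create_index_clause(); note that the clause's count argument limits how many rows a single query returns, so it would need to be large enough to cover one day's data.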