Message-ID: <4EC0769D.6090402@bnl.gov>
Date: Sun, 13 Nov 2011 21:02:05 -0500
From: Maxim Potekhin
To: user@cassandra.apache.org
Subject: Re: Mass deletion -- slowing down

Thanks Peter,

I'm not sure I entirely follow. By the oldest data, do you mean the primary key corresponding to the limit of the time horizon? Unfortunately, unique IDs and the timestamps do not correlate, in the sense that chronologically "newer" entries might have a smaller sequential ID. That's because the timestamp corresponds to the last update, which is stochastic in the sense that the jobs can take from seconds to days to complete.

As I said, I'm not sure I understood you correctly.

Also, I note that queries on different dates (i.e. not "contaminated" with lots of tombstones) work just fine, which is consistent with the picture that has emerged so far.

Theoretically -- would compaction or cleanup help?

Thanks

Maxim

On 11/13/2011 8:39 PM, Peter Schuller wrote:
>> I do limit the number of rows I'm asking for in Pycassa. Queries on primary
>> keys still work fine,
> Is it feasible in your situation to keep track of the oldest possible
> data (for example, if there is a single sequential writer that rotates
> old entries away it could keep a record of what the oldest might be)
> so that you can bound your index lookup >= that value (and avoid the
> tombstones)?
>
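
To illustrate the idea in the quoted suggestion (bounding the lookup at a tracked "oldest live" value so the read does not wade through tombstones), here is a toy sketch in plain Python. It is not how Cassandra or Pycassa implement index reads. The ToyIndex class, its methods, and the numbers are all hypothetical, and it assumes unique, sortable values in the indexed column. It only shows why a scan that starts at the beginning of a heavily deleted range examines many dead entries, while one that starts at a known lower bound does not.

```python
import bisect


class ToyIndex:
    """Hypothetical model of one index row: sorted values, with deletions
    kept as tombstones instead of being removed immediately."""

    def __init__(self):
        self._values = []  # sorted column values (assumed unique)
        self._live = []    # parallel flags; False marks a tombstone

    def insert(self, value):
        pos = bisect.bisect_left(self._values, value)
        self._values.insert(pos, value)
        self._live.insert(pos, True)

    def delete(self, value):
        # Mark the entry as a tombstone; it still occupies a slot.
        pos = bisect.bisect_left(self._values, value)
        if pos < len(self._values) and self._values[pos] == value:
            self._live[pos] = False

    def scan(self, count, start=None):
        """Return (live values, entries examined), reading upward from
        `start` (or from the beginning when no bound is given)."""
        pos = 0 if start is None else bisect.bisect_left(self._values, start)
        found, examined = [], 0
        while pos < len(self._values) and len(found) < count:
            examined += 1
            if self._live[pos]:
                found.append(self._values[pos])
            pos += 1
        return found, examined


# Mass deletion: 10000 entries, the oldest 9000 removed.
index = ToyIndex()
for value in range(10000):
    index.insert(value)
for value in range(9000):
    index.delete(value)

oldest_live = 9000  # the bound a writer would have to keep track of

unbounded_rows, unbounded_cost = index.scan(10)
bounded_rows, bounded_cost = index.scan(10, start=oldest_live)
print(unbounded_cost, bounded_cost)
```

Both scans return the same ten live values; the difference is only in how many entries each had to examine. As noted in the message above, this helps only if such a bound can be kept on the column actually being queried, which is not a given when IDs and timestamps do not correlate.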