Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Message-ID: <4EBC7AB8.7030105@bnl.gov>
Date: Thu, 10 Nov 2011 20:30:32 -0500
From: Maxim Potekhin <potekhin@bnl.gov>
Reply-To: potekhin@bnl.gov
Organization: Brookhaven National Laboratory
To: user@cassandra.apache.org
Subject: Mass deletion -- slowing down
References: <1302618388.3794.34.camel@mierdi-laptop>
 <4DA47CCD.50509@panasiangroup.com> <1302630144.1732.2.camel@Avalon>
 <4DA491EE.6010500@panasiangroup.com> <4EB84C9F.8040208@bnl.gov>
 <CAKkz8Q03bk1=moG__noMBV6zUAo+9i7ocLMQesWceUU9voc6=g@mail.gmail.com>
In-Reply-To: 
 <CAKkz8Q03bk1=moG__noMBV6zUAo+9i7ocLMQesWceUU9voc6=g@mail.gmail.com>

Hello,

My data is loaded in batches, each representing one day in the life of
a large computing facility. I index the data by the day it was produced
so that I can quickly pull data for a specific day within the last year
or two. There are 6 other indexes.

When it comes to retiring the data, I intend to delete it for the
oldest date and after that add a fresh batch of data, so that I control
the disk space. Therein lies a problem -- and it may be Pycassa related,
so I also filed an issue on GitHub. When I select by 'DATE=blah' and
then do a batch remove, it works fine for a while, and then after a few
thousand deletions (done in batches of 1000) it grinds to a halt, i.e.
I can no longer iterate over the result, which manifests as a timeout
error.
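
A minimal sketch of the pattern described above, for readers who want to
reproduce it. It is an illustration under assumptions, not the actual
code behind this report: the function names, the 'DATE' column name and
the `count` value are hypothetical, and the Pycassa calls
(create_index_expression, create_index_clause, get_indexed_slices,
batch, remove, send) are used as I understand them from the Pycassa 1.x
documentation.

```python
def rows_for_day(cf, day, count=1000000):
    """Return an iterator of (key, columns) for rows whose DATE equals `day`.

    `cf` is assumed to be a pycassa ColumnFamily with a secondary index
    on a column named 'DATE' (a placeholder name). `count` caps how many
    rows the index query may return in total.
    """
    # Imported lazily so delete_rows() below can be exercised without pycassa.
    from pycassa.index import create_index_clause, create_index_expression

    clause = create_index_clause(
        [create_index_expression('DATE', day)], count=count)
    return cf.get_indexed_slices(clause)


def delete_rows(cf, rows, batch_size=1000):
    """Remove every row in `rows`, queueing removals in batches.

    The batch mutator is expected to send itself each time `batch_size`
    removals have been queued; the final send() flushes the remainder.
    Returns the number of rows queued for removal.
    """
    batch = cf.batch(queue_size=batch_size)
    deleted = 0
    for key, _columns in rows:
        batch.remove(key)  # row-level removal: no columns specified
        deleted += 1
    batch.send()
    return deleted


# Hypothetical usage, with a made-up date format:
#   delete_rows(cf, rows_for_day(cf, '20101110'))
```

Note that in this sketch the removals are issued while the index query
is still being iterated, which matches the situation in which the
timeout appears.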

Has anyone seen this behavior before? I am running Cassandra 0.8.6 and
Pycassa 1.3.0.

TIA,

Maxim