cassandra-user mailing list archives

From: Maxim Potekhin
Subject: Re: Mass deletion -- slowing down
Date: Sun, 13 Nov 2011 23:57:30 GMT
I've done more experimentation and the behavior persists. I start with a
normal dataset that is searchable through a secondary index. Using that
index, I select the entries that match a certain criterion and then delete
them. I tried two methods of deletion: individual cf.remove() calls, and
batch removal in Pycassa.

What happens next is this: attempts to read the same CF using the same
index values start to time out in the Pycassa client (with a Thrift
timeout message). Entries not touched by the attempted deletion can still
be read just fine.
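The two deletion paths described above can be sketched roughly as follows. This is a minimal sketch, assuming pycassa 1.x's `ColumnFamily.remove()` and `ColumnFamily.batch()`, where `batch()` returns a Mutator that sends automatically once `queue_size` mutations are queued. The helper names `remove_one_by_one` and `remove_batched` are hypothetical, and `keys` is assumed to hold row keys already selected through the secondary index.

```python
# Hypothetical helpers illustrating the two deletion styles; written against
# the pycassa 1.x ColumnFamily API (cf.remove, cf.batch), not taken from the
# original report.


def remove_one_by_one(cf, keys):
    """Delete each row with its own client call (cf.remove)."""
    count = 0
    for key in keys:
        cf.remove(key)  # one round trip per row
        count += 1
    return count


def remove_batched(cf, keys, queue_size=1000):
    """Queue row deletions in a Mutator obtained from cf.batch().

    The Mutator flushes on its own every queue_size mutations; the final
    send() flushes whatever is left in the queue.
    """
    b = cf.batch(queue_size=queue_size)
    count = 0
    for key in keys:
        b.remove(key)  # queued, not sent yet
        count += 1
    b.send()  # flush the remainder
    return count
```

Both helpers issue the same row deletions; the batched form only reduces the number of client round trips, which fits the observation that both methods led to the same timeouts.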

Has anyone seen such behavior?


On 11/10/2011 8:30 PM, Maxim Potekhin wrote:
> Hello,
> My data load comes in batches, each representing one day in the life of
> a large computing facility.
> I index the data by the day it was produced, so that I can quickly pull
> data for a specific day within the last year or two. There are 6 other
> indexes.
> When it comes to retiring the data, I intend to delete the data for the
> oldest date and then add a fresh batch, which keeps the disk space under
> control. Therein lies a problem (it may be Pycassa-related, so I also
> filed an issue on GitHub). When I select by 'DATE=blah' and then do a
> batch remove, it works fine for a while, but after a few thousand
> deletions (done in batches of 1000) it grinds to a halt: I can no longer
> iterate over the result, and the iteration fails with a timeout error.
> Has this behavior been seen before? Cassandra version is 0.8.6, Pycassa
> 1.3.0.
> TIA,
> Maxim
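The retirement step (select the oldest day through the DATE index, then delete in batches of 1000) could be sketched as below. It is a sketch under assumptions, not the code from the report: it assumes pycassa 1.x's `get_indexed_slices()`, which yields `(key, columns)` pairs, and `batch()`; `retire_matching_rows` is a hypothetical name. It reads all matching keys before issuing any deletes, so that rows are not removed while the index scan is still paging through results.

```python
# Hypothetical sketch of "retire the oldest day": read phase first, then
# delete in explicit chunks. Assumes a pycassa 1.x-style ColumnFamily.


def retire_matching_rows(cf, index_clause, chunk_size=1000):
    """Delete every row matched by index_clause, chunk_size rows per batch.

    Returns (number_of_rows_deleted, number_of_batches_sent).
    """
    # Phase 1: materialize all matching keys so that no row is deleted
    # while get_indexed_slices() is still paging through the index.
    keys = [key for key, _columns in cf.get_indexed_slices(index_clause)]

    # Phase 2: delete in explicit chunks, one batch send per chunk.
    batches = 0
    for start in range(0, len(keys), chunk_size):
        b = cf.batch(queue_size=chunk_size)
        for key in keys[start:start + chunk_size]:
            b.remove(key)
        b.send()  # flush anything not auto-sent by the Mutator
        batches += 1
    return len(keys), batches
```

With pycassa the clause would typically be built with `create_index_expression('DATE', oldest_date)` and `create_index_clause([...], count=...)` from `pycassa.index`, where `oldest_date` is a hypothetical variable. The `count` argument limits how many rows the scan returns, so it needs to be large enough for one day's data.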
