incubator-cassandra-user mailing list archives

From Brandon Williams <dri...@gmail.com>
Subject Re: Mass deletion -- slowing down
Date Mon, 14 Nov 2011 00:21:14 GMT
On Sun, Nov 13, 2011 at 5:57 PM, Maxim Potekhin <potekhin@bnl.gov> wrote:
> I've done more experimentation and the behavior persists: I start with a
> normal dataset which is searchable by a secondary index. I select by that
> index the entries that match a certain criterion, then delete those. I tried
> two methods of deletion -- individual cf.remove() as well as batch removal
> in Pycassa.
> What happens after that is as follows: attempts to read the same CF, using
> the same index values start to time out in the Pycassa client (there is a
> thrift message about timeout). The entries not touched by such attempted
> deletion are read just fine still.
>
> Has anyone seen such behavior?

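What you're probably running into is a huge amount of tombstone
filtering on the read: deleted columns stay on disk as tombstones until
gc_grace_seconds has passed and a compaction purges them, so the read
still has to scan past every one (see
http://wiki.apache.org/cassandra/DistributedDeletes).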
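
As a rough illustration of why the reads slow down, here is a toy model in
plain Python. It is not Cassandra's actual read path; the dict-based store
and the names delete() and read_live() are made up for this sketch. The
point is only that a delete leaves a marker behind, and a later read over
the same keys still visits every marker before it can discard it.

```python
def delete(store, key, now):
    """Mark a key as deleted instead of removing it (toy tombstone)."""
    store[key] = ("tombstone", now)


def read_live(store, keys):
    """Return the live values for keys, plus how many entries were scanned."""
    scanned = 0
    live = []
    for key in keys:
        scanned += 1
        kind, value = store[key]
        if kind == "live":
            live.append(value)
    return live, scanned


# 1000 entries share one (hypothetical) index value; 990 get deleted.
store = {i: ("live", f"row-{i}") for i in range(1000)}
for i in range(990):
    delete(store, i, now=0)

live, scanned = read_live(store, range(1000))
print(len(live), "live entries returned after scanning", scanned)
```

In this model the read does the work of 1000 entries to return 10, which is
the same shape of problem as an index lookup that times out after a mass
delete.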

Since you're dealing with timeseries data, using a row-bucketing
technique like http://rubyscale.com/2011/basic-time-series-with-cassandra/
might help by eliminating the need for an index.
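
A minimal sketch of the bucketing idea, assuming one row per series per day.
The key format "series:YYYYMMDD" and the function names are my own choices
for illustration, not something taken from the linked article. Each entry is
written as a column (named by its timestamp) in the row for its day, so a
time-range read becomes a short list of row keys rather than an index query.

```python
from datetime import datetime, timedelta


def bucket_key(series, ts):
    """Row key of the day bucket containing ts (hypothetical key format)."""
    return f"{series}:{ts.strftime('%Y%m%d')}"


def bucket_keys_for_range(series, start, end):
    """Row keys of every day bucket touched by [start, end], oldest first."""
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    keys = []
    while day <= end:
        keys.append(bucket_key(series, day))
        day += timedelta(days=1)
    return keys


keys = bucket_keys_for_range(
    "jobs", datetime(2011, 11, 12, 23, 0), datetime(2011, 11, 14, 0, 21)
)
print(keys)
```

With Pycassa, each of those keys could then be read with cf.get(key,
column_start=..., column_finish=...), and an expired bucket dropped with a
single cf.remove(key), which should leave one row-level tombstone instead of
one per entry.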

-Brandon
