incubator-cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: Mass deletion -- slowing down
Date Mon, 14 Nov 2011 00:22:13 GMT
Deletions in Cassandra imply the use of tombstones (see
http://wiki.apache.org/cassandra/DistributedDeletes), and under some
circumstances reads can become O(n) with respect to the number of
deleted columns, depending on how the row is read. It sounds like this
is what you're seeing.

For example, suppose you're inserting a range of columns into a row,
deleting it, and inserting another non-overlapping subsequent range.
Repeat that a bunch of times. In terms of what's stored in Cassandra
for the row you now have:

  tomb
  tomb
  tomb
  tomb
  ....
   actual data
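A toy Python model of that pattern may help. It is a hypothetical
in-memory sketch, not Cassandra's storage engine or client API, and the
names `build_row` and `TOMBSTONE` are illustrative assumptions. Each
round writes a batch of columns and, except in the last round, deletes
them, which leaves tombstones behind instead of removing the columns:

```python
# Toy model (hypothetical, not Cassandra's storage engine): a row maps
# column names to either a live value or a tombstone marker.
TOMBSTONE = object()


def build_row(rounds: int, batch: int) -> dict:
    """Insert a batch of columns, delete it, then move on to the next
    non-overlapping range. Only the last batch is left live."""
    row = {}
    for r in range(rounds):
        start = r * batch
        names = range(start, start + batch)
        for name in names:
            row[name] = f"value-{name}"
        if r < rounds - 1:
            # Deleting does not remove the column here; it overwrites it
            # with a tombstone marker. (In Cassandra, tombstones are only
            # purged by compaction once gc_grace_seconds has passed.)
            for name in names:
                row[name] = TOMBSTONE
    return row


row = build_row(rounds=5, batch=1000)
tombs = sum(1 for v in row.values() if v is TOMBSTONE)
live = len(row) - tombs
print(tombs, live)  # 4000 1000
```

Five rounds of 1000 columns leave 4000 tombstones in front of 1000 live
columns, matching the layout sketched above.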

If you then do something like a slice on that row with end-points
that include all the tombstones, Cassandra essentially has to read
through and process all of those tombstones. (For the PostgreSQL-aware:
this is similar to the effect you can get when implementing, e.g., a
FIFO queue, where MIN(pos) becomes O(n) with respect to the number of
entries deleted since the last vacuum; this has improved in modern
versions.)
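The cost of such a slice can be sketched with the same kind of toy
model. Again this is hypothetical: `slice_row` is an illustrative
helper, not a Cassandra API, and the row is rebuilt directly so the
block runs on its own. A slice whose range covers the tombstones has to
step over every one of them before it can return any live columns:

```python
from bisect import bisect_left, bisect_right

TOMBSTONE = object()  # hypothetical marker for a deleted column


def slice_row(row, start, end, limit):
    """Return up to `limit` live (name, value) pairs with
    start <= name <= end, plus the number of tombstones skipped."""
    names = sorted(row)  # stands in for the row's sorted column order
    lo = bisect_left(names, start)
    hi = bisect_right(names, end)
    live, skipped = [], 0
    for name in names[lo:hi]:
        value = row[name]
        if value is TOMBSTONE:
            skipped += 1  # work done that returns nothing to the client
            continue
        live.append((name, value))
        if len(live) == limit:
            break
    return live, skipped


# 4000 deleted columns followed by 1000 live ones.
row = {n: TOMBSTONE for n in range(4000)}
row.update({n: f"value-{n}" for n in range(4000, 5000)})

result, skipped = slice_row(row, start=0, end=10**9, limit=10)
print(len(result), skipped)  # 10 4000
print(result[0])             # (4000, 'value-4000')
```

Asking for just 10 columns still means stepping over 4000 tombstones.
Starting the same slice at column 4000 skips none in this model, which
is why the slice end-points matter.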


-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
