cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Should I throttle deletes?
Date Thu, 05 Jan 2012 09:25:56 GMT
> I use a batch mutator in Pycassa to delete ~1M rows based on
> a longish list of keys I'm extracting from an auxiliary CF (with no
> problem of any sort).
What is the size of the deletion batches?

> Now, it appears that such heads-on delete puts a temporary
> but large load on the cluster. I have SSD's and they go to 100%
> utilization, and the CPU spikes to significant loads.
Does the load spike during the deletion or after it?
Do any of the thread pools back up in nodetool tpstats during the load?

I can think of a few general issues you may want to avoid:

* Each row in a batch mutation is handled by a task in a thread pool on the nodes. So if you
send a batch to delete 1,000 rows, it will put 1,000 tasks in the Mutation stage. This will
reduce query throughput.
* Lots of deletes within a single row will add overhead to reads on that row.
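
To keep the Mutation stage from filling up, one option is to send the deletes in small
batches with a short pause between them. The following is only a rough sketch of that idea,
not something from the original setup: `delete_batch` is a hypothetical callable that removes
one list of row keys, and `batch_size` and `pause_s` are made-up starting values that would
need tuning against what nodetool tpstats shows.

```python
import time
from itertools import islice


def chunked(keys, size):
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def throttled_delete(keys, delete_batch, batch_size=100, pause_s=0.05,
                     sleep=time.sleep):
    """Delete keys in small batches, pausing after each batch.

    delete_batch is a hypothetical callable that removes one list of
    row keys; batch_size and pause_s are illustrative values only.
    Returns the number of keys handed to delete_batch.
    """
    deleted = 0
    for chunk in chunked(keys, batch_size):
        delete_batch(chunk)   # one small mutation instead of one huge one
        deleted += len(chunk)
        sleep(pause_s)        # give the Mutation stage time to drain
    return deleted
```

With Pycassa, `delete_batch` could be a small function that opens a batch mutator on the
column family, queues a remove for each key in the chunk, and then sends it.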

You may want to check for excessive memtable flushing, but if you have the default automatic
memory management running, lots of deletes should not result in extra flushing.

Hope that helps
Aaron

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 10:13 AM, Maxim Potekhin wrote:

> Now that my cluster appears to run smoothly and after a few successful
> repairs and compacts, I'm back in the business of deletion of portions
> of data based on its date of insertion. For reasons too lengthy to be
> explained here, I don't want to use TTL.
> 
> I use a batch mutator in Pycassa to delete ~1M rows based on
> a longish list of keys I'm extracting from an auxiliary CF (with no
> problem of any sort).
> 
> Now, it appears that such heads-on delete puts a temporary
> but large load on the cluster. I have SSD's and they go to 100%
> utilization, and the CPU spikes to significant loads.
> 
> Does anyone do throttling on such mass-delete procedure?
> 
> Thanks in advance,
> 
> Maxim
> 

