cassandra-user mailing list archives

From Maxim Potekhin <>
Subject Re: Should I throttle deletes?
Date Thu, 05 Jan 2012 19:28:08 GMT
Hello Aaron,

On 1/5/2012 4:25 AM, aaron morton wrote:
>> I use a batch mutator in Pycassa to delete ~1M rows based on
>> a longish list of keys I'm extracting from an auxiliary CF (with no
>> problem of any sort).
> What is the size of the deletion batches ?

2000 mutations.

>> Now, it appears that such a head-on delete puts a temporary
>> but large load on the cluster. I have SSDs and they go to 100%
>> utilization, and the CPU spikes to significant loads.
> Does the load spike during the deletion or after it ?


> Do any of the thread pools back up in nodetool tpstats during the load ?

Haven't checked, thank you for the lead.

> I can think of a few general issues you may want to avoid:
> * Each row in a batch mutation is handled by a task in a thread pool 
> on the nodes. So if you send a batch to delete 1,000 rows it will put 
> 1,000 tasks in the Mutation stage. This will reduce the query throughput.

Aah. I didn't know that. I was under the impression that batching saves 
the communication overhead, and that's it.

Then I do have a question: what do people generally use as the batch size?
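
For reference, here is a rough sketch of what a throttled version of the delete loop might look like, given the point above that every row in a batch becomes its own task in the Mutation stage. It splits the key list into smaller batches and pauses between them. The names throttled_delete and delete_batch are made up for illustration, and the batch size and pause values are arbitrary placeholders, not recommendations. The Pycassa hookup in the comment is an untested assumption.

```python
import time
from itertools import islice


def chunked(keys, size):
    """Yield successive lists of at most `size` keys from any iterable."""
    it = iter(keys)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def throttled_delete(keys, delete_batch, batch_size=100, pause=0.05,
                     sleep=time.sleep):
    """Delete keys in small batches, sleeping after each batch.

    delete_batch is a hypothetical callable that removes one list of keys.
    batch_size and pause are placeholder values to be tuned per cluster.
    Returns the number of keys handed to delete_batch.
    """
    sent = 0
    for chunk in chunked(keys, batch_size):
        delete_batch(chunk)   # one small batch mutation
        sent += len(chunk)
        sleep(pause)          # give the Mutation stage time to drain
    return sent


# With Pycassa, delete_batch could be something like this (assumption,
# where cf is an existing pycassa ColumnFamily):
#
#   def delete_batch(chunk):
#       b = cf.batch(queue_size=len(chunk))
#       for key in chunk:
#           b.remove(key)
#       b.send()
```

The pause and batch size would need to be tuned while watching nodetool tpstats for pending tasks in the Mutation stage.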


