incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: insert and batch_insert
Date Mon, 16 May 2011 09:01:37 GMT
batch_mutate() and insert() follow a similar execution path to a single insert on the server.
It's not like putting multiple statements in a transaction in an RDBMS.

Where they do differ is that you can provide multiple columns for a row in a column family,
and these will be applied as one operation, including only one write to the commit log. However,
each row you send requires its own write to the commit log.
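For example, here is a minimal pycassa sketch of the two shapes (the keyspace 'TPCC' and
column family 'Orders' are made-up names, adjust for your schema):

    import pycassa

    pool = pycassa.ConnectionPool('TPCC', server_list=['localhost:9160'])
    orders = pycassa.ColumnFamily(pool, 'Orders')

    # One row, many columns: applied as one mutation,
    # with a single commit log write.
    orders.insert('order:1', {'customer': 'c42', 'total': '19.99'})

    # Many rows in one call: a single RPC, but each row is
    # its own mutation and its own commit log write.
    orders.batch_insert({
        'order:2': {'status': 'new'},
        'order:3': {'status': 'new'},
    })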

What sort of data are you writing? Are there multiple columns per row?

Another consideration is that each row becomes a mutation in the cluster. If a connection
sends 1000's of rows at once, all of its mutations *could* momentarily fill all the available
mutation workers on a node. This can slow down other clients connected to the cluster if they
also need to write to that node. Watch the TPStats to see if the mutation pool has spikes
in the pending range. You may want to reduce the batch size if clients are seeing high latency.
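If you do see that, something like this can cap how much work one client pushes at a node
in a single call (batch_insert_chunked and chunk_size are made-up names here, not part of
pycassa):

    def batch_insert_chunked(cf, rows, chunk_size=100):
        # Split a large row dict into several smaller batch_insert
        # calls so one client does not flood the mutation stage
        # on a single node.
        items = list(rows.items())
        for i in range(0, len(items), chunk_size):
            cf.batch_insert(dict(items[i:i + chunk_size]))

On the server side, nodetool tpstats will show the pending count for the mutation stage
climbing if the batches are too big.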


Hope that helps.
 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15 May 2011, at 10:34, Xiaowei Wang wrote:

> Hi,
> 
> We use Cassandra 0.7.4 to do TPC-C data loading on EC2 nodes. The loading driver is written
in pycassa. We tested the loading speed of insert and batch_insert, but there seems to be no
significant difference. I know Cassandra first writes data to memory, but I am still confused
why batch_insert is not quicker than a single-row insert. We only batch 2000 or 3000 rows at a time.
> 
> Thanks for your help!
> 
> Best,
> Xiaowei

