incubator-cassandra-user mailing list archives

From Xiaowei Wang <xiaowei...@gmail.com>
Subject Re: insert and batch_insert
Date Mon, 16 May 2011 15:44:50 GMT
Thanks Aaron, that really helps!

2011/5/16 aaron morton <aaron@thelastpickle.com>

> batch_mutate() and insert() follow a similar execution path to a single
> insert on the server. It's not like putting multiple statements in a
> transaction in an RDBMS.
>
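> For illustration, a minimal pycassa sketch of the two call styles (the
> keyspace, server, column family, and key names here are invented):
>
>     import pycassa
>
>     # assumed keyspace/server/column family names, for illustration only
>     pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
>     cf = pycassa.ColumnFamily(pool, 'Standard1')
>
>     rows = dict(('key%d' % i, {'col1': 'val'}) for i in range(1000))
>
>     # one insert() per row: one Thrift round trip and one mutation each
>     for key, columns in rows.items():
>         cf.insert(key, columns)
>
>     # batch_insert(): one Thrift round trip, but the server still
>     # processes one mutation per row
>     cf.batch_insert(rows)
>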
> Where they do differ is that you can provide multiple columns for a row in
> a column family, and these will be applied as one operation, including only
> one write to the commit log. However, each row you send requires its own
> write to the commit log.
>
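> Continuing the sketch above, the difference shows up per row (column
> names invented):
>
>     # three columns for one row key: applied as a single mutation,
>     # so only one commit log write
>     cf.insert('user1', {'name': 'x', 'city': 'y', 'age': '30'})
>
>     # three rows of one column each: three mutations, and so
>     # three commit log writes
>     cf.batch_insert({'a': {'c': '1'}, 'b': {'c': '1'}, 'e': {'c': '1'}})
>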
> What sort of data are you writing? Are there multiple columns per row?
>
> Another consideration is that each row becomes a mutation in the cluster.
> If a connection sends thousands of rows at once, all of its mutations
> *could* momentarily fill all the available mutation workers on a node.
> This can slow down other clients connected to the cluster if they also
> need to write to that node. Watch the TPStats to see if the mutation pool
> has spikes in the pending range. You may want to reduce the batch size if
> clients are seeing high latency.
>
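> If you do see pending spikes in the mutation pool (MutationStage in
> nodetool tpstats), one option is to cap the batch size on the client
> side. A rough sketch, with the chunk size picked arbitrarily:
>
>     def chunked_batch_insert(cf, rows, chunk_size=500):
>         # send rows in smaller batch_insert() calls so one client
>         # doesn't momentarily flood a node's mutation workers
>         items = list(rows.items())
>         for i in range(0, len(items), chunk_size):
>             cf.batch_insert(dict(items[i:i + chunk_size]))
>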
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15 May 2011, at 10:34, Xiaowei Wang wrote:
>
> > Hi,
> >
> > We use Cassandra 0.7.4 to do TPC-C data loading on EC2 nodes. The loading
> > driver is written in pycassa. We tested the loading speed of insert and
> > batch_insert, but there seems to be no significant difference. I know
> > Cassandra writes data to memory first, but I'm still confused about why
> > batch_insert is not quicker than single-row insert. We only batch 2000 or
> > 3000 rows at a time.
> >
> > Thanks for your help!
> >
> > Best,
> > Xiaowei
>
>
