cassandra-user mailing list archives

From aaron morton <>
Subject Re: Python CQL Batching is slower than single statements
Date Wed, 25 Jan 2012 07:52:29 GMT
There are a few slight differences in the execution paths, but nothing jumps out (it *looks*
like the authorization to write to the CF is checked for each statement in the batch; I'm not
sure how heavy that is).

If you send a batch with more statements than concurrent_writes in the yaml, some of those
statements will have to wait for an available writer before completing. This will add
some latency to the query. You can check pending tasks using nodetool tpstats.
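
As a rough illustration of the queueing effect described above, here is a toy model (not how Cassandra actually schedules writes). It assumes every write takes the same time and that the writer pool is otherwise idle; the function names and the values used are hypothetical.

```python
import math


def estimated_waves(n_statements, concurrent_writes):
    """Number of 'rounds' needed if only concurrent_writes statements run at once.

    Toy assumption: the writer pool is idle and every write takes equally long.
    """
    if concurrent_writes <= 0:
        raise ValueError("concurrent_writes must be positive")
    return math.ceil(n_statements / concurrent_writes)


def estimated_batch_latency_ms(n_statements, concurrent_writes, per_write_ms):
    """Very rough lower bound on batch latency under the toy model above."""
    return estimated_waves(n_statements, concurrent_writes) * per_write_ms


if __name__ == "__main__":
    # Illustrative values only: 32 writers, 0.5 ms per write.
    for size in (10, 32, 33, 100):
        print(size, estimated_batch_latency_ms(size, 32, 0.5))
```

Under this model a batch that fits within the pool completes in one round, and each additional round adds roughly one write's worth of latency. Real behaviour also depends on other load on the node, so treat it only as a way to reason about what tpstats shows.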

Before we get into it further some thoughts:

* What Cassandra version are you running?
* Are you running the tests one after another against the same running Cassandra process, or
against a new process each time?
* Have a look at nodetool cfstats to see the write latency for the CF; this is the latency
for the actual write. Does it change?
* Use jconsole to look at the o.a.c.db.StorageProxy MBean, the latency there is for entire
* Perhaps take a look at the stress testing tools in the distribution and see if their results
concur with yours.
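
To make the single-versus-batch comparison repeatable against the same running process, a minimal timing harness could look like the sketch below. It is not the original benchmark script: `execute` is a hypothetical callable that sends one CQL string (for example a driver cursor's execute method), and `build_batch` simply wraps the statements you pass in between BEGIN BATCH and APPLY BATCH without validating them.

```python
import time


def build_batch(statements):
    """Wrap already-formed CQL statements in BEGIN BATCH .. APPLY BATCH."""
    return "BEGIN BATCH\n" + "\n".join(statements) + "\nAPPLY BATCH"


def best_time(fn, repeats=5):
    """Run fn several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(execute, statements, repeats=5):
    """Time the same statements sent one by one and sent as a single batch.

    execute is an assumed callable that takes one CQL string and sends it.
    """
    single_s = best_time(lambda: [execute(s) for s in statements], repeats)
    batch_query = build_batch(statements)
    batch_s = best_time(lambda: execute(batch_query), repeats)
    return {"single_s": single_s, "batch_s": batch_s}


if __name__ == "__main__":
    # Stand-in for a real connection: just record what would be sent.
    sent = []
    demo = ["INSERT INTO demo (KEY, val) VALUES ('k%d', 'v')" % i for i in range(3)]
    print(compare(sent.append, demo, repeats=2))
```

Running both variants in the same process, several times, and comparing against the cfstats write latency helps separate client-side overhead from time spent in the actual write.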

If you are still having problems, let us know and include the Python script.

Aaron Morton
Freelance Developer

On 25/01/2012, at 3:33 PM, Blake Visin wrote:

> So I decided that it would be beneficial to use batching in my application since I am
> doing many, many inserts.  When I implemented batching in CQL using 'BEGIN BATCH'..'APPLY
> BATCH' I saw a significant decrease in the speed of inserts, no matter the number of insert
> statements I included between begin and apply.  I created a simple benchmark script in Python
> and posted the results here:
> As you can see, the larger I made the batches, the longer they took.
> Any ideas where to go from here?
> Thanks,
> Blake
