incubator-cassandra-user mailing list archives

From Keith Freeman <>
Subject Re: insert performance (1.2.8)
Date Tue, 20 Aug 2013 02:14:30 GMT
Sure, I've tried different numbers for batches and threads, but 
generally I'm running 10-30 threads at a time on the client, each 
sending a batch of 100 insert statements in every call, using the 
QueryBuilder.batch() API from the latest datastax java driver, then 
calling the Session.execute() function (synchronous) on the Batch.
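A minimal sketch of just the batching step, assuming the pending inserts are held in a plain list. It only splits the rows into consecutive chunks of 100. The class and method names (`BatchSplitter`, `chunk`) are hypothetical. In the real client each chunk would presumably be wrapped in one `QueryBuilder.batch(...)` statement and passed to the synchronous `Session.execute()`, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Hypothetical helper: splits the pending inserts into consecutive chunks of at
    // most batchSize rows. Each chunk stands in for one batch sent per execute() call.
    static <T> List<List<T>> chunk(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            rows.add(i); // placeholder row ids
        }
        List<List<Integer>> batches = chunk(rows, 100);
        System.out.println(batches.size() + " batches"); // prints "20 batches"
    }
}
```

A trailing chunk smaller than 100 is kept rather than dropped, so no inserts are lost when the row count is not a multiple of the batch size.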

I can't post my code, but my client does this on each iteration:
-- divides up the set of inserts by the number of threads
-- stores the current time
-- tells all the threads to send their inserts
-- then, when they've all returned, checks the elapsed time
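The per-iteration loop described above can be sketched with a standard `ExecutorService`. This is an illustration under assumptions, not the actual client (which wasn't posted): `TimedIteration`, `runIteration` and the `sendSlice` callback are hypothetical names, and `sendSlice` stands in for building a batch and calling the synchronous `Session.execute()`. The thread pool is passed in so it can be reused across iterations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class TimedIteration {
    // One iteration: divide the rows among nThreads, record the start time, let every
    // worker send its slice, and return the elapsed nanoseconds once all have finished.
    static <T> long runIteration(ExecutorService pool, int nThreads, List<T> rows,
                                 Consumer<List<T>> sendSlice) {
        if (nThreads <= 0) {
            throw new IllegalArgumentException("nThreads must be positive");
        }
        int sliceSize = Math.max(1, (rows.size() + nThreads - 1) / nThreads); // ceiling division
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += sliceSize) {
            List<T> slice = rows.subList(start, Math.min(start + sliceSize, rows.size()));
            tasks.add(() -> {
                sendSlice.accept(slice); // hypothetical stand-in for batch + execute()
                return null;
            });
        }
        long t0 = System.nanoTime();
        try {
            List<Future<Void>> done = pool.invokeAll(tasks); // blocks until every task completes
            long elapsed = System.nanoTime() - t0;
            for (Future<Void> f : done) {
                f.get(); // surface any exception thrown by a worker
            }
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            rows.add(i);
        }
        AtomicInteger sent = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(20);
        try {
            long nanos = runIteration(pool, 20, rows, slice -> sent.addAndGet(slice.size()));
            System.out.println(sent.get() + " rows in " + nanos / 1_000_000 + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

Because `invokeAll` waits for the slowest worker, the measured time for each iteration is bounded by the slowest synchronous batch round trip.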

At about 2000 rows per iteration, 20 threads with 100 inserts each 
finish in about 1 second.  For 4000 rows, 40 threads with 100 inserts 
each finish in about 1.5 - 2 seconds, and as I said, all 3 Cassandra 
nodes have a heavy CPU load while the client is hardly loaded.  I've 
tried 10 threads with more inserts per batch, and up to 60 threads 
with fewer; neither seems to make much difference.
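A quick check of the figures quoted above, turning each run into rows per second. The `Throughput` class and `rowsPerSecond` method are hypothetical; the inputs are the approximate numbers from this message.

```java
public class Throughput {
    // Rows per second for one iteration: rows sent divided by elapsed wall-clock seconds.
    static double rowsPerSecond(long rows, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return rows / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Figures from the message: 2000 rows in ~1 s, 4000 rows in ~1.5-2 s.
        System.out.printf("%.0f%n", rowsPerSecond(2000, 1.0)); // 2000
        System.out.printf("%.0f%n", rowsPerSecond(4000, 2.0)); // 2000
        System.out.printf("%.0f%n", rowsPerSecond(4000, 1.5)); // 2667
    }
}
```

On these numbers, doubling both the rows per iteration and the thread count moves throughput from about 2,000 to at most about 2,700 rows/s. That is consistent with the cluster, rather than the client, being the limit.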

On 08/19/2013 05:00 PM, Nate McCall wrote:
> How big are the batch sizes? In other words, how many rows are you 
> sending per insert operation?
> Other than the above, not much else to suggest without seeing some 
> example code (on pastebin, gist or similar, ideally).
> On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman <> wrote:
>     I've got a 3-node Cassandra cluster (16 GB/4-core VMs on ESXi v5,
>     2.5 GHz hosts not shared with any other VMs).  I'm inserting
>     time-series data into a single column family using "wide rows"
>     (timeuuids) and have a 3-part partition key, so my primary key is
>     something like ((a, b, day), in-time-uuid, x, y, z).
>     My Java client feeds rows (about 1 KB of raw data each) in
>     batches using multiple threads, and the fastest I can get it to run
>     reliably is about 2000 rows/second.  Even at that speed, all 3
>     Cassandra nodes are very CPU-bound, with loads of 6-9 each (and
>     the client machine is hardly breaking a sweat).  I've tried
>     turning off compression on my table, which reduced the load
>     only slightly.  There are no other updates or reads
>     occurring, except from DataStax OpsCenter.
>     I was expecting to be able to insert at least 10k rows/second with
>     this configuration, and after a lot of reading of docs, blogs, and
>     Google results, I can't figure out what's slowing my client down.
>     When I push the client's insert rate beyond 2000/second,
>     the server responses are just too slow and the client falls
>     behind.  I had a single-node MySQL database that could handle 10k of
>     these rows/second, so I really feel like I'm missing
>     something in Cassandra.  Any ideas?
