cassandra-user mailing list archives

From David Haguenauer
Subject Bulk loading performance
Date Mon, 13 Jul 2015 22:31:48 GMT

I have a use case in which I receive a daily batch of data; it's about
50M to 100M records (a record is a list of integers, keyed by a
UUID). The target is a 12-node cluster.

Using a simple-minded approach (24 batched inserts in parallel, using
the Ruby client), while the cluster is being read at a rate of about
150k/s, I get about 15.5k insertions per second. This in itself is
satisfactory, but the concern is that the large number of writes
causes the read latency to jump during the insertion, and for a
while after.
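
As a rough sanity check on these figures, the sketch below estimates how
long one daily load runs at the quoted rate, and what each of the 24
parallel inserters contributes. It assumes the 15.5k/s rate is sustained
for the whole batch and is spread evenly across the inserters; the
function and constant names are illustrative only.

```python
# Back-of-the-envelope load duration, using only the figures quoted above.
# Assumption: the 15.5k inserts/s rate holds steady for the whole batch.

INSERT_RATE_PER_S = 15_500   # observed aggregate insert rate
PARALLEL_INSERTERS = 24      # parallel batched inserters


def load_duration_hours(records: int, rate_per_s: float) -> float:
    """Hours needed to insert `records` rows at `rate_per_s` rows per second."""
    return records / rate_per_s / 3600


def per_inserter_rate(rate_per_s: float, inserters: int) -> float:
    """Average rows per second per inserter, assuming an even split."""
    return rate_per_s / inserters


if __name__ == "__main__":
    for batch in (50_000_000, 100_000_000):
        hours = load_duration_hours(batch, INSERT_RATE_PER_S)
        print(f"{batch:,} records: about {hours:.2f} h")
    rate = per_inserter_rate(INSERT_RATE_PER_S, PARALLEL_INSERTERS)
    print(f"about {rate:.0f} inserts/s per inserter")
```

Under these assumptions the reads are exposed to the extra write load for
roughly one to two hours per day, not counting the period afterwards.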

I tried using sstableloader instead, and the overall throughput is
similar (I spend 2/3 of the time preparing the SSTables, and 1/3
actually pushing them to nodes), but I believe this still causes a
hike in read latency (after the load is complete).
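
To make the 2/3 and 1/3 split concrete: if the end-to-end throughput
matches the 15.5k records/s of the insert approach, but data only reaches
the nodes during the final third of the run, then the nodes receive
records about three times faster while streaming. The sketch below works
that out. It assumes "similar" means equal overall throughput and that
the two phases do not overlap; the names are illustrative only.

```python
# Effective rate at which nodes receive data during the streaming phase.
# Assumptions: overall throughput equals the 15.5k/s quoted earlier, and
# SSTable preparation and streaming run back to back without overlap.

OVERALL_RATE_PER_S = 15_500  # end-to-end records per second
PUSH_FRACTION = 1 / 3        # share of wall-clock time spent streaming


def push_phase_rate(overall_rate: float, push_fraction: float) -> float:
    """Records per second arriving at the nodes while streaming is active."""
    if not 0 < push_fraction <= 1:
        raise ValueError("push_fraction must be in (0, 1]")
    return overall_rate / push_fraction


if __name__ == "__main__":
    rate = push_phase_rate(OVERALL_RATE_PER_S, PUSH_FRACTION)
    print(f"about {rate:,.0f} records/s during the push phase")
```

So the cluster is left alone for longer, but absorbs a shorter and more
intense burst than it does with the parallel inserts.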

Is there a set of best practices for this kind of workload? We would
like to avoid interfering with reads as much as possible.

I can of course post more information about our setup and requirements
if that would help in answering.

David Haguenauer
