cassandra-user mailing list archives

From Graham Sanderson <gra...@vast.com>
Subject Re: Bulk loading performance
Date Mon, 13 Jul 2015 23:21:08 GMT
Ironically, in my experience the fastest ways to get data into C* are considered “anti-patterns”
by most (though I have no problem saturating multiple gigabit network links if I really feel
like inserting fast).

It’s been a while since I tried some of the newer approaches, though (my fast-load code is
a few years old).

> On Jul 13, 2015, at 5:31 PM, David Haguenauer <ml@kurokatta.org> wrote:
> 
> Hi,
> 
> I have a use case wherein I receive a daily batch of data; it's about
> 50M--100M records (a record is a list of integers, keyed by a
> UUID). The target is a 12-node cluster.
> 
> Using a simple-minded approach (24 batched inserts in parallel, using
> the Ruby client), while the cluster is being read at a rate of about
> 150k/s, I get about 15.5k insertions per second. This in itself is
> satisfactory, but the concern is that the large number of writes
> causes the read latency to jump up during the insertion, and for a
> while after.
> 
> I tried using sstableloader instead, and the overall throughput is
> similar (I spend 2/3 of the time preparing the SSTables, and 1/3
> actually pushing them to nodes), but I believe this still causes a
> hike in read latency (after the load is complete).
> 
> Is there a set of best practices for this kind of workload? We would
> like to avoid interfering with reads as much as possible.
> 
> I can of course post more information about our setup and requirements
> if that would help.
> 
> -- 
> Thanks,
> David Haguenauer
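
For scale, the figures quoted above imply a fairly long daily write window. The sketch below only redoes that arithmetic: it takes the quoted 15.5k insertions per second and the 50M to 100M record range, and splits the total using the quoted 2/3 preparation and 1/3 streaming ratio for sstableloader. It assumes "similar throughput" means the same total duration for both approaches, which the original message does not state exactly; the function names are made up for illustration.

```python
# Back-of-the-envelope load-window estimate from the numbers in the quoted message.
RATE_PER_S = 15_500                      # observed insertions per second (quoted)
BATCH_SIZES = (50_000_000, 100_000_000)  # daily batch range (quoted)


def load_seconds(records, rate=RATE_PER_S):
    """Seconds needed to write `records` rows at a steady `rate` rows/s."""
    return records / rate


def sstableloader_split(total_seconds):
    """Split a total duration into (prepare, stream) using the quoted 2/3 : 1/3 ratio.

    Assumption: sstableloader takes the same total time as batched inserts.
    """
    return total_seconds * 2 / 3, total_seconds / 3


def fmt_hm(seconds):
    """Format a duration as e.g. '1h05m', rounded to the nearest minute."""
    minutes = round(seconds / 60)
    return f"{minutes // 60}h{minutes % 60:02d}m"


if __name__ == "__main__":
    for n in BATCH_SIZES:
        total = load_seconds(n)
        prepare, stream = sstableloader_split(total)
        print(f"{n:>11,} records: total {fmt_hm(total)}, "
              f"prepare {fmt_hm(prepare)}, stream {fmt_hm(stream)}")
```

Under these assumptions the batch takes a little under one hour at the low end and a little under two hours at the high end, and with sstableloader only about a third of that time is spent actually pushing data to the nodes.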

