cassandra-user mailing list archives

From Robert Coli <>
Subject Re: Performance Difference between Batch Insert and Bulk Load
Date Mon, 01 Dec 2014 22:27:46 GMT
On Mon, Dec 1, 2014 at 12:10 PM, Dong Dai <> wrote:

> I guess you mean that BulkLoader is done by streaming whole SSTable to
> remote servers, so it is faster?

Well, it's not exactly "whole SSTable" but yes, that's the sort of
statement I'm making. [1]

> The documentation says that all the rows in the SSTable will be inserted
> into the new cluster conforming to the replication strategy of that
> cluster. This gives me a feeling that the BulkLoader was done by calling
> insertion after being transmitted to coordinators.

A good slide deck from pgorla gives general background, here:

But briefly, no. It uses the streaming interface, not the client interface.
The streaming interface avoids the whole commitlog/memtable write path.
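To make the distinction concrete, here is a minimal sketch of invoking the bulk loader; the keyspace/table names, data directory, and contact point are assumptions for illustration, not from the thread:

```shell
# Hypothetical example: stream already-built SSTables for keyspace "ks",
# table "tbl" into a cluster via the streaming interface. sstableloader
# reads the SSTable files on disk and streams their ranges to the nodes
# that own them, bypassing the coordinator/commitlog/memtable write path.
# The path layout and contact point 10.0.0.1 are assumed for this sketch.
sstableloader -d 10.0.0.1 /var/lib/cassandra/data/ks/tbl/
```

A client-side batch insert, by contrast, goes through a coordinator and pays the full write path on every replica.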

> I have this question because I tried batch insertion. It is too fast and
> makes me think that BulkLoader can not beat it.

Turn off writes to the commitlog with durable_writes:false and you can
simulate how much faster it would be without the double-write to the
commitlog. That said, while the double-write to the commitlog is one of the
most significant overheads of doing a write from the client, it is far from
the only one.
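A minimal sketch of that experiment, assuming a keyspace named "ks" (the name is hypothetical); note this is for benchmarking only, since with durable writes disabled any data not yet flushed from memtables is lost on a crash:

```shell
# Hypothetical benchmarking step: disable the commitlog write for keyspace
# "ks" (assumed name) to measure the cost of the double-write. WARNING:
# without durable writes, unflushed memtable data is lost on node failure.
cqlsh -e "ALTER KEYSPACE ks WITH durable_writes = false;"
```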


