cassandra-user mailing list archives

From Dong Dai <daidon...@gmail.com>
Subject Re: Performance Difference between Batch Insert and Bulk Load
Date Mon, 01 Dec 2014 20:10:54 GMT
Thanks Rob, 

I guess you mean that the BulkLoader works by streaming whole SSTables to the remote servers,
which is why it is faster?

The documentation says that all the rows in the SSTable will be inserted into the new cluster
conforming to the replication strategy of that cluster. This gives me the feeling that the
BulkLoader works by issuing inserts after the data has been transmitted to the coordinators.

I have this question because I tried batch insertion. It is so fast that it makes me think
the BulkLoader cannot beat it.

thanks,
- Dong

> On Dec 1, 2014, at 1:37 PM, Robert Coli <rcoli@eventbrite.com> wrote:
> 
> On Sun, Nov 30, 2014 at 8:44 PM, Dong Dai <daidongly@gmail.com> wrote:
> The question is: can I expect better performance using the BulkLoader this way, compared
> with using batch inserts?
> 
> You just asked if writing once (via streaming) is likely to be significantly more efficient
> than writing twice (once to the commit log, and then once at flush time).
> 
> Yes.
> 
> =Rob
>  
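Rob's "once versus twice" argument can be sketched as a toy calculation. This is only an illustration under simplifying assumptions: it counts the disk writes named in the reply (commit log plus memtable flush, versus a single streamed SSTable write) and ignores compaction, replication, compression, network transfer, and the client-side cost of building the SSTables. The function names are hypothetical and are not part of any Cassandra API.

```python
def bytes_written_normal_path(payload_bytes: int) -> int:
    """Toy model of the regular write path: every mutation is appended to
    the commit log, then written again when the memtable is flushed."""
    commit_log_write = payload_bytes
    flush_write = payload_bytes
    return commit_log_write + flush_write


def bytes_written_streaming(payload_bytes: int) -> int:
    """Toy model of bulk loading: the streamed SSTable data is written to
    disk once on the receiving node."""
    return payload_bytes


if __name__ == "__main__":
    payload = 10 * 1024**3  # 10 GiB of row data, an arbitrary example size
    normal = bytes_written_normal_path(payload)
    streamed = bytes_written_streaming(payload)
    print(f"normal path: {normal} bytes, streaming: {streamed} bytes")
    print(f"ratio: {normal / streamed:.1f}x")
```

In this simplified model the regular path writes twice as many bytes per node as streaming does. Whether that shows up as a measurable throughput difference on a real cluster depends on the factors left out above.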

