cassandra-user mailing list archives

From Dong Dai <daidon...@gmail.com>
Subject Performance Difference between Batch Insert and Bulk Load
Date Mon, 01 Dec 2014 04:44:55 GMT
Hi all,

I have a performance question about batch insert versus bulk load.

According to the documentation, both Batch Insert and Bulk Load are options for importing a
large volume of data into Cassandra. Using batch insert is pretty straightforward, but there
has not been an ‘official’ way to use Bulk Load to import the data (in this case, I mean
data that is generated online).

So, I am thinking that clients would first use CQLSSTableWriter to create the SSTable files, and
then use “org.apache.cassandra.tools.BulkLoader” to import these SSTables into Cassandra directly.

The question is: can I expect better performance using the BulkLoader this way, compared with
using batch insert?
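One way to answer this empirically is to time each path end to end and compare rows per second.
Here is a minimal stdlib-only Java sketch. The class `WriteThroughput` and the method
`rowsPerSecond` are hypothetical helpers, not part of Cassandra, and the sketch assumes the caller
wraps the whole load path (the batch-insert loop, or the CQLSSTableWriter step plus the
BulkLoader run) in a `Runnable` and knows how many rows that path writes.

```java
/**
 * Hypothetical helper (not part of Cassandra): times one end-to-end load job
 * and reports its throughput in rows per second.
 */
public class WriteThroughput {

    /**
     * Runs the job once and returns its throughput in rows per second.
     *
     * @param rows number of rows the job writes (supplied by the caller)
     * @param job  the whole load path, e.g. a batch-insert loop, or SSTable
     *             generation followed by a BulkLoader run (caller-provided)
     */
    public static double rowsPerSecond(long rows, Runnable job) {
        if (rows < 0) {
            throw new IllegalArgumentException("rows must be >= 0");
        }
        long start = System.nanoTime();
        job.run();
        long elapsedNanos = System.nanoTime() - start;
        // Guard against a zero reading for extremely fast (e.g. no-op) jobs.
        elapsedNanos = Math.max(elapsedNanos, 1L);
        return rows * 1_000_000_000.0 / elapsedNanos;
    }
}
```

For a fair comparison, the timed bulk-load job should include both the SSTable generation and
the streaming step, and each measurement is best repeated a few times against a warmed-up JVM
and cluster, since a single run can be noisy.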

I am not very familiar with the implementation of Bulk Load, but I do see a huge performance
improvement using Batch Insert. I really want to know the upper limits of write performance.
Any comments would be helpful. Thanks!

- Dong