cassandra-user mailing list archives

From Dong Dai <daidon...@gmail.com>
Subject Re: Performance Difference between Batch Insert and Bulk Load
Date Thu, 04 Dec 2014 17:50:48 GMT

> On Dec 4, 2014, at 11:37 AM, Tyler Hobbs <tyler@datastax.com> wrote:
> 
> 
> On Wed, Dec 3, 2014 at 11:02 PM, Dong Dai <daidongly@gmail.com> wrote:
> 
> 1) unless I am using TokenAwarePolicy, async inserts also cannot be sent to
> the right coordinator.
> 
> Yes.  Of course, TokenAwarePolicy can wrap any other policy.
>  
> 
> 2) TokenAwarePolicy is actually doing the job that coordinators
> do: calculating data placement from the keyspace and partition key.
> 
> That's correct, it does the same calculation that the coordinator does.
> 
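The two points above (TokenAwarePolicy wraps another policy, and it repeats the coordinator's placement calculation on the client) can be illustrated with a small Python sketch. It is not the DataStax driver's code, and several simplifying assumptions apply: `token_for`, `TokenRing`, `RoundRobinPolicy`, and `TokenAwareWrapper` are hypothetical names; MD5 stands in for the token function (Cassandra's default Murmur3Partitioner produces different tokens); each node owns a single token (no vnodes); and replicas are placed SimpleStrategy-style, on consecutive nodes clockwise around the ring.

```python
import bisect
import hashlib
from typing import Dict, Iterator, List


def token_for(partition_key: str) -> int:
    # Stand-in token function (assumption): MD5 of the key as an unsigned integer.
    # Cassandra's default Murmur3Partitioner computes a different token; the point is
    # only that client and coordinator run the same deterministic calculation.
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


class TokenRing:
    """Hypothetical ring: one token per node (no vnodes), SimpleStrategy-style replicas."""

    def __init__(self, node_tokens: Dict[str, int], replication_factor: int) -> None:
        # Sort (token, node) pairs so the ring can be walked clockwise.
        self._entries = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]
        self._rf = min(replication_factor, len(self._entries))

    def replicas(self, partition_key: str) -> List[str]:
        # The primary replica is the first node whose token >= the key's token,
        # wrapping around to the lowest token past the end of the ring.
        start = bisect.bisect_left(self._tokens, token_for(partition_key)) % len(self._entries)
        # The remaining RF - 1 replicas are the next nodes clockwise.
        return [self._entries[(start + i) % len(self._entries)][1] for i in range(self._rf)]


class RoundRobinPolicy:
    """Stand-in child policy: rotates the starting node each time a plan is iterated."""

    def __init__(self, nodes: List[str]) -> None:
        self._nodes = list(nodes)
        self._next = 0

    def query_plan(self, partition_key: str) -> Iterator[str]:
        # partition_key is ignored; round robin does not look at the data.
        start = self._next
        self._next = (self._next + 1) % len(self._nodes)
        for i in range(len(self._nodes)):
            yield self._nodes[(start + i) % len(self._nodes)]


class TokenAwareWrapper:
    """Tries the key's replicas first, then whatever the wrapped policy suggests."""

    def __init__(self, child: RoundRobinPolicy, ring: TokenRing) -> None:
        self._child = child
        self._ring = ring

    def query_plan(self, partition_key: str) -> Iterator[str]:
        replicas = self._ring.replicas(partition_key)
        yield from replicas
        # Fall back to the child policy's order, skipping nodes already tried.
        for node in self._child.query_plan(partition_key):
            if node not in replicas:
                yield node
```

For example, with `TokenRing({"node1": 0, "node2": 2**126, "node3": 2**127}, replication_factor=2)` wrapped around a `RoundRobinPolicy` over the same three nodes, `list(policy.query_plan("user:42"))` lists that key's two replicas first and the remaining node last. Because the wrapper only reorders the child's plan, any child policy could be substituted.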

Thanks for the clarification. This brings me back to my earlier discussion with Ryan.
Since we already do on the client side what the coordinators do, why not go one step further:
break an UNLOGGED batch into several smaller batches, each containing only the statements with
the same partition key, and send each of them to the right coordinator using TokenAwarePolicy?
That would save a lot of RPC time, right?
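The proposal above (split an UNLOGGED batch into single-partition sub-batches and route each one to a node that owns its partition) can be sketched in Python as follows. This is only an illustration under stated assumptions, not driver code: `Insert`, `token_for`, `coordinator_for`, and `split_unlogged_batch` are hypothetical names; all statements are assumed to target one table; MD5 stands in for Cassandra's Murmur3 token; the ring is a sorted list of `(token, node)` pairs with one token per node; and actually sending each sub-batch (for example, as its own UNLOGGED batch) is left out.

```python
import bisect
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Insert:
    """Hypothetical stand-in for one bound INSERT into a single table."""
    partition_key: str
    row: dict


def token_for(partition_key: str) -> int:
    # Stand-in token (assumption): MD5, not Cassandra's Murmur3Partitioner.
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def coordinator_for(partition_key: str, ring: List[Tuple[int, str]]) -> str:
    # ring: (token, node) pairs sorted by token. The owner is the first node whose
    # token >= the key's token, wrapping around to the start of the ring.
    tokens = [tok for tok, _ in ring]
    i = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    return ring[i][1]


def split_unlogged_batch(
    statements: List[Insert], ring: List[Tuple[int, str]]
) -> Dict[str, List[List[Insert]]]:
    # 1) Group statements into single-partition sub-batches, keeping their order.
    by_partition: Dict[str, List[Insert]] = defaultdict(list)
    for stmt in statements:
        by_partition[stmt.partition_key].append(stmt)
    # 2) Route each sub-batch to the node that owns its partition.
    by_node: Dict[str, List[List[Insert]]] = defaultdict(list)
    for key, sub_batch in by_partition.items():
        by_node[coordinator_for(key, ring)].append(sub_batch)
    return dict(by_node)
```

For instance, three inserts for partitions `user:1`, `user:2`, and `user:1` produce two sub-batches, with both `user:1` rows kept together in their original order. Since none of the imports need to be atomic, each sub-batch could be sent independently and concurrently.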

I ask because I have a use case where importing huge amounts of data into
Cassandra is very common, and none of these imports need to be atomic.

thanks,
- Dong

> 
> -- 
> Tyler Hobbs
> DataStax <http://datastax.com/>

