incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: cost estimate about some Cassandra patchs
Date Tue, 07 May 2013 08:21:47 GMT
> Use case = rows with rowkey like (folder id, file id)
> And operations read/write multiple rows with same folder id => so, it could make sense
to have a partitioner putting rows with same "folder id" on the same replicas.
The entire row key is what we use to make the token, and the token is used both to locate
the replicas and to place the row within the node. I don't see that changing. 
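
To illustrate the point above, here is a minimal sketch, not Cassandra's actual code. It assumes an MD5-based token (as a stand-in for a hash partitioner), a hypothetical `replica_index` helper that splits the token space into equal ranges, and a made-up `folder:file` key encoding. Hashing the entire row key gives every row its own token, while hashing only the folder id gives all rows of a folder the same token.

```python
import hashlib


def token(key: bytes) -> int:
    """Hash a key to a 128-bit integer token (MD5 used as a stand-in hash)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def replica_index(tok: int, num_nodes: int) -> int:
    """Hypothetical placement: map a 128-bit token onto one of num_nodes equal ranges."""
    return (tok * num_nodes) >> 128


def full_key_token(folder_id: str, file_id: str) -> int:
    """Token derived from the entire row key (assumed 'folder:file' encoding)."""
    return token(f"{folder_id}:{file_id}".encode())


def folder_only_token(folder_id: str, file_id: str) -> int:
    """Token derived from the folder id only, ignoring the file id."""
    return token(folder_id.encode())


rows = [("folder-1", f"file-{i}") for i in range(8)]

# Distinct tokens when the whole key is hashed, one shared token otherwise.
full_tokens = {full_key_token(*row) for row in rows}
folder_tokens = {folder_only_token(*row) for row in rows}
print(len(full_tokens), len(folder_tokens))

# Node ranges touched by each scheme on a hypothetical 6-node ring.
print(sorted({replica_index(full_key_token(*row), 6) for row in rows}))
print(sorted({replica_index(folder_only_token(*row), 6) for row in rows}))
```

With the folder-only scheme every row of a folder lands in a single range, which is the locality the custom partitioner is after. With the full-key scheme the rows are free to spread across ranges.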

Have you done any performance testing to see if this is a problem?

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/05/2013, at 5:27 AM, DE VITO Dominique <dominique.devito@thalesgroup.com> wrote:

> > From: aaron morton [mailto:aaron@thelastpickle.com] 
> > Sent: Sunday 28 April 2013 22:54
> > To: user@cassandra.apache.org
> > Subject: Re: cost estimate about some Cassandra patchs
> > 
> > > Does anyone know enough of the inner working of Cassandra to tell me how much
work is needed to patch Cassandra to enable such communication vectorization/batch ?
> > 
>  
> > Assuming you mean "have the coordinator send multiple row read/write requests in
a single message to replicas"
> > 
> > Pretty sure this has been raised as a ticket before but I cannot find one now. 
> > 
> > It would be a significant change and I'm not sure how big the benefit is. To send
the messages the coordinator places them in a queue, so there is little delay in sending. Then it
waits on them asynchronously. So there may be some saving on networking, but from the coordinator's point
of view I think the impact is minimal. 
> > 
> > What is your use case?
>  
> Use case = rows with rowkey like (folder id, file id)
> And operations read/write multiple rows with same folder id => so, it could make sense
to have a partitioner putting rows with same "folder id" on the same replicas.
>  
> But so far, Cassandra is not able to exploit this locality, as the batch effect ends at the
coordinator node.
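
A rough sketch of the message counts at stake, under stated assumptions: the coordinator sends one message per row to each replica, a custom partitioner places rows by folder id, and the replication factor is 3. The node names, the `folder_replicas` placement table, and both counting functions are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def messages_per_row(rows, replicas_for):
    """Count messages when each row is sent separately to each of its replicas."""
    return sum(len(replicas_for(row)) for row in rows)


def messages_batched(rows, replicas_for):
    """Count messages when rows bound for the same replica share one message."""
    by_replica = defaultdict(list)
    for row in rows:
        for replica in replicas_for(row):
            by_replica[replica].append(row)
    return len(by_replica)


def folder_replicas(row):
    """Hypothetical placement by folder id only, with replication factor 3."""
    folder_id, _file_id = row
    placement = {
        "folder-1": ("n1", "n2", "n3"),
        "folder-2": ("n4", "n5", "n6"),
    }
    return placement[folder_id]


rows = [("folder-1", f"file-{i}") for i in range(10)]
print(messages_per_row(rows, folder_replicas))  # 30
print(messages_batched(rows, folder_replicas))  # 3
```

Whether that reduction turns into a measurable gain depends on how much the per-message overhead actually costs, which the counts alone do not show.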
>  
> Hence my question about the cost estimate for patching Cassandra.
>  
> The closest (or exactly corresponding to my need ?) JIRA entries I have found so far
are:
>  
> CASSANDRA-166: Support batch inserts for more than one key at once
> https://issues.apache.org/jira/browse/CASSANDRA-166
> => "WON'T FIX" status
>  
> CASSANDRA-5034: Refactor to introduce Mutation Container in write path
> https://issues.apache.org/jira/browse/CASSANDRA-5034
> => I am not very sure if it's related to my topic
>  
> Thanks.
>  
> Dominique
>  
>  
>  
> > 
> > Cheers
> > 
> > 
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Consultant
> > New Zealand
> > 
> > @aaronmorton
> > http://www.thelastpickle.com
>  
> On 27/04/2013, at 4:04 AM, DE VITO Dominique <dominique.devito@thalesgroup.com>
wrote:
> 
> 
> Hi,
>  
> We have created a new partitioner that groups some rows with **different** row keys on
the same replicas.
>  
> But neither batch_mutate nor multiget_slice is able to take advantage of this
partitioner-defined placement to vectorize/batch communications between the coordinator and
the replicas.
>  
> Does anyone know enough of the inner working of Cassandra to tell me how much work is
needed to patch Cassandra to enable such communication vectorization/batch ?
>  
> Thanks.
>  
> Regards,
> Dominique
>  
>  

