> > Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching?
> Assuming you mean "have the coordinator send multiple row read/write requests in a single message to replicas"
> Pretty sure this has been raised as a ticket before but I cannot find one now.
> It would be a significant change and I'm not sure how big the benefit is. To send the messages, the coordinator places them in a queue, so there is little delay in sending; it then waits on them asynchronously. There may be some saving on networking, but from the coordinator's point of view I think the impact is minimal.
> What is your use case?
Use case: rows with a row key like (folder id, file id).
Operations read/write multiple rows with the same folder id, so it could make sense to have a partitioner that puts rows with the same "folder id" on the same replicas.
But so far, Cassandra is not able to exploit this locality, because the batching effect ends at the coordinator node.
Hence my question about the cost estimate for patching Cassandra.
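To make the expected gain concrete, here is a minimal sketch in Python of the message count with and without batching. It is a toy model, not Cassandra code: `PLACEMENT`, `replicas_of`, and `plan_messages` are hypothetical names, and it assumes the coordinator sends one message per row to each replica it contacts, which is my reading of the behaviour discussed above.

```python
from collections import defaultdict

# Hypothetical placement: every row of a folder lives on the same replicas.
PLACEMENT = {
    "folder1": ["node-a", "node-b"],
    "folder2": ["node-b", "node-c"],
}


def replicas_of(row_key):
    """Return the replicas for a (folder_id, file_id) row key (toy lookup)."""
    folder_id = row_key[0]
    return PLACEMENT[folder_id]


def plan_messages(row_keys, replicas_of):
    """Build both message plans for the given row keys.

    unbatched: one (replica, [key]) message per row and per replica.
    batched:   one (replica, keys) message per replica, carrying all its rows.
    """
    unbatched = [(replica, [key])
                 for key in row_keys
                 for replica in replicas_of(key)]

    per_replica = defaultdict(list)
    for key in row_keys:
        for replica in replicas_of(key):
            per_replica[replica].append(key)
    batched = sorted(per_replica.items())
    return unbatched, batched


keys = [("folder1", "f1"), ("folder1", "f2"), ("folder1", "f3")]
unbatched, batched = plan_messages(keys, replicas_of)
print(len(unbatched), len(batched))  # 6 messages unbatched, 2 batched
```

With N rows in one folder and a replication factor of RF, this model goes from N x RF messages to RF messages. Whether that saving matters in practice is exactly the open question.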
The closest JIRA entries I have found so far (possibly matching my need exactly) are:
CASSANDRA-166: Support batch inserts for more than one key at once
=> "WON'T FIX" status
CASSANDRA-5034: Refactor to introduce Mutation Container in write path
=> I am not sure whether it is related to my topic
On 27/04/2013, at 4:04 AM, DE VITO Dominique <email@example.com> wrote:
We have created a new partitioner that groups some rows with **different** row keys on the same replicas.
But neither batch_mutate nor multiget_slice is able to take advantage of this partitioner-defined placement to vectorize/batch communications between the coordinator and the replicas.
Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching?
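For readers unfamiliar with the idea, a partitioner of this kind can be sketched as follows. This is only an illustration in Python, not our actual partitioner and not Cassandra's partitioner interface: the `folder_id:file_id` key encoding, the `RING` layout, and the functions `token_for` and `replicas_for` are all assumptions made for the example. The token is computed from the folder id alone, so rows with different row keys but the same folder id map to the same replicas.

```python
import bisect
import hashlib

SEPARATOR = ":"  # assumed encoding of the row key: "folder_id:file_id"

# Hypothetical ring: (token, node) pairs sorted by token, tokens in [0, 2**128).
RING = [
    (0, "node-a"),
    (2**126, "node-b"),
    (2**127, "node-c"),
    (3 * 2**126, "node-d"),
]


def token_for(row_key):
    """Hash only the folder id, so every file of a folder gets the same token."""
    folder_id = row_key.split(SEPARATOR, 1)[0]
    digest = hashlib.md5(folder_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def replicas_for(token, ring, rf):
    """Pick rf nodes by walking the ring clockwise from the first node
    whose token is >= the row token, wrapping around at the end."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


same_a = replicas_for(token_for("folder1:report.txt"), RING, 2)
same_b = replicas_for(token_for("folder1:photo.png"), RING, 2)
print(same_a == same_b)  # True: both rows land on the same replicas
```

The trade-off is that load balancing now depends on how evenly rows are spread across folders, since a large folder lands entirely on one replica set.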