cassandra-dev mailing list archives

From William Katsak
Subject Re: Streaming RowMutations (and possibly merging them)
Date Tue, 09 Apr 2013 21:16:35 GMT

I apologize for my very vague email; I shouldn't have written it in such 
a hurry. I would like to clarify my use case and requirements so that 
someone may be able to give me some advice.

I am building a research version of Cassandra in which a missed write is 
a normal case (e.g. out of n replicas, it would be a normal case for at 
least one of these to miss a write). I keep track of missed writes 
similar to how default Cassandra does for HintedHandoff (a column family 
in system that stores serialized RowMutations). Later, when the nodes 
that were missed are ready to receive writes again, the node caching the 
RowMutations sends them one at a time until they have all been delivered. 
This all happens in the context of a live, serving system.

My system works and does what it is supposed to; now I am trying to 
improve performance. I currently have two optimizations in mind, but am 
not sure how to approach them:

1) Minimize the transfer of excessive RowMutations by merging all 
RowMutations for the same key, and transmitting only one per key. In the 
event that a subset of keys are very popular, I can minimize how much I 
need to transfer to bring a node back up to date. I am thinking I can go 
inside the RowMutation and merge each ColumnFamily, then create a new 
RowMutation with the merged CFs. Is ColumnFamily.diff() the right way to 
merge an individual CF, or am I misunderstanding it?
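
To make the merging idea in (1) concrete, here is a minimal sketch of the 
behaviour I am after, written against plain Java collections rather than 
Cassandra's classes. ColumnWrite, MutationMerger and mergeByRowKey are 
hypothetical names invented for this illustration; they are not part of 
Cassandra, and this says nothing about what ColumnFamily.diff() does. The 
sketch assumes plain last-write-wins by timestamp and ignores deletions 
and timestamp ties, which a real merge would have to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: collapses many saved column writes into one batch per row key. */
class MutationMerger {

    /** Hypothetical stand-in for a single column write; NOT a Cassandra class. */
    static final class ColumnWrite {
        final String rowKey;
        final String columnFamily;
        final String column;
        final String value;
        final long timestamp;

        ColumnWrite(String rowKey, String columnFamily, String column,
                    String value, long timestamp) {
            this.rowKey = rowKey;
            this.columnFamily = columnFamily;
            this.column = column;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /**
     * Keeps only the newest write per (row key, column family, column),
     * then groups the survivors by row key so that one merged batch per
     * key can be sent. Assumption: strictly newer timestamp wins; on a
     * tie the earlier-seen write is kept.
     */
    static Map<String, List<ColumnWrite>> mergeByRowKey(List<ColumnWrite> saved) {
        Map<List<String>, ColumnWrite> newest = new LinkedHashMap<>();
        for (ColumnWrite w : saved) {
            List<String> id = Arrays.asList(w.rowKey, w.columnFamily, w.column);
            newest.merge(id, w, (old, cur) -> cur.timestamp > old.timestamp ? cur : old);
        }

        Map<String, List<ColumnWrite>> perKey = new LinkedHashMap<>();
        for (ColumnWrite w : newest.values()) {
            perKey.computeIfAbsent(w.rowKey, k -> new ArrayList<>()).add(w);
        }
        return perKey;
    }
}
```

With a few very popular keys, the number of entries returned is bounded by 
the number of distinct keys rather than by the number of missed writes.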

2) Serialize a whole bunch of RowMutations into a chunk, stream the 
chunk to the appropriate node, deserialize them, and apply them 
individually. In this case, I would avoid having to wait for an ACK on 
each mutation, and could more efficiently send lots of data. Is this 
feasible with the existing streaming infrastructure, or would I have to 
implement a new facility?
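
The chunking idea in (2) could look roughly like the following sketch, 
assuming each saved RowMutation is already available as a serialized byte 
array. MutationChunk, pack and unpack are hypothetical names for this 
illustration only; the framing (a count followed by length-prefixed 
payloads) is my own assumption and is not Cassandra's wire format or its 
streaming API. How the resulting chunk would actually be shipped to the 
other node is exactly the open question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: frames many serialized mutations into one chunk and back. */
class MutationChunk {

    /** Layout (assumed): int count, then for each mutation an int length and its bytes. */
    static byte[] pack(List<byte[]> mutations) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(mutations.size());
            for (byte[] m : mutations) {
                out.writeInt(m.length);
                out.write(m);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits a chunk produced by pack() back into the individual payloads, in order. */
    static List<byte[]> unpack(byte[] chunk) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(chunk));
            int count = in.readInt();
            List<byte[]> mutations = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                byte[] m = new byte[in.readInt()];
                in.readFully(m);
                mutations.add(m);
            }
            return mutations;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The receiving node would call unpack() and apply each payload individually, 
so only one acknowledgement per chunk would be needed instead of one per 
mutation.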

Again, my codebase is on top of Cassandra 1.1.6. I would very much 
appreciate any insight anyone could give me.

Thanks very much,
Bill Katsak

On 04/08/2013 12:10 PM, William Katsak wrote:
> Hello,
> I am sorry to bother the list with this question, but I was wondering, 
> assuming I have many saved (small) mutations (of the type that hinted 
> handoff uses), is there any easy way to put these all together and 
> bulk transmit (stream) them to a destination node?
> My codebase is based on Cassandra 1.1.6.
> Thanks very much in advance,
> Bill Katsak
