incubator-cassandra-dev mailing list archives

From: Jonathan Ellis <jbel...@gmail.com>
Subject: Re: Streaming RowMutations (and possibly merging them)
Date: Tue, 09 Apr 2013 21:25:00 GMT
You can probably leverage the bulk writer API. Look at
SSTableSimpleUnsortedWriter, for example.
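For context on that suggestion: SSTableSimpleUnsortedWriter accepts rows in any order, buffers them in memory, and writes a sorted SSTable each time the buffer reaches its configured size (given in megabytes in the 1.1 constructor). The toy model below sketches only that buffering idea and does not use Cassandra's classes. The names `UnsortedBufferWriter`, `addRow`, and the row-count threshold are hypothetical, and real SSTables are ordered by partitioner token (decorated key), not by the raw key as here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of an "unsorted" bulk writer. Rows may arrive in any key order;
// they are buffered in a sorted map, and each time the buffer holds maxRows
// distinct keys it is flushed as one sorted segment (standing in for one
// SSTable). Hypothetical names throughout; this is not Cassandra's API.
final class UnsortedBufferWriter {
    private final int maxRows;
    // row key -> (column name -> value); TreeMap keeps both levels sorted
    private TreeMap<String, TreeMap<String, String>> buffer = new TreeMap<>();
    private final List<List<String>> segments = new ArrayList<>();

    UnsortedBufferWriter(int maxRows) {
        if (maxRows <= 0) throw new IllegalArgumentException("maxRows must be positive");
        this.maxRows = maxRows;
    }

    // Adds columns for a row. In this model, a key that is still buffered is
    // merged into the existing row rather than written twice.
    void addRow(String key, Map<String, String> columns) {
        buffer.computeIfAbsent(key, k -> new TreeMap<>()).putAll(columns);
        if (buffer.size() >= maxRows) flush();
    }

    // Emits the buffered rows in key order as one segment, then resets the buffer.
    void flush() {
        if (buffer.isEmpty()) return;
        List<String> segment = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : buffer.entrySet()) {
            segment.add(row.getKey() + "=" + row.getValue());
        }
        segments.add(segment);
        buffer = new TreeMap<>();
    }

    List<List<String>> segments() {
        return Collections.unmodifiableList(segments);
    }
}
```

In this model, a final flush() after the last row plays the role of closing the writer, so whatever is still buffered gets written out.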


On Tue, Apr 9, 2013 at 4:16 PM, William Katsak <wkatsak@cs.rutgers.edu> wrote:

> Hello,
>
> I apologize for my very vague email; I shouldn't have written it in such a
> hurry. I would like to clarify my use case and requirements, so that maybe
> someone can give me some advice.
>
> I am building a research version of Cassandra in which a missed write is a
> normal case (e.g., out of n replicas, it is normal for at least one of them
> to miss a write). I keep track of missed writes much as stock Cassandra
> does for hinted handoff (a column family in the system keyspace that
> stores serialized RowMutations). Later, when the nodes that missed writes
> are ready to receive writes again, the node holding the RowMutations sends
> them one at a time until they have all been delivered. This all happens in
> the context of a live, serving system.
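The delivery scheme described in this paragraph (missed writes queued per target node, replayed one at a time, each waiting for an acknowledgment) can be modeled in a few lines. This is a minimal in-memory sketch: `MissedWriteStore`, `record`, `replay`, and the `Predicate` standing in for "send and wait for an ACK" are hypothetical, and the system described persists serialized RowMutations in a column family rather than in a map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// In-memory model of a hinted-handoff-style store: missed mutations are queued
// per target node and replayed in arrival order, one at a time.
final class MissedWriteStore {
    private final Map<String, Deque<byte[]>> pending = new HashMap<>();

    // Remembers a serialized mutation that `node` missed.
    void record(String node, byte[] serializedMutation) {
        pending.computeIfAbsent(node, n -> new ArrayDeque<>()).addLast(serializedMutation);
    }

    // Replays queued mutations for `node`. sendAndAwaitAck is a hypothetical
    // synchronous send that returns true once the target acknowledges.
    // A mutation is dequeued only after its ACK; on the first failure replay
    // stops so the remaining mutations can be retried later.
    int replay(String node, Predicate<byte[]> sendAndAwaitAck) {
        Deque<byte[]> queue = pending.get(node);
        int delivered = 0;
        while (queue != null && !queue.isEmpty()) {
            if (!sendAndAwaitAck.test(queue.peekFirst())) break;
            queue.removeFirst();
            delivered++;
        }
        if (queue != null && queue.isEmpty()) pending.remove(node);
        return delivered;
    }

    int pendingCount(String node) {
        Deque<byte[]> queue = pending.get(node);
        return queue == null ? 0 : queue.size();
    }
}
```

Each mutation costs one round trip here, which is the per-mutation ACK wait that the second optimization below aims to avoid.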
>
> My system works and does what it is supposed to; now I am trying to
> improve performance. I currently have two optimizations in mind, but am not
> sure how to approach them:
>
> 1) Minimize the transfer of redundant RowMutations by merging all
> RowMutations for the same key and transmitting only one per key. If a
> subset of keys is very popular, this minimizes how much I need to transfer
> to bring a node back up to date. I am thinking I can go inside the
> RowMutation and merge each ColumnFamily, then create a new RowMutation
> with the merged CFs. Is ColumnFamily.diff() the right way to merge an
> individual CF, or am I misunderstanding it?
>
> 2) Serialize a whole batch of RowMutations into a chunk, stream the chunk
> to the appropriate node, deserialize them, and apply them individually. In
> this case, I would avoid having to wait for an ACK on each mutation and
> could send large amounts of data more efficiently. Is this feasible with
> the existing streaming infrastructure, or would I have to implement a new
> facility?
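Optimization 1 can be pictured as collapsing every queued mutation for a key into one, keeping, for each column, the version with the highest timestamp (Cassandra's last-write-wins rule). The sketch below models that rule on hypothetical `Cell` and `Mutation` records rather than on RowMutation/ColumnFamily, ignores deletions, tombstones, TTLs, and counters, and says nothing about what ColumnFamily.diff() does. Breaking timestamp ties by comparing values is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-ins for a column write and a row mutation.
record Cell(String value, long timestamp) {}
record Mutation(String key, Map<String, Cell> columns) {}

final class MutationMerger {
    // Returns whichever cell wins under last-write-wins; equal timestamps are
    // broken by comparing values so the result does not depend on input order.
    static Cell reconcile(Cell a, Cell b) {
        if (a.timestamp() != b.timestamp()) return a.timestamp() > b.timestamp() ? a : b;
        return a.value().compareTo(b.value()) >= 0 ? a : b;
    }

    // Collapses any number of mutations into one mutation per key, keeping
    // keys in the order they were first seen.
    static List<Mutation> mergeByKey(List<Mutation> mutations) {
        Map<String, Map<String, Cell>> byKey = new LinkedHashMap<>();
        for (Mutation m : mutations) {
            Map<String, Cell> merged = byKey.computeIfAbsent(m.key(), k -> new TreeMap<>());
            for (Map.Entry<String, Cell> col : m.columns().entrySet()) {
                merged.merge(col.getKey(), col.getValue(), MutationMerger::reconcile);
            }
        }
        List<Mutation> result = new ArrayList<>();
        byKey.forEach((key, cols) -> result.add(new Mutation(key, cols)));
        return result;
    }
}
```

Because the merge is per column, a popular key that was overwritten many times is sent once with only its newest values, which is the saving the message is after.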
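Optimization 2 amounts to framing many already-serialized mutations into one buffer so they travel as a single transfer and are split apart again on the receiving node. Below is a minimal sketch of length-prefixed framing with java.io's DataOutputStream and DataInputStream. `MutationBatch`, `pack`, and `unpack` are hypothetical names, the payloads are opaque byte arrays (serialized RowMutations in the scenario above), and the sketch does not address whether Cassandra 1.1's streaming infrastructure can carry such a chunk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Length-prefixed framing: [count][len0][bytes0][len1][bytes1]...
final class MutationBatch {
    static byte[] pack(List<byte[]> serializedMutations) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(serializedMutations.size());
            for (byte[] m : serializedMutations) {
                out.writeInt(m.length);
                out.write(m);
            }
        } catch (IOException e) {
            // Writing to a ByteArrayOutputStream does not fail in practice.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<byte[]> unpack(byte[] chunk) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(chunk))) {
            int count = in.readInt();
            List<byte[]> mutations = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                byte[] m = new byte[in.readInt()];
                in.readFully(m); // throws EOFException if the chunk is truncated
                mutations.add(m);
            }
            return mutations;
        } catch (IOException e) {
            throw new UncheckedIOException("malformed mutation chunk", e);
        }
    }
}
```

On the receiving side, each unpacked payload would then be deserialized and applied individually, as described above, so one acknowledgment per chunk can replace one per mutation.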
>
> Again, my codebase is on top of Cassandra 1.1.6. I would very much
> appreciate any insight anyone could give me.
>
> Thanks very much,
> Bill Katsak
>
> On 04/08/2013 12:10 PM, William Katsak wrote:
>
>> Hello,
>>
>> I am sorry to bother the list with this question, but I was wondering:
>> assuming I have many saved (small) mutations (of the type that hinted
>> handoff uses), is there any easy way to put these all together and bulk
>> transmit (stream) them to a destination node?
>>
>> My codebase is based on Cassandra 1.1.6.
>>
>> Thanks very much in advance,
>> Bill Katsak
>>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
