Message-ID: <51648533.9030602@cs.rutgers.edu>
Date: Tue, 09 Apr 2013 17:16:35 -0400
From: William Katsak
To: dev@cassandra.apache.org
Subject: Re: Streaming RowMutations (and possibly merging them)
References: <5162EBEF.90102@cs.rutgers.edu>
In-Reply-To: <5162EBEF.90102@cs.rutgers.edu>

Hello,

I apologize for my very vague email; I shouldn't have written it in such a hurry. I would like to clarify my use case and requirements so that someone might be able to give me some advice.

I am building a research version of Cassandra in which a missed write is a normal case (e.g., out of n replicas, it is normal for at least one to miss a write). I keep track of missed writes much as stock Cassandra does for HintedHandoff (a column family in the system keyspace that stores serialized RowMutations). Later, when the nodes that missed writes are ready to receive them again, the node caching the RowMutations sends them one at a time until all have been delivered. This all happens in the context of a live, serving system.

My system works and does what it is supposed to; now I am trying to improve performance.

I currently have two optimizations in mind, but I am not sure how to approach them:

1) Minimize the number of RowMutations transferred by merging all RowMutations for the same key and transmitting only one per key. If a subset of keys is very popular, this reduces how much I need to transfer to bring a node back up to date. I am thinking I can go inside each RowMutation, merge each ColumnFamily, and then create a new RowMutation from the merged CFs. Is ColumnFamily.diff() the right way to merge an individual CF, or am I misunderstanding it?

2) Serialize a whole batch of RowMutations into a chunk, stream the chunk to the appropriate node, deserialize them, and apply them individually. This would avoid waiting for an ACK on each mutation and would let me send large amounts of data more efficiently. Is this feasible with the existing streaming infrastructure, or would I have to implement a new facility?

Again, my codebase is built on top of Cassandra 1.1.6. I would very much appreciate any insight anyone could give me.

Thanks very much,
Bill Katsak

On 04/08/2013 12:10 PM, William Katsak wrote:
> Hello,
>
> I am sorry to bother the list with this question, but I was wondering,
> assuming I have many saved (small) mutations (of the type that hinted
> handoff uses), is there any easy way to put these all together and
> bulk transmit (stream) them to a destination node?
>
> My codebase is based on Cassandra 1.1.6.
>
> Thanks very much in advance,
> Bill Katsak
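
To make the two optimizations in the message concrete, here is a minimal, self-contained sketch of the idea. It deliberately does not use any Cassandra classes: `Cell`, `Mutation`, `mergeByKey`, `encodeChunk`, and `decodeChunk` are hypothetical names invented for illustration, and the "newest timestamp wins per column" rule is an assumed simplification that ignores deletions, TTLs, counters, and super columns. It says nothing about whether ColumnFamily.diff() or the 1.1.6 streaming code is the right tool.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: none of these types are Cassandra classes.
final class HintBatchSketch {

    /** Hypothetical stand-in for one column write. */
    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** Hypothetical stand-in for a saved mutation: a row key plus its column writes. */
    static final class Mutation {
        final String key;
        final Map<String, Cell> cells = new LinkedHashMap<>();
        Mutation(String key) { this.key = key; }
        Mutation put(String column, String value, long timestamp) {
            cells.put(column, new Cell(value, timestamp));
            return this;
        }
    }

    /** Optimization 1: collapse to one mutation per key; newest timestamp wins per column. */
    static List<Mutation> mergeByKey(List<Mutation> saved) {
        Map<String, Mutation> merged = new LinkedHashMap<>();
        for (Mutation m : saved) {
            Mutation target = merged.computeIfAbsent(m.key, Mutation::new);
            for (Map.Entry<String, Cell> e : m.cells.entrySet()) {
                Cell old = target.cells.get(e.getKey());
                // Ties keep the first cell seen; a real store needs a deterministic tie-break.
                if (old == null || e.getValue().timestamp > old.timestamp) {
                    target.cells.put(e.getKey(), e.getValue());
                }
            }
        }
        return new ArrayList<>(merged.values());
    }

    /** Optimization 2 (sender): write many mutations into one count-prefixed chunk. */
    static byte[] encodeChunk(List<Mutation> batch) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(batch.size());
        for (Mutation m : batch) {
            out.writeUTF(m.key); // writeUTF caps each string at 65535 encoded bytes
            out.writeInt(m.cells.size());
            for (Map.Entry<String, Cell> e : m.cells.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue().value);
                out.writeLong(e.getValue().timestamp);
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Optimization 2 (receiver): decode the chunk back into individual mutations. */
    static List<Mutation> decodeChunk(byte[] chunk) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(chunk));
        int count = in.readInt();
        List<Mutation> batch = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Mutation m = new Mutation(in.readUTF());
            int cellCount = in.readInt();
            for (int c = 0; c < cellCount; c++) {
                String column = in.readUTF();
                String value = in.readUTF();
                long timestamp = in.readLong();
                m.put(column, value, timestamp);
            }
            batch.add(m);
        }
        return batch;
    }
}
```

In this model, the sender would call `mergeByKey` on the saved mutations for a target node, pass the result to `encodeChunk`, and ship the resulting byte array in one message; the receiver would call `decodeChunk` and apply each mutation in turn, so only one acknowledgement per chunk is needed. How the chunk is actually transported, and how partial failures within a chunk are retried, is left open here.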