From: Kirk True
Date: Thu, 19 Jul 2012 15:26:34 -0700
To: user@cassandra.apache.org
CC: Leonid Ilyevsky
Subject: Re: Batch update efficiency with composite key
Message-ID: <5008899A.7080301@mustardgrain.com>

In Cassandra you don't read-then-write updates, you just write the updates.

Sorry for being dense, but can you clarify what you mean by a logical vs. a physical row?

Batching is useful for reducing round trips to the server.
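
To make the two points above concrete, here is a rough sketch in Python. It is a toy model only, not Cassandra's storage engine or any real client API: ToyNode, mutate, batch_mutate and read_row are hypothetical names made up for illustration. It also assumes that a "logical row" is a group of columns sharing a composite-name prefix inside one wide row, which may not be what was meant. Writes are appended without reading the existing row, columns are reconciled by timestamp only at read time, and a batch counts as a single request.

```python
class ToyNode:
    """Toy stand-in for a server node (hypothetical, not Cassandra's real engine)."""

    def __init__(self):
        self.log = []      # append-only list of (row_key, column, value, timestamp)
        self.requests = 0  # number of client round trips received

    def mutate(self, row_key, column, value, timestamp):
        # One round trip per update; the write is appended, the row is never read.
        self.requests += 1
        self.log.append((row_key, column, value, timestamp))

    def batch_mutate(self, mutations):
        # One round trip for the whole batch; still no read of the existing row.
        self.requests += 1
        self.log.extend(mutations)

    def read_row(self, row_key):
        # Reconcile at read time: the newest timestamp wins for each column.
        latest = {}
        for key, column, value, ts in self.log:
            if key == row_key and (column not in latest or ts >= latest[column][1]):
                latest[column] = (value, ts)
        return {c: v for c, (v, _) in latest.items()}


# 100 "logical rows" stored as composite column names inside one wide row.
updates = [("wide_row", (i, "price"), i * 1.5, 1) for i in range(100)]

one_by_one = ToyNode()
for m in updates:
    one_by_one.mutate(*m)

batched = ToyNode()
batched.batch_mutate(updates)

print(one_by_one.requests, batched.requests)  # prints "100 1"
```

Both nodes end up with the same row contents; in this model the only difference is the number of round trips.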

On 07/18/2012 06:18 AM, Leonid Ilyevsky wrote:

I have a question about the efficiency of updates to a CF with a composite key.

Let's say I have 100 logical rows to update, and they all belong to the same physical wide row. In my naïve understanding (correct me if I am wrong), in order to update a logical row, Cassandra has to retrieve the whole physical row, add columns to it, and put it back. So I put all my 100 updates in a batch and send it over. Would Cassandra be smart enough to recognize that they all belong to one physical row, retrieve it once, do all the updates and put it back once? Is batching even relevant in this case? What happens if I just send the updates one by one?

I want to understand why I should use batches. I don't really care about having one timestamp for all records; I only care about efficiency. So I thought I would at least save on the number of remote calls, but I also wonder what happens on the Cassandra side.

 



This email, along with any attachments, is confidential and may be legally privileged or otherwise protected from disclosure. Any unauthorized dissemination, copying or use of the contents of this email is strictly prohibited and may be in violation of law. If you are not the intended recipient, any disclosure, copying, forwarding or distribution of this email is strictly prohibited and this email and any attachments should be deleted immediately. This email and any attachments do not constitute an offer to sell or a solicitation of an offer to purchase any interest in any investment vehicle sponsored by Moon Capital Management LP (“Moon Capital”). Moon Capital does not provide legal, accounting or tax advice. Any statement regarding legal, accounting or tax matters was not intended or written to be relied upon by any person as advice. Moon Capital does not waive confidentiality or privilege as a result of this email.
