From: Xiaowei Wang <xiaowei609@gmail.com>
To: user@cassandra.apache.org
Subject: Re: insert and batch_insert
Date: Mon, 16 May 2011 11:44:50 -0400

Thanks Aaron, that really helps!

2011/5/16 aaron morton <aaron@thelastpickle.com>
> batch_mutate() and insert() follow a similar execution path to a single
> insert on the server. It's not like putting multiple statements in a
> transaction in an RDBMS.
>
> Where they do differ is that you can provide multiple columns for a row in
> a column family, and these will be applied as one operation, including only
> one write to the commit log. However, every row you send requires a write
> to the commit log.
>
> What sort of data are you writing? Are there multiple columns per row?
>
> Another consideration is that each row becomes a mutation in the cluster.
> If a connection sends thousands of rows at once, all of its mutations
> *could* momentarily fill all the available mutation workers on a node. This
> can slow down other clients connected to the cluster if they also need to
> write to that node. Watch the TPStats to see if the mutation pool has
> spikes in the pending range. You may want to reduce the batch size if
> clients are seeing high latency.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15 May 2011, at 10:34, Xiaowei Wang wrote:
>
> > Hi,
> >
> > We use Cassandra 0.7.4 to do TPC-C data loading on EC2 nodes. The
> > loading driver is written in pycassa. We tested the loading speed of
> > insert and batch_insert, but there seems to be no significant
> > difference. I know Cassandra first writes data to memory, but I'm still
> > confused why batch_insert is not quicker than single-row inserts. We
> > only batch 2000 or 3000 rows at a time.
> >
> > Thanks for your help!
> >
> > Best,
> > Xiaowei
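Aaron's advice (cap the batch size so a single client's mutations don't monopolize a node's mutation workers) can be sketched as a small chunking helper around batch_insert. This is a minimal sketch, not code from the thread: the batch size of 500 and the keyspace/column-family names in the commented pycassa usage are illustrative assumptions.

```python
# Split an iterable of (row_key, columns_dict) pairs into smaller batches
# before sending them with batch_insert, so no single call floods the
# mutation pool on one node. The batch size here is an assumption; tune it
# down if TPStats shows the mutation pool's pending count spiking.

def chunked(rows, batch_size=500):
    """Yield dicts of at most batch_size rows, shaped for batch_insert."""
    batch = {}
    for key, columns in rows:
        batch[key] = columns
        if len(batch) >= batch_size:
            yield batch
            batch = {}
    if batch:  # flush the final partial batch
        yield batch

# Hypothetical pycassa usage (requires a running cluster, shown for context):
# import pycassa
# pool = pycassa.ConnectionPool('Keyspace1')          # assumed keyspace name
# cf = pycassa.ColumnFamily(pool, 'Standard1')        # assumed CF name
# for batch in chunked(row_iterator, batch_size=500):
#     cf.batch_insert(batch)
```

Since every row still costs a commit-log write on the server, this will not make batch_insert dramatically faster than single inserts; the point is to keep batches small enough that other clients writing to the same node do not see latency spikes.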