Date: Thu, 1 Sep 2011 09:16:55 -0700
Subject: Re: 15 seconds to increment 17k keys?
From: Ian Danforth
To: "user@cassandra.apache.org"

Does this scale with multiples of the replication factor or directly
with number of nodes? Or more succinctly, to double the writes per
second into the cluster how many more nodes would I need?

(Thanks for the note on pycassa, I've checked and it's not the
limiting factor)

Ian

On Thu, Sep 1, 2011 at 3:36 AM, Richard Low wrote:
> Assuming you have replicate_on_write enabled (which you almost
> certainly do for counters), you have to do a read on a write for each
> increment. This means counter increments, even if all your data set
> fits in cache, are significantly slower than normal column inserts. I
> would say ~1k increments per second is about right, although you can
> probably do some tuning to improve this.
>
> I've also found that the pycassa client uses significant amounts of
> CPU, so be careful you are not CPU bound on the client.
>
> --
> Richard Low
> Acunu | http://www.acunu.com | @acunu
>
> On Thu, Sep 1, 2011 at 2:31 AM, Yang wrote:
>> 1ms per add operation is the general order of magnitude I have seen
>> with my tests.
>>
>> On Wed, Aug 31, 2011 at 6:04 PM, Ian Danforth wrote:
>>>
>>> All,
>>>
>>> I've got a 4 node cluster (ec2 m1.large instances, replication = 3)
>>> that has one primary counter type column family, that has one column
>>> in the family. There are millions of rows. Each operation consists of
>>> doing a batch_insert through pycassa, which increments ~17k keys. A
>>> majority of these keys are new in each batch.
>>>
>>> Each operation is taking up to 15 seconds. For our system this is a
>>> significant bottleneck.
>>>
>>> Does anyone know if this write speed is expected?
>>>
>>> Thanks in advance,
>>>
>>> Ian
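
As a quick sanity check on the figures quoted in this thread, the sketch below works out the rate implied by the original report (~17k keys per batch, up to 15 seconds per batch) so it can be compared with the "~1k increments per second" and "1ms per add operation" estimates from the replies. The variable names are illustrative only, and the inputs are the rounded numbers from the emails, not measurements.

```python
# Rounded figures taken from the thread; not measured values.
keys_per_batch = 17_000   # "~17k keys" incremented per batch_insert
batch_seconds = 15.0      # "up to 15 seconds" per operation

# Throughput and per-increment latency implied by those two numbers.
increments_per_second = keys_per_batch / batch_seconds
ms_per_increment = 1000.0 * batch_seconds / keys_per_batch

print(f"{increments_per_second:.0f} increments/s")   # 1133 increments/s
print(f"{ms_per_increment:.2f} ms per increment")    # 0.88 ms per increment
```

Under these rounded inputs the reported batch time works out to roughly 1.1k increments per second, which is in line with both estimates given in the replies.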