From: Oleg Anastastasyev
To: user@cassandra.apache.org
Date: Tue, 6 Sep 2011 06:01:02 +0000 (UTC)
Subject: Re: 15 seconds to increment 17k keys?

> in the family. There are millions of rows. Each operation consists of
> doing a batch_insert through pycassa, which increments ~17k keys. A
> majority of these keys are new in each batch.
>
> Each operation is taking up to 15 seconds. For our system this is a
> significant bottleneck.
>

Try splitting your batch into smaller pieces and sending them in parallel. You may get better performance this way, because all cores are put to work and there is less copying/rebuilding of large structures inside Thrift and Cassandra. I found that batches of 1k rows behave better than batches of 10k rows.

It is also a good idea to split the batch into slices according to the replication strategy and send each slice directly to its natural endpoint (a node that holds a replica of those rows). This reduces the amount of communication needed between nodes.
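A minimal Python sketch of the first suggestion (smaller batches sent in parallel), assuming the rows are held in a dict mapping row key to a dict of columns, roughly the shape pycassa's `batch_insert` takes. `chunked`, `parallel_batch_insert` and `send_batch` are hypothetical names: `send_batch` stands in for whatever call actually writes one batch, and the client it uses must be safe to call from several threads (or open one connection per thread). The 1000-row default only mirrors the 1k figure above; tune it for your cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(rows, size):
    """Split a dict of row_key -> columns into dicts of at most `size` rows."""
    items = list(rows.items())
    return [dict(items[i:i + size]) for i in range(0, len(items), size)]


def parallel_batch_insert(rows, send_batch, chunk_size=1000, workers=8):
    """Send `rows` as several smaller batches on a thread pool.

    `send_batch` is a hypothetical callable that writes one batch, e.g. a
    thin wrapper around your client's batch insert. It must be thread-safe.
    Returns the per-batch results in the same order as the chunks.
    """
    chunks = chunked(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every batch and re-raises the first failure.
        return list(pool.map(send_batch, chunks))
```

For example, `parallel_batch_insert(rows, send_batch)` with 17k rows issues 17 batches of 1,000 rows, up to 8 at a time. Threads are used rather than processes because the work is mostly waiting on the network.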
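The second suggestion, routing each slice to its natural endpoint, can be sketched in Python under two stated assumptions: the cluster uses RandomPartitioner (a key's token is the absolute value of its MD5 digest read as a signed big-endian integer) and SimpleStrategy (the first replica is the node whose token is the first one at or after the key's token, wrapping around the ring). The ring of `(token, node)` pairs is hypothetical input you would fill in from your own cluster, for example from `nodetool ring`, and the function names are illustrative. Only the primary replica is computed; with SimpleStrategy the other replicas are the next nodes around the ring, and other strategies place replicas differently.

```python
import bisect
import hashlib


def random_partitioner_token(key):
    """Token for a bytes key, assuming RandomPartitioner:
    abs() of the MD5 digest read as a signed big-endian integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def primary_endpoint(token, tokens, nodes):
    """Node owning `token`. A node owns (previous token, its token], so the
    owner is the first node whose token is >= `token`, wrapping around."""
    i = bisect.bisect_left(tokens, token)
    return nodes[i % len(nodes)]


def split_by_endpoint(rows, ring):
    """Group rows (bytes key -> columns) by their primary endpoint.
    `ring` is a hypothetical list of (token, node) pairs with unique tokens."""
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    nodes = [n for _, n in ring]
    slices = {}
    for key, columns in rows.items():
        node = primary_endpoint(random_partitioner_token(key), tokens, nodes)
        slices.setdefault(node, {})[key] = columns
    return slices
```

Each slice can then be sent over a connection opened directly to that node, so the coordinator is already a replica for every row it receives.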