From: Jonathan Ellis
Date: Sat, 5 Jun 2010 06:26:46 -0700
Subject: Re: Strange Read Performance 1xN column slice or N column slice
To: user@cassandra.apache.org

Reading 1 column is faster than reading lots of columns. This
shouldn't be surprising.

On Fri, Jun 4, 2010 at 3:52 PM, Arya Goudarzi wrote:
> Hi Fellows,
>
> I have the following design for a system which basically holds key->value
> pairs (aka Columns) for each user (SuperColumn key) in different namespaces
> (SuperColumnFamily row key).
>
> Like this:
>
> Namespace->user->column_name = column_value;
>
> keyspaces:
>     - name: NKVP
>       replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
>       replication_factor: 3
>       column_families:
>         - name: Namespaces
>           column_type: Super
>           compare_with: BytesType
>           compare_subcolumns_with: BytesType
>           rows_cached: 20000
>           keys_cached: 100
>
> The cluster uses the random partitioner.
>
> I use multiget_slice() for fetching 1 or many columns inside the child
> supercolumn at the same time.
> This is an awkward performance result I get:
>
> 100 sequential reads completed in: 0.383
>     (this uses multiget_slice() with 1 key and 1 column name inside
>     predicate->column_names)
> 100 batch loaded completed in: 0.786
>     (this uses multiget_slice() with 1 key and multiple column names inside
>     predicate->column_names)
>
> Read/write consistency levels are ONE.
>
> Questions:
>
> Why is doing 100 sequential reads faster than doing 100 in batch?
> Is this a good design for my problem?
> Does my issue relate to https://issues.apache.org/jira/browse/CASSANDRA-598?
>
> Now on a single node with replication factor 1 I get this:
>
> 100 sequential reads completed in: 0.438
> 100 batch loaded completed in: 0.800
>
> Please advise as to why this is happening.
>
> These nodes are VMs: 1 CPU and 1 GB.
>
> Best Regards,
> =Arya

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
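
One way to make the comparison in this thread easier to read is to divide each total by the number of columns it returned, rather than comparing the two totals directly. The sketch below is only an illustration in Python. `fake_fetch` is a hypothetical stand-in for the real multiget_slice() call and never talks to Cassandra, and the batch size of 50 column names is an assumption, since the thread does not say how many names the batch request carried.

```python
import time


def time_calls(fetch, column_names, calls=100):
    """Run fetch(column_names) `calls` times and return the total seconds."""
    start = time.perf_counter()
    for _ in range(calls):
        fetch(column_names)
    return time.perf_counter() - start


def per_column_cost(total_seconds, calls, columns_per_call):
    """Average seconds spent per column returned across all calls."""
    return total_seconds / (calls * columns_per_call)


def fake_fetch(column_names):
    # Hypothetical stand-in for a multiget_slice() request with one key.
    # It only builds a dict locally so that this sketch runs anywhere.
    return {name: b"value" for name in column_names}


if __name__ == "__main__":
    one_column = ["col0"]
    # Assumed batch size; the original post does not state it.
    many_columns = ["col%d" % i for i in range(50)]

    t_one = time_calls(fake_fetch, one_column)
    t_many = time_calls(fake_fetch, many_columns)

    print("1 column per call:   total %.6f s, per column %.9f s"
          % (t_one, per_column_cost(t_one, 100, len(one_column))))
    print("50 columns per call: total %.6f s, per column %.9f s"
          % (t_many, per_column_cost(t_many, 100, len(many_columns))))
```

To measure a real cluster, `fake_fetch` would be replaced by a function that issues the actual request. A larger total for the multi-column run does not by itself mean batching is slower per column.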