Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <6554552.421275691935028.JavaMail.arya@aryanet>
References: <7266333.401275691890478.JavaMail.arya@aryanet>
	<6554552.421275691935028.JavaMail.arya@aryanet>
From: Jonathan Ellis <jbellis@gmail.com>
Date: Sat, 5 Jun 2010 06:26:46 -0700
Message-ID: <AANLkTikIl2p2foBKgmayAGe1YYkJbZVpKBoCeBfqyE-e@mail.gmail.com>
Subject: Re: Strange Read Performance 1xN column slice or N column slice
To: user@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Reading 1 column is faster than reading lots of columns. This
shouldn't be surprising.
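
A rough sketch of why, assuming both runs in the quoted message issue 100
calls and differ only in how many column names each call asks for. The
nested dict, slice_stub(), columns_touched() and the choice of 10 columns
are hypothetical stand-ins for illustration; none of this is the Thrift
API and nothing here talks to Cassandra.

```python
# In-memory stand-in for the layout in the quoted message:
# namespace (row key) -> user (supercolumn) -> column_name -> column_value.
# Assumption: 10 columns per user, chosen only for illustration.
store = {
    "ns1": {
        "user42": {f"col{i}": f"value{i}" for i in range(10)},
    }
}


def slice_stub(store, keys, super_column, column_names):
    """Hypothetical helper: return {key: {name: value}} for the requested names."""
    result = {}
    for key in keys:
        columns = store.get(key, {}).get(super_column, {})
        result[key] = {n: columns[n] for n in column_names if n in columns}
    return result


def columns_touched(calls, column_names):
    """Total number of columns returned across `calls` identical requests."""
    total = 0
    for _ in range(calls):
        rows = slice_stub(store, ["ns1"], "user42", column_names)
        total += sum(len(cols) for cols in rows.values())
    return total


# 100 calls asking for one column each vs. 100 calls asking for all ten.
single = columns_touched(100, ["col0"])
many = columns_touched(100, [f"col{i}" for i in range(10)])
print(single, many)  # prints "100 1000"
```

With the same number of calls, the run that asks for more columns does
more work per call, so its total time should be higher.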

On Fri, Jun 4, 2010 at 3:52 PM, Arya Goudarzi <agoudarzi@gaiaonline.com> wrote:
> Hi Fellows,
>
> I have the following design for a system which basically holds key->value
> pairs (aka Columns) for each user (SuperColumn key) in different namespaces
> (SuperColumnFamily row key).
>
> Like this:
>
> Namespace->user->column_name = column_value;
>
> keyspaces:
>     - name: NKVP
>       replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
>       replication_factor: 3
>       column_families:
>         - name: Namespaces
>           column_type: Super
>           compare_with: BytesType
>           compare_subcolumns_with: BytesType
>           rows_cached: 20000
>           keys_cached: 100
>
> Cluster using random partitioner.
>
> I use multiget_slice() to fetch one or many columns inside the child
> supercolumn at the same time. This is the odd performance result I get:
>
> 100 sequential reads completed in : 0.383    this uses multiget_slice() with
> 1 key, and 1 column name inside the predicate->column_names
> 100 batch loaded completed in : 0.786    this uses multiget_slice() with 1
> key, and multiple column names inside the predicate->column_names
>
> Read and write consistency levels are both ONE.
>
> Questions:
>
> Why is doing 100 sequential reads faster than doing 100 in batch?
> Is this a good design for my problem?
> Does my issue relate to https://issues.apache.org/jira/browse/CASSANDRA-598?
>
> Now on a single node with replication factor 1 I get this:
>
> 100 sequential reads completed in : 0.438
> 100 batch loaded completed in : 0.800
>
> Please advise as to why this is happening.
>
> These nodes are VMs with 1 CPU and 1 GB each.
>
> Best Regards,
> =Arya
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com