incubator-cassandra-user mailing list archives

From Arya Goudarzi <agouda...@gaiaonline.com>
Subject Re: Strage Read Perfoamnce 1xN column slice or N column slice
Date Wed, 09 Jun 2010 23:51:18 GMT
Hi Jonathan,

This issue persists. I have prepared a code sample that you can use to reproduce what I am
describing; please see the attachment. It uses the Thrift PHP libraries directly. I am running
a Cassandra 0.7 build from May 28th. I have tried this on a single host with replication
factor 1 and on a 3-node cluster with replication factor 3. The results remain similar:

100 Sequential Writes took: 0.60781407356262 seconds;
100 Sequential Reads took: 0.23204588890076 seconds;
100 Batch Read took: 0.76512885093689 seconds;
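
The attached PHP sample is not shown in this message view. As a rough illustration of the comparison being timed (N single-column reads in a loop versus one N-column read), here is a minimal Python sketch. It makes several assumptions: `STORE` and `fetch_columns` are hypothetical in-memory stand-ins for the super column and the Thrift read call, and `time_call`, `sequential_read`, and `batch_read` are illustrative names that do not come from the original test. It also adds an untimed warm-up run and takes the median over several trials, which may help separate a real difference from noise.

```python
import statistics
import time
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for one user's columns (one super column).
# A real test would issue the Thrift read against Cassandra instead.
STORE: Dict[str, str] = {f"col{i}": f"val{i}" for i in range(100)}


def fetch_columns(names: List[str]) -> Dict[str, str]:
    """Stub for a single read that requests the given column names."""
    return {name: STORE[name] for name in names}


def sequential_read(names: List[str]) -> Dict[str, str]:
    """Issue one read per column name (N calls, 1 column each)."""
    result: Dict[str, str] = {}
    for name in names:
        result.update(fetch_columns([name]))
    return result


def batch_read(names: List[str]) -> Dict[str, str]:
    """Issue a single read that requests all column names at once."""
    return fetch_columns(list(names))


def time_call(fn: Callable[[], object], trials: int = 5) -> float:
    """Return the median wall-clock seconds of fn over several trials."""
    fn()  # warm-up run, not timed
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    names = list(STORE)
    # Both access patterns must return the same data before timing them.
    assert sequential_read(names) == batch_read(names)
    print("sequential:", time_call(lambda: sequential_read(names)))
    print("batch:     ", time_call(lambda: batch_read(names)))
```

With the stub in place the timings only measure Python overhead; the point is the shape of the harness, not the numbers it prints.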

Please advise.

Thank You,
-Arya

----- Original Message -----
From: "Jonathan Ellis" <jbellis@gmail.com>
To: user@cassandra.apache.org
Sent: Monday, June 7, 2010 7:26:30 PM
Subject: Re: Strage Read Perfoamnce 1xN column slice or N column slice

That would be surprising (and it is not what you said in the first
message). I suspect something is wrong with your test methodology.

On Mon, Jun 7, 2010 at 11:23 AM, Arya Goudarzi
<agoudarzi@gaiaonline.com> wrote:
> But I am not comparing reading 1 column vs 100 columns. I am comparing
> reading of 100 columns in loop iterations (100 consecutive calls) vs
> reading all 100 in batch in one call. Doing the loop is faster than
> doing the batch call. Are you saying this is not surprising?
>
> ----- Original Message -----
> From: "Jonathan Ellis" <jbellis@gmail.com>
> To: user@cassandra.apache.org
> Sent: Saturday, June 5, 2010 6:26:46 AM
> Subject: Re: Strage Read Perfoamnce 1xN column slice or N column slice
>
> Reading 1 column is faster than reading lots of columns. This
> shouldn't be surprising.
>
> On Fri, Jun 4, 2010 at 3:52 PM, Arya Goudarzi
> <agoudarzi@gaiaonline.com>
> wrote:
>> Hi Fellows,
>>
>> I have the following design for a system that basically holds
>> key->value pairs (aka Columns) for each user (SuperColumn key) in
>> different namespaces (SuperColumnFamily row key).
>>
>> Like this:
>>
>> Namespace->user->column_name = column_value;
>>
>> keyspaces:
>>     - name: NKVP
>>       replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
>>       replication_factor: 3
>>       column_families:
>>         - name: Namespaces
>>           column_type: Super
>>           compare_with: BytesType
>>           compare_subcolumns_with: BytesType
>>           rows_cached: 20000
>>           keys_cached: 100
>>
>> The cluster uses the random partitioner.
>>
>> I use multiget_slice() to fetch one or many columns inside the
>> child supercolumn at the same time. This is the odd performance
>> result I get:
>>
>> 100 sequential reads completed in: 0.383 (each call uses multiget_slice()
>> with 1 key and 1 column name inside predicate->column_names)
>> 100 batch loaded completed in: 0.786 (this uses multiget_slice() with
>> 1 key and multiple column names inside predicate->column_names)
>>
>> Read and write consistency levels are both ONE.
>>
>> Questions:
>>
>> Why is doing 100 sequential reads faster than doing 100 in batch?
>> Is this a good design for my problem?
>> Does my issue relate to
>> https://issues.apache.org/jira/browse/CASSANDRA-598?
>>
>> Now on a single node with replication factor 1 I get this:
>>
>> 100 sequential reads completed in : 0.438
>> 100 batch loaded completed in : 0.800
>>
>> Please advise as to why this is happening.
>>
>> These nodes are VMs, each with 1 CPU and 1 GB.
>>
>> Best Regards,
>> =Arya
>>
>
>
>
> -- Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>



-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
