incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Why is row lookup much faster than column lookup
Date Wed, 14 Mar 2012 07:55:34 GMT
Here is a look at query plans 
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

tl;dr - wide rows require an index to be read from disk; the fastest query uses no start and no finish.
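To make the tl;dr more concrete, here is a toy Python model of a wide row's column index. It is not Cassandra's code. It assumes, as the tl;dr suggests, that a slice with no start can read from the front of the row without consulting the index, while a slice with a start binary-searches per-block bounds to find where to begin. The block size of 4 columns and the names `build_column_index`, `find_start_block` and `slice_columns` are hypothetical; Cassandra sizes these blocks in bytes (the `column_index_size_in_kb` setting in cassandra.yaml).

```python
from bisect import bisect_left
from typing import List, Tuple

# Toy model, not Cassandra's implementation. Hypothetical block size counted
# in columns; Cassandra sizes index blocks in bytes (column_index_size_in_kb).
BLOCK_SIZE = 4

# (first column name in block, last column name in block, offset of block start)
IndexEntry = Tuple[str, str, int]


def build_column_index(columns: List[str], block_size: int = BLOCK_SIZE) -> List[IndexEntry]:
    """Split a row's sorted column names into blocks and record each block's bounds."""
    index = []
    for offset in range(0, len(columns), block_size):
        block = columns[offset:offset + block_size]
        index.append((block[0], block[-1], offset))
    return index


def find_start_block(index: List[IndexEntry], start: str) -> int:
    """Binary-search for the first block whose last column is >= start.

    Returns len(index) when start sorts after every column in the row.
    """
    last_names = [last for _, last, _ in index]
    return bisect_left(last_names, start)


def slice_columns(columns: List[str], index: List[IndexEntry],
                  start: str = "", count: int = 1) -> List[str]:
    """Toy slice query returning up to `count` column names >= start."""
    if count <= 0:
        return []
    if not start:
        # No start column: read from the front of the row, index not consulted.
        return columns[:count]
    pos = find_start_block(index, start)
    if pos == len(index):
        return []
    result = []
    # Scan forward from the first candidate block found via the index.
    for name in columns[index[pos][2]:]:
        if name >= start:
            result.append(name)
            if len(result) == count:
                break
    return result


# A wide row shaped like cf2's (about 18K columns), with zero-padded names.
row = [f"col{i:05d}" for i in range(18000)]
index = build_column_index(row)
print(slice_columns(row, index, count=1))                    # ['col00000']
print(slice_columns(row, index, start="col09001", count=2))  # ['col09001', 'col09002']
```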

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/03/2012, at 6:58 AM, Dave Brosius wrote:

> 
> Sorry, that should have been: given the hashtable nature of Cassandra, finding a row is probably 'relatively' constant no matter how many *rows* you have.
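The "hashtable" point can be illustrated with a rough sketch of how a row key is placed, assuming the RandomPartitioner (MD5-based tokens): the key is hashed to a token, and the owning node is found on the token ring with the same amount of work however many rows exist. The ring, the node names and the key serialization below are hypothetical, and the sketch stops at choosing a node; finding the row inside that node's SSTables (bloom filters, index sampling) is not modeled.

```python
import hashlib
from bisect import bisect_left
from typing import List, Tuple


def random_partitioner_token(key: bytes) -> int:
    """MD5 of the key read as a signed big-endian integer, then abs().

    Sketch of how RandomPartitioner derives a token; assumes the key is
    already serialized to bytes.
    """
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner(ring: List[Tuple[int, str]], key: bytes) -> str:
    """Return the node owning `key` on a ring sorted by token.

    A node owns the range (previous token, its token], so the owner is the
    first node whose token is >= the key's token, wrapping to the first node.
    """
    tokens = [token for token, _ in ring]
    i = bisect_left(tokens, random_partitioner_token(key))
    return ring[i % len(ring)][1]


# Hypothetical 3-node ring with evenly spaced tokens in [0, 2**127].
RING = [(0, "node-a"), (2**127 // 3, "node-b"), (2 * 2**127 // 3, "node-c")]

# Same lookup cost whether the cluster holds 5K rows or 6 million.
print(owner(RING, b"1234"))
```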
> 
> 
> ----- Original Message -----
> From: "Dave Brosius" <dbrosius@mebigfatguy.com> 
> Sent: Tue, March 13, 2012 13:43
> Subject: Re: Why is row lookup much faster than column lookup
> 
> Given the hashtable nature of Cassandra, finding a row is probably 'relatively' constant no matter how many columns you have.
> 
> The smaller the number of columns, I suppose, the more likely it is that all the columns will be in one SSTable. If you've got a ton of columns per row, it is much more likely that these columns will be spread out across multiple SSTables. Plus, columns are read in chunks, depending on the yaml settings.
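Why a row spread over several SSTables costs more can be sketched with a simplified merge (not Cassandra's actual read path): every SSTable that may hold the row has to be consulted, and when the same column appears in more than one, the version with the newest timestamp wins. The `(name, timestamp, value)` tuple layout, the function name `merge_row` and the sample data are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical layout: (column name, timestamp, value). Each SSTable's
# fragment of the row is assumed to be sorted by column name.
Column = Tuple[str, int, str]


def merge_row(fragments: Iterable[List[Column]], count: int) -> List[Column]:
    """Merge one row's sorted per-SSTable fragments, keeping the newest
    version of each column, and stop once `count` columns are collected."""
    result: List[Column] = []
    current = None
    # heapq.merge has to look at the head of every fragment, so even a
    # count=1 read touches each SSTable that holds part of the row.
    for col in heapq.merge(*fragments, key=lambda c: c[0]):
        if current is not None and col[0] == current[0]:
            if col[1] > current[1]:  # newer timestamp wins; ties keep the first seen
                current = col
            continue
        if current is not None:
            result.append(current)
            if len(result) == count:
                return result
        current = col
    if current is not None and len(result) < count:
        result.append(current)
    return result


sstable1 = [("a", 1, "old"), ("c", 1, "x")]
sstable2 = [("a", 2, "new"), ("b", 2, "y")]
print(merge_row([sstable1, sstable2], count=1))  # [('a', 2, 'new')]
```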
> 
> 
> ----- Original Message -----
> From: "A J" <s5alye@gmail.com> 
> Sent: Tue, March 13, 2012 13:35
> Subject: Why is row lookup much faster than column lookup
> 
> From my tests, I am seeing that a CF whose rows have fewer than 100 columns
> each, but which has millions of rows, has much lower latency to read a column
> in a row than a CF that has only a few thousand rows, each a wide row with
> about 20K columns.
> 
> Example:
> cf1 has 6 Million rows and each row has about 100 columns.
> import time
> t1 = time.time()
> cf1.get(1234, column_count=1)
> t2 = time.time() - t1
> print(int(t2 * 1000))
> takes 3 ms
> 
> cf2 has 5K rows and each row has about 18K columns.
> t1 = time.time()
> cf2.get(1234, column_count=1)
> t2 = time.time() - t1
> print(int(t2 * 1000))
> takes 82 ms
> 
> Is there anything in general in the Cassandra architecture that causes row
> lookup to be much faster than column lookup?
> 
> Thanks.
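A single `time.time()` measurement, as in the test above, can be skewed by a cold cache or a one-off pause. A somewhat more robust way to compare the two column families is to repeat the call and take the median. The helper below is a hypothetical sketch using only the Python standard library; `fn` stands in for something like `lambda: cf2.get(1234, column_count=1)`.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], runs: int = 50, warmup: int = 5) -> float:
    """Hypothetical helper: call fn `warmup` times untimed, then `runs` times
    timed, and return the median latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()  # warm caches and connections before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Usage sketch (cf1/cf2 as in the original test):
# print(median_latency_ms(lambda: cf1.get(1234, column_count=1)))
# print(median_latency_ms(lambda: cf2.get(1234, column_count=1)))
```

Taking the median rather than the mean keeps a few slow outliers from dominating the comparison.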

