incubator-cassandra-user mailing list archives

From Brandon Williams <dri...@gmail.com>
Subject Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
Date Tue, 09 Mar 2010 20:13:50 GMT
On Tue, Mar 9, 2010 at 1:14 PM, Sylvain Lebresne <sylvain@yakaz.com> wrote:

> I've inserted 1000 rows of 100 columns each
> (python stress.py -t 2 -n 1000 -c 100 -i 5).
> If I read, I get roughly the same number of rows per second whether I
> read the whole row (python stress.py -t 10 -n 1000 -o read -r -c 100)
> or only the first column (python stress.py -t 10 -n 1000 -o read -r -c 1),
> and that's fewer than 10 rows per second.
>
> So sure, when I read the whole row, that's almost 1000 columns per
> second, which is roughly 50M/s throughput, which is quite good. But when
> I read only the first column, I get 10 columns per second, that's 500K/s,
> which is less good. Now, from what I've understood so far, Cassandra
> doesn't deserialize the whole row to read a single column (I'm not using
> supercolumns here), so I don't understand those numbers.
>

Reading a row costs a disk seek, while the columns within a row are stored
contiguously. So if the row isn't in the cache, you're being limited by the
seeks. In general, fat rows should perform better than skinny ones.
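The seek argument can be sketched as a back-of-the-envelope cost model. This is a minimal sketch, assuming every uncached row read costs exactly one seek followed by a sequential read of the requested columns; the seek time, bandwidth, and column size below are hypothetical values, not measurements from this thread, and the helper name `read_time_s` is made up for illustration.

```python
# Back-of-the-envelope model of a seek-bound read path.
# Assumption: each uncached row read costs one disk seek, after which
# the row's columns are read sequentially at a fixed bandwidth.
# All parameter values used below are hypothetical, not measured.


def read_time_s(rows: int, columns_per_row: int, column_bytes: int,
                seek_s: float, bandwidth_bytes_per_s: float) -> float:
    """Estimated seconds to read `rows` uncached rows in full."""
    # One seek per row, plus sequential transfer of that row's columns.
    per_row = seek_s + columns_per_row * column_bytes / bandwidth_bytes_per_s
    return rows * per_row


if __name__ == "__main__":
    SEEK_S = 0.010          # hypothetical 10 ms average seek
    BANDWIDTH = 100e6       # hypothetical 100 MB/s sequential read
    COLUMN_BYTES = 1_000    # hypothetical 1 KB per column
    TOTAL_COLUMNS = 10_000

    # The same 10,000 columns laid out two ways.
    skinny = read_time_s(TOTAL_COLUMNS, 1, COLUMN_BYTES, SEEK_S, BANDWIDTH)
    fat = read_time_s(TOTAL_COLUMNS // 100, 100, COLUMN_BYTES, SEEK_S, BANDWIDTH)
    print(f"10000 rows x 1 column:  {skinny:.2f} s")
    print(f"100 rows x 100 columns: {fat:.2f} s")
```

With these made-up numbers, reading the same 10,000 columns as 10,000 one-column rows takes about 100 s, versus about 1.1 s as 100 rows of 100 columns, because the skinny layout pays 100 times as many seeks. The model deliberately ignores caching and everything Cassandra does beyond the seek and the sequential read.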

-Brandon
