incubator-cassandra-user mailing list archives

From Brandon Williams <dri...@gmail.com>
Subject Re: get_slice() slow if more number of columns present in a SCF.
Date Wed, 03 Feb 2010 14:39:18 GMT
On Wed, Feb 3, 2010 at 4:19 AM, envio user <enviouser@gmail.com> wrote:
>
> After this I tried with 1 million keys:
>
> /home/sun>python stress.py -n 1000000 -t 100 -c 25 -r -o read -i 10
> WARNING: multiprocessing not present, threading will be used.
>        Benchmark may not be accurate!
> total,interval_op_rate,avg_latency,elapsed_time
> .......................
> .......................
> 87916,76,1.30730833113,1240
> 88665,74,1.33158908508,1250
> 89405,74,1.35333179654,1260
> 90086,68,1.45503252228,1270
> 90745,65,1.51978417774,1280
> 91476,73,1.38719448671,1290
> 92226,75,1.3288515962,1300
> 92976,75,1.33220300897,1310
> 93770,79,1.26187492288,1320
> 94557,78,1.26394684554,1330
>

1M rows means you've stored 600M columns, which is around 32G of data after
compaction.  With 8G of memory for disk cache from the OS, minus at least 1G
going to Cassandra's JVM, you have roughly 7G of cache for 32G of data, so
your machine is going to have to do a disk seek on nearly 80% of your reads
at a minimum.  You can improve this with either more memory per node or more
nodes, but it's worth noting that ~1875 columns/sec (75 reads/sec x 25
columns per slice) isn't too bad for this situation, and you can probably
read all 600 columns per row at nearly the same speed.
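
[Editor's note: a minimal Python sketch of the arithmetic above, using only
figures stated in the thread (600 columns/row, 32G on disk, 8G RAM, 1G JVM,
~75 reads/sec with 25-column slices); nothing here is measured.]

    # Back-of-envelope model of the cache-miss and throughput numbers.
    rows = 1000000
    cols_per_row = 600               # from earlier in the thread
    total_cols = rows * cols_per_row # 600M columns
    data_gb = 32.0                   # on-disk size after compaction

    ram_gb = 8.0                     # total machine memory
    jvm_gb = 1.0                     # at least 1G for Cassandra's JVM
    cache_gb = ram_gb - jvm_gb       # ~7G left for the OS page cache

    # Fraction of reads that miss the cache and need a disk seek,
    # assuming uniformly random keys (stress.py -r).
    miss_rate = 1 - cache_gb / data_gb
    print("cache miss rate: %.0f%%" % (miss_rate * 100))  # ~78%

    # Throughput in columns/sec: 25-column slices (-c 25) at the
    # observed ~75 reads/sec from the stress.py output.
    reads_per_sec = 75
    slice_cols = 25
    print("columns/sec: %d" % (reads_per_sec * slice_cols))  # 1875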

-Brandon
