cassandra-user mailing list archives

From envio user <enviou...@gmail.com>
Subject get_slice() slow when a large number of columns is present in an SCF
Date Tue, 02 Feb 2010 15:27:50 GMT
All,

Here are some tests [batch_insert() and get_slice()] I performed on Cassandra.

H/W: single node, quad-core (8 cores), 8 GB RAM;
two separate physical disks, one for the commit log and another for the data.

storage-conf.xml
================
<KeysCachedFraction>0.4</KeysCachedFraction>
<CommitLogRotationThresholdInMB>256</CommitLogRotationThresholdInMB>
<MemtableSizeInMB>128</MemtableSizeInMB>
<MemtableObjectCountInMillions>0.2</MemtableObjectCountInMillions>
<MemtableFlushAfterMinutes>1440</MemtableFlushAfterMinutes>
<ConcurrentReads>16</ConcurrentReads>


Data Model:

<ColumnFamily ColumnType="Super" CompareWith="UTF8Type"
CompareSubcolumnsWith="UTF8Type" Name="Super1" />

TEST1A
======
/home/sun>python stress.py -n 100000 -t 100 -y super -u 1 -c 25 -r -o insert -i 10
WARNING: multiprocessing not present, threading will be used.
        Benchmark may not be accurate!
total,interval_op_rate,avg_latency,elapsed_time
19039,1903,0.0532085509215,10
52052,3301,0.0302550313445,20
82274,3022,0.0330235137811,30
100000,1772,0.0337765234716,40

TEST1B
======
/home/sun>python stress.py -n 100000 -t 100 -y super -u 1 -c 25 -r -o read -i 10
WARNING: multiprocessing not present, threading will be used.
        Benchmark may not be accurate!
total,interval_op_rate,avg_latency,elapsed_time
16472,1647,0.0615632034523,10
39375,2290,0.04384300123,20
65259,2588,0.0385473697268,30
91613,2635,0.0379411213277,40
100000,838,0.0331208069702,50
/home/sun>


**** I deleted all the data (commit log and data files) and restarted Cassandra. ****
I am OK with TEST1A and TEST1B. I want to populate the SCF with > 500
columns and read 25 columns per key (roughly the get_slice() call sketched below).
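
For reference, the per-key read I want boils down to a single get_slice() of 25
subcolumns out of one super column. Below is a minimal sketch of that call against
the 0.5-era Thrift API; it is not the actual stress.py code, and the host, port,
row key and super column name are just placeholders:

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
                              ConsistencyLevel)

# plain (unframed) Thrift connection to the local node
socket = TSocket.TSocket('localhost', 9160)
transport = TTransport.TBufferedTransport(socket)
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

# read up to 25 subcolumns from super column 'A0' of one row in Super1
parent = ColumnParent(column_family='Super1', super_column='A0')
predicate = SlicePredicate(slice_range=SliceRange(start='', finish='',
                                                  reversed=False, count=25))
cosc_list = client.get_slice('Keyspace1', 'somekey', parent, predicate,
                             ConsistencyLevel.ONE)
print len(cosc_list)   # list of ColumnOrSuperColumn, one per subcolumn returned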

TEST2A
======
/home/sun>python stress.py -n 100000 -t 100 -y super -u 1 -c 600 -r -o insert -i 10
WARNING: multiprocessing not present, threading will be used.
        Benchmark may not be accurate!
total,interval_op_rate,avg_latency,elapsed_time
.............
.............
84216,144,0.689481827031,570
85768,155,0.625061393859,580
87307,153,0.648041650953,590
88785,147,0.671928719674,600
90488,170,0.611753724284,610
91983,149,0.677673689896,620
93490,150,0.63891824366,630
95017,152,0.65472143182,640
96612,159,0.64355712789,650
98098,148,0.673311280851,660
99622,152,0.486848112166,670
100000,37,0.174115514629,680

I understand nobody will write 600 columns at a time; I just needed to
populate the data, hence this test (the per-key write boils down to roughly
the batch_insert() sketched below).
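
For what it is worth, populating one key this way amounts to a single batch_insert()
carrying one super column with 600 subcolumns. Again this is only my reconstruction
against the 0.5-era Thrift API, not the stress.py code; the payload and row key are
made up, and 'client' is the connection opened in the read sketch above:

from cassandra.ttypes import (Column, SuperColumn, ColumnOrSuperColumn,
                              ConsistencyLevel)

data = 'x' * 34                                                   # placeholder payload
columns = [Column('A' + str(j), data, 0) for j in xrange(600)]    # 600 subcolumns (-c 600)
supers = [SuperColumn('A' + str(j), columns) for j in xrange(1)]  # one super column (-u 1)

# one Thrift round trip per key: map of CF name -> list of ColumnOrSuperColumn
cfmap = {'Super1': [ColumnOrSuperColumn(super_column=sc) for sc in supers]}
client.batch_insert('Keyspace1', 'somekey', cfmap, ConsistencyLevel.ONE)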

[root@fc10mc1 ~]# ls -l /var/lib/cassandra/commitlog/
total 373880
-rw-r--r-- 1 root root 268462742 2010-02-03 02:00 CommitLog-1265141714717.log
-rw-r--r-- 1 root root 114003919 2010-02-03 02:00 CommitLog-1265142593543.log

[root@fc10mc1 ~]# ls -l /cassandra/lib/cassandra/data/Keyspace1/
total 3024232
-rw-r--r-- 1 root root 1508524822 2010-02-03 02:00 Super1-192-Data.db
-rw-r--r-- 1 root root      92725 2010-02-03 02:00 Super1-192-Filter.db
-rw-r--r-- 1 root root    2639957 2010-02-03 02:00 Super1-192-Index.db
-rw-r--r-- 1 root root  100838971 2010-02-03 02:02 Super1-279-Data.db
-rw-r--r-- 1 root root       8725 2010-02-03 02:02 Super1-279-Filter.db
-rw-r--r-- 1 root root     176481 2010-02-03 02:02 Super1-279-Index.db
-rw-r--r-- 1 root root 1478775337 2010-02-03 02:03 Super1-280-Data.db
-rw-r--r-- 1 root root      90805 2010-02-03 02:03 Super1-280-Filter.db
-rw-r--r-- 1 root root    2588072 2010-02-03 02:03 Super1-280-Index.db
[root@fc10mc1 ~]#

[root@fc10mc1 ~]# du -hs /cassandra/lib/cassandra/data/Keyspace1/
2.9G    /cassandra/lib/cassandra/data/Keyspace1/


TEST2B
======

/home/sun>python stress.py -n 100000 -t 100 -y super -u 1 -c 25 -r -o read -i 10
WARNING: multiprocessing not present, threading will be used.
        Benchmark may not be accurate!
total,interval_op_rate,avg_latency,elapsed_time
.................
................

66962,382,0.261044957001,180
70598,363,0.276139952824,190
74490,389,0.25678327989,200
78252,376,0.263047518976,210
82031,377,0.266485546846,220
86008,397,0.248498579411,230
89699,369,0.274926948857,240
93590,389,0.256867142883,250
97328,373,0.267352432985,260
100000,267,0.217604277555,270

This test is more worrying for us. We can't even get 1,000 reads per
second. Is there some limitation in Cassandra that makes it perform poorly
when rows contain this many columns? Or am I doing something wrong here?
Please let me know.

Attached are the nodeprobe tpstats, iostat, and vmstat output taken during the tests.


thanks in advance,
-Aita


Some changes I made to stress.py to accommodate more columns:

157c156
<         columns = [Column('A' + str(j), data, 0) for j in xrange(columns_per_key)]
---
>         columns = [Column(chr(ord('A') + j), data, 0) for j in xrange(columns_per_key)]
159c158
<             supers = [SuperColumn('A' + str(j), columns) for j in xrange(supers_per_key)]
---
>             supers = [SuperColumn(chr(ord('A') + j), columns) for j in xrange(supers_per_key)]
187c186
<                     parent = ColumnParent('Super1', 'A' + str(j))
---
>                     parent = ColumnParent('Super1', chr(ord('A') + j))
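
In case it helps anyone else: as far as I can tell, the stock naming had to change
because chr() in Python 2 only covers a single byte, so chr(ord('A') + j) cannot
even produce 600 distinct names, while 'A' + str(j) scales to any -c value. A quick
illustration:

# stock stress.py naming: breaks once ord('A') + j passes 255 (Python 2 chr() limit)
try:
    names = [chr(ord('A') + j) for j in xrange(600)]
except ValueError as e:
    print 'stock naming fails:', e          # chr() arg not in range(256)

# modified naming: 600 distinct, plain-ASCII column names
names = ['A' + str(j) for j in xrange(600)]
print names[0], names[-1], len(set(names))  # A0 A599 600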
