incubator-cassandra-user mailing list archives

From jodylandren...@comcast.net
Subject insert slowdown with secondary indexes
Date Sat, 11 Jun 2011 00:27:15 GMT
Problem: 
I am comparing two data models: a SuperColumn family and a normal Column Family with
secondary indexes. I had no insert problems with the SuperColumn family. The problem I am
having is with inserts into the Column Family with indexes, which are very slow and keep
getting slower. As far as I can tell from an earlier test, I also had no issues with a
normal column family without indexes. About 24 hours after I started the inserts, an insert
of the same size takes 7x longer, and it is progressively getting slower.

Cluster config: 
I am using Cassandra 0.7.6 for a test on a 4-node cluster with the replication factor set to 2.
The nodes are 32-bit, quad-core Linux machines with 4GB RAM and a single hard drive each. 
Some settings: 
MAX_HEAP_SIZE="2000m" 
HEAP_NEWSIZE="400m" 
memtable_flush_queue_size: 10 (was 4) 
Everything else is pretty much default - Random partitioner, etc. 


What I am seeing: 
One machine in particular shows IO contention and high IO wait. The other machines don't
exhibit this problem. 
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
..
 1 32  79384 110272  23900 1477272    0    0  3972     4 2927 1659 47  2  0 51
 3 31  79384 110520  23892 1476788    0    0  4420     0 1723  622 52  2  0 46
 4 29  79384 111512  23892 1475788    0    0  3876     0 1579  576 53  2  0 44

The other machines look like this:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0  96120 1032600  13100 581160    0    0    72     8 1598 1325 30  4 65  0
 1  0  96120 1029252  13108 584224    0    0     0   144  609  155 23  2 75  0
 1  0  96120 1027012  13108 587308    0    0    68     0 3437 6890 37  6 57  0
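To put numbers on the difference between the two vmstat samples, here is a minimal Python sketch that parses the rows shown above and averages a few columns. The helper names (`parse_vmstat_row`, `mean`) are hypothetical and only the three samples per node from this message are used, so treat the averages as illustrative rather than representative.

```python
# Column names as printed in the vmstat header above.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

def parse_vmstat_row(line):
    """Turn one vmstat sample line into a dict of column name -> int."""
    values = [int(tok) for tok in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("unexpected column count: %r" % line)
    return dict(zip(VMSTAT_COLUMNS, values))

def mean(rows, column):
    """Average one column over a list of parsed samples."""
    return sum(row[column] for row in rows) / len(rows)

slow_node = [parse_vmstat_row(l) for l in [
    " 1 32  79384 110272  23900 1477272    0    0  3972     4 2927 1659 47  2  0 51",
    " 3 31  79384 110520  23892 1476788    0    0  4420     0 1723  622 52  2  0 46",
    " 4 29  79384 111512  23892 1475788    0    0  3876     0 1579  576 53  2  0 44",
]]
other_node = [parse_vmstat_row(l) for l in [
    " 1  0  96120 1032600  13100 581160    0    0    72     8 1598 1325 30  4 65  0",
    " 1  0  96120 1029252  13108 584224    0    0     0   144  609  155 23  2 75  0",
    " 1  0  96120 1027012  13108 587308    0    0    68     0 3437 6890 37  6 57  0",
]]

for name, rows in (("slow node", slow_node), ("other node", other_node)):
    # b = processes blocked on IO, bi = blocks read in per second, wa = % CPU in IO wait
    print("%-10s blocked=%.1f  bi=%.0f  wa=%.1f%%" % (
        name, mean(rows, "b"), mean(rows, "bi"), mean(rows, "wa")))
```

On these samples the slow node averages about 30 blocked processes and 47% IO wait with heavy block reads (`bi`), while the other node shows no blocked processes and no IO wait.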

Running iostat -x on the machine that is bogged down by IO shows:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00  364.00    0.00  8264.00     0.00    22.70   109.95  149.60   2.75 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          36.86    0.00    1.23   61.92    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00  326.00    0.00  7832.00     0.00    24.02   118.45  180.10   3.07 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          40.29    0.00    1.47   58.23    0.00    0.00
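The iostat figures above can be cross-checked with a little arithmetic. The sketch below is Python and assumes that `rsec/s` is counted in 512-byte sectors; the dictionary keys and the `summarize` helper are hypothetical names chosen for this example. It derives the average read size, the read throughput, and the fraction of time the disk is busy from `r/s`, `rsec/s` and `svctm`.

```python
SECTOR_BYTES = 512  # assumption: iostat counts rsec/s in 512-byte sectors

# The two "sda" samples from the iostat -x output above.
samples = [
    {"r_s": 364.00, "w_s": 0.00, "rsec_s": 8264.00, "svctm_ms": 2.75},
    {"r_s": 326.00, "w_s": 0.00, "rsec_s": 7832.00, "svctm_ms": 3.07},
]

def summarize(sample):
    """Return (average read size in KiB, read MiB/s, busy fraction)."""
    iops = sample["r_s"] + sample["w_s"]
    avg_read_kib = sample["rsec_s"] / sample["r_s"] * SECTOR_BYTES / 1024
    read_mib_s = sample["rsec_s"] * SECTOR_BYTES / (1024 * 1024)
    # requests per second * service time per request = fraction of each second busy
    busy = iops * sample["svctm_ms"] / 1000.0
    return avg_read_kib, read_mib_s, busy

for sample in samples:
    avg_read_kib, read_mib_s, busy = summarize(sample)
    print("avg read %.2f KiB, %.2f MiB/s, disk busy %.0f%%" % (
        avg_read_kib, read_mib_s, busy * 100))
```

Under that assumption the disk is doing roughly 330 to 360 small reads per second (about 11 to 12 KiB each) for only around 4 MiB/s of throughput, and the busy fraction works out to 100%, matching the `%util` column. That pattern is what a single drive saturated by small reads, rather than by sequential writes, would look like.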


****** Additionally, and very strange to me, on one machine I see 60000+ files belonging to
the test column family (and the number grows as the test runs). Is this normal? A few are
shown below; the sizes listed are typical (very small).
mycolumnfam.646f6d61696e4964-f-1943-Data.db              11M
mycolumnfam.646f6d61696e4964-f-1943-Filter.db            40K
mycolumnfam.646f6d61696e4964-f-1943-Index.db             1.3K
mycolumnfam.646f6d61696e4964-f-1943-Statistics.db        4.2K
mycolumnfam-f-993-Index.db
mycolumnfam-f-993-Statistics.db
mycolumnfam-f-994-Data.db
mycolumnfam-f-994-Filter.db
etc, etc. repeating
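The file names above can be taken apart mechanically. The Python sketch below assumes the layout `<column family>[.<suffix>]-<version>-<generation>-<component>.db` and, as a further assumption, that the dotted suffix is the hex-encoded name of an indexed column (that is, the file belongs to a secondary index rather than to the base column family). The function name `parse_sstable_name` is hypothetical. The hex decoding itself is plain text conversion and does not depend on Cassandra.

```python
from collections import Counter

def parse_sstable_name(filename):
    """Split an SSTable file name into its parts (layout is an assumption)."""
    stem = filename[:-len(".db")] if filename.endswith(".db") else filename
    cf_part, version, generation, component = stem.rsplit("-", 3)
    base_cf, _, suffix = cf_part.partition(".")
    indexed_column = None
    if suffix:
        try:
            # Assumption: the suffix is the hex-encoded indexed column name.
            indexed_column = bytes.fromhex(suffix).decode("utf-8")
        except ValueError:
            indexed_column = suffix  # not hex, keep it as-is
    return {
        "column_family": base_cf,
        "indexed_column": indexed_column,
        "generation": int(generation),
        "component": component,
    }

names = [
    "mycolumnfam.646f6d61696e4964-f-1943-Data.db",
    "mycolumnfam.646f6d61696e4964-f-1943-Filter.db",
    "mycolumnfam-f-993-Index.db",
    "mycolumnfam-f-994-Data.db",
]

# Count files per (column family, indexed column) pair.
counts = Counter()
for name in names:
    parts = parse_sstable_name(name)
    counts[(parts["column_family"], parts["indexed_column"])] += 1

for (cf, column), n in counts.items():
    print(cf, column or "(base column family)", n)
```

The suffix `646f6d61696e4964` decodes to the text `domainId`. Running the same grouping over a full directory listing (for example the output of `os.listdir`) would show whether the 60000+ files belong mostly to the base column family or to the per-column index files.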

The test:
I have a process with only 2 threads that is attempting to load about 300 million rows (22GB
of data) using the Hector Java client. I am doing batch inserts of 1000 rows at a time. I am
inserting the column values as bytes; the column names are strings. Each row in the column
family has a total of 15 columns, and 9 of those columns are indexed.
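For a sense of scale, the load described above can be turned into per-batch and total counts. This is back-of-the-envelope Python using only the figures from this message. It assumes that each indexed column produces one index entry per row and that "22GB" means 22 * 10^9 bytes; neither is confirmed here.

```python
TOTAL_ROWS = 300_000_000      # "about 300 million rows"
TOTAL_BYTES = 22 * 10**9      # "22GB", assumed to be decimal gigabytes
BATCH_ROWS = 1000             # rows per batch insert
COLUMNS_PER_ROW = 15
INDEXED_COLUMNS = 9

batches = TOTAL_ROWS // BATCH_ROWS
columns_per_batch = BATCH_ROWS * COLUMNS_PER_ROW
# Assumption: one index entry per indexed column per row.
index_entries_per_batch = BATCH_ROWS * INDEXED_COLUMNS
total_index_entries = TOTAL_ROWS * INDEXED_COLUMNS
source_bytes_per_row = TOTAL_BYTES / TOTAL_ROWS

print("batches:", batches)
print("columns per batch:", columns_per_batch)
print("index entries per batch:", index_entries_per_batch)
print("total index entries:", total_index_entries)
print("source bytes per row: %.1f" % source_bytes_per_row)
```

Under those assumptions each 1000-row batch carries 15000 column writes plus 9000 index entries, and the full load adds up to 2.7 billion index entries on top of the 300 million base rows.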

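The column family stats while under test look like the following. I note that the read count
equals the write count and the key cache is in use, even though I haven't done any reads yet
(the key cache hit rate itself, 3.6E-5, is close to zero). None of my other column families
show this.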
Column Family: mycolumnfam
		SSTable count: 11
		Space used (live): 13297907918
		Space used (total): 13385402196
		Memtable Columns Count: 287238
		Memtable Data Size: 8778990
		Memtable Switch Count: 1036
		Read Count: 22211625
		Read Latency: 0.347 ms.
		Write Count: 22211625
		Write Latency: 0.026 ms.
		Pending Tasks: 0
		Key cache capacity: 200000
		Key cache size: 6086
		Key cache hit rate: 3.6248105493445186E-5
		Row cache: disabled
		Compacted row minimum size: 447
		Compacted row maximum size: 642
		Compacted row mean size: 634
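The scientific notation in the stats above is easy to misread, so here is a small Python sketch that parses a few of the lines and prints the key cache hit rate as a percentage alongside the read-to-write ratio. `parse_cfstats` is a hypothetical helper written for this example, not part of any Cassandra tool, and only a subset of the lines is included to keep it short.

```python
CFSTATS = """\
Read Count: 22211625
Read Latency: 0.347 ms.
Write Count: 22211625
Write Latency: 0.026 ms.
Key cache capacity: 200000
Key cache size: 6086
Key cache hit rate: 3.6248105493445186E-5
Row cache: disabled
"""

def parse_cfstats(text):
    """Parse 'Name: value' lines into a dict, converting numbers to float."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        value = value.strip()
        if value.endswith(" ms."):
            value = value[:-len(" ms.")]
        try:
            stats[key] = float(value)
        except ValueError:
            stats[key] = value  # e.g. "disabled"
    return stats

stats = parse_cfstats(CFSTATS)
hit_rate_percent = stats["Key cache hit rate"] * 100
reads_per_write = stats["Read Count"] / stats["Write Count"]

print("key cache hit rate: %.4f%%" % hit_rate_percent)
print("reads per write: %.2f" % reads_per_write)
print("key cache fill: %.1f%%" % (
    100 * stats["Key cache size"] / stats["Key cache capacity"]))
```

Read this way, the hit rate is a few thousandths of a percent, while the column family has recorded exactly one read per write. One possible explanation, stated here only as an assumption, is that maintaining the indexes causes the node to read existing data on each insert.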

I'm trying to understand why inserts into a column family with indexes seem to jam things
up, and I'm wondering whether there are any settings I could tweak to help. It seems that a
4-node cluster should be able to handle 2 threads of data coming at it. Has anyone had any
experience with this number of indexes per column family? Any insight or suggestions would
be appreciated.

Thanks in advance-- 
