cassandra-user mailing list archives

From David Hawthorne <dha...@gmx.3crowd.com>
Subject randompartitioner cluster unbalanced
Date Thu, 01 Sep 2011 04:28:56 GMT
$ ./nodetool -h localhost ring
Address      DC          Rack   Status State   Load      Owns    Token
                                                                 136112946768375385385349842972707284580
10.0.0.57    datacenter1 rack1  Up     Normal  8.31 GB   20.00%  0
10.0.0.56    datacenter1 rack1  Up     Normal  13.7 GB   20.00%  34028236692093846346337460743176821145
10.0.0.55    datacenter1 rack1  Up     Normal  13.87 GB  20.00%  68056473384187692692674921486353642290
10.0.0.54    datacenter1 rack1  Up     Normal  8.03 GB   20.00%  102084710076281539039012382229530463435
10.0.0.72    datacenter1 rack1  Up     Normal  1.77 GB   20.00%  136112946768375385385349842972707284580
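
As a sanity check on the ring output above, here is a minimal Python sketch that recomputes evenly spaced RandomPartitioner tokens for five nodes and compares them with the ones listed. It assumes the token space runs from 0 to 2^127 and that the tokens were generated as i * (2^127 // 5); the helper name balanced_tokens is hypothetical, not part of any Cassandra tool.

```python
# Sketch: recompute evenly spaced tokens for a 5-node RandomPartitioner ring.
# Assumption: tokens were generated as i * (2**127 // node_count).

RING_SIZE = 2 ** 127  # upper bound of the RandomPartitioner token space


def balanced_tokens(node_count):
    """Return one evenly spaced initial token per node (hypothetical helper)."""
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]


# Tokens copied from the nodetool ring output above.
observed = [
    0,
    34028236692093846346337460743176821145,
    68056473384187692692674921486353642290,
    102084710076281539039012382229530463435,
    136112946768375385385349842972707284580,
]

tokens = balanced_tokens(5)
for token in tokens:
    print(token)
print("matches ring output:", tokens == observed)
```

If the comparison holds, the tokens are evenly spaced, which agrees with the 20.00% in the Owns column, so the uneven Load figures are not explained by token assignment alone.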
    


This is a brand new cluster we just brought up and started loading data into a few days ago.
 It's using the RandomPartitioner, RF=3 on everything, and we're doing QUORUM writes.  All
keyspaces and CFs are for counter super columns.  All keys are moderately sized ascii strings
with good variation between them.  All supercolumn names are longs.  All column names are
ascii strings.  No decrements are done, no rows or columns are deleted, and read load is almost
nonexistent.  Because they are counters, column values get overwritten every time they are
incremented.  This is expected to happen quite a bit.  Not all rows are the same length.

Insert latency from my hector client box to the cluster averages 70-200 ms, which is
really high.  Inserts/sec from hector's perspective peak at 750/sec, then consistently
drop to (and stay at) 120/sec.  Based on the output of nodetool compactionstats, this is
not due to compactions.

I wiped the cluster this afternoon, started from scratch, and I'm seeing the same distribution
on a smaller scale, with the same latencies.  Inserts

Going by statistics from cassandra via jmx, it looks like all hosts are getting about the
same number of MutationStage Completed Tasks/sec.  However, I see one host consistently has
Pending MutationStage and ReplicateOnWriteStage tasks (ranging from 50 and 30, respectively, up to
211 and 42 throughout the day).  Now, I know that ReplicateOnWrite can go really slow if you have large
SuperColumns, but I do not.  I'm working on proving that at the moment, pending a couple of
code pushes.  This same box typically runs CPU up around 600-700%, and it's all user space
cpu, not IO wait.  We monitor these boxes like crazy, and we've tweaked them a bit to try to
rule things out (enabled mmap'd io, disabled swap, mounted ext4 with noatime), none of which
has made a single bit of difference.  If I kill cassandra on that one box, the load moves
to the box before it in the ring, which rules out bad hardware on this one box: mutations
and ReplicateOnWrite tasks (ROWs) back up there, and cpu jumps to 600%.  Heap memory usage
sits at 600MB-2GB and heap size
is 4G on all 5 boxes.

CPU usage and Mutations/ROWs are not affected by hector client connections;  if I remove this
single host from the hector configuration and confirm that there are 0 connections from my
client to this one box, I still see high Mutations and ROWs and CPU usage.  If I increase
the number of client connections in the hector pool, performance does not change.

concurrent_writes is set to 48, concurrent_reads to 32, and each box has 8 cores.  The memtable
flush threshold is 28MB by size and 131k by operation count.  Our memtables flush every 3 minutes
(based on graphs, and this aligns exactly with 131k / (Mutations/sec each box is doing)).
 The commitlog and data are on the same disk, but our disks seem bored.
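
The flush arithmetic above can be written out as a quick Python sketch. It assumes "131k" means 131,000 operations and "every 3 minutes" means exactly 180 seconds; both are roundings of the figures quoted here, and the function name is hypothetical.

```python
# Sketch: relate the ops-based memtable flush threshold to the flush interval.
# Assumptions: 131k is treated as 131,000 operations and the interval as
# exactly 180 seconds; the real values are only known approximately.

FLUSH_AFTER_OPS = 131_000
FLUSH_INTERVAL_SECONDS = 3 * 60


def implied_mutations_per_second(flush_ops, interval_seconds):
    """Mutation rate that would fill the ops threshold in one interval."""
    return flush_ops / interval_seconds


rate = implied_mutations_per_second(FLUSH_AFTER_OPS, FLUSH_INTERVAL_SECONDS)
print(f"implied mutations/sec per box: {rate:.0f}")
```

Under those assumptions the implied rate comes out to roughly 728 mutations/sec on each box.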

key cache is enabled and I see an almost perfect 100% hit rate.  row cache is disabled.

My questions are:

Is it normal to see load spread unevenly when using RandomPartitioner?
How do I fix it?  Do I need to assign token ranges manually even with RandomPartitioner?
Is there a way to see the total row counts assigned to each box?
Why is this one host running 600% cpu while the rest are sitting at 0%?
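
One rough way to approach the row-count question from the client side is to map keys to nodes offline. The Python sketch below is only an approximation under stated assumptions: that RandomPartitioner derives a token as the absolute value of the key's MD5 digest read as a signed big-endian integer, that a node owns the range ending at its token, and that replicas are the next RF-1 nodes clockwise (simple placement, which is not confirmed for this cluster). The sample keys are made up; real keys would have to be substituted.

```python
# Sketch: estimate how many keys land on each node, client side.
# Assumptions (not confirmed for this cluster): MD5-based tokens, each node
# owns the range ending at its token, replicas are the next nodes clockwise.
import hashlib
from bisect import bisect_left
from collections import Counter

# (token, address) pairs copied from the nodetool ring output, sorted by token.
NODE_TOKENS = [
    (0, "10.0.0.57"),
    (34028236692093846346337460743176821145, "10.0.0.56"),
    (68056473384187692692674921486353642290, "10.0.0.55"),
    (102084710076281539039012382229530463435, "10.0.0.54"),
    (136112946768375385385349842972707284580, "10.0.0.72"),
]
RF = 3


def random_partitioner_token(key):
    """Token for a key: abs() of the MD5 digest as a signed integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def replicas_for(key, rf=RF):
    """Addresses of the rf nodes assumed to hold this key."""
    tokens = [token for token, _ in NODE_TOKENS]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect_left(tokens, random_partitioner_token(key)) % len(tokens)
    return [NODE_TOKENS[(start + i) % len(NODE_TOKENS)][1] for i in range(rf)]


# Hypothetical sample keys; replace with the real row keys.
sample_keys = [f"example-key-{i}".encode("ascii") for i in range(10000)]

counts = Counter()
for key in sample_keys:
    counts.update(replicas_for(key))

for address, count in sorted(counts.items()):
    print(address, count)
```

This counts rows only, not bytes or write volume, so a handful of very large or very frequently incremented rows could still concentrate work on one replica set even when the per-node counts look even.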


For reference, here's cfstats taken from the host with the high cpu usage.

Keyspace: STATS_TEST
	Read Count: 18744838
	Read Latency: 2.568355930309987 ms.
	Write Count: 18744845
	Write Latency: 0.020453476835898085 ms.
	Pending Tasks: 0
		Column Family: rollup1h
		SSTable count: 4
		Space used (live): 194724367
		Space used (total): 260574143
		Number of Keys (estimate): 11904
		Memtable Columns Count: 34708
		Memtable Data Size: 27280700
		Memtable Switch Count: 67
		Read Count: 9255646
		Read Latency: 2.498 ms.
		Write Count: 9255658
		Write Latency: 0.021 ms.
		Pending Tasks: 0
		Key cache capacity: 200000
		Key cache size: 91254
		Key cache hit rate: 0.9950598390225411
		Row cache: disabled
		Compacted row minimum size: 150
		Compacted row maximum size: 52066354
		Compacted row mean size: 17404

		Column Family: rollup5m
		SSTable count: 4
		Space used (live): 296161119
		Space used (total): 402687415
		Number of Keys (estimate): 10496
		Memtable Columns Count: 34742
		Memtable Data Size: 34607575
		Memtable Switch Count: 67
		Read Count: 9255681
		Read Latency: 2.700 ms.
		Write Count: 9255687
		Write Latency: 0.020 ms.
		Pending Tasks: 0
		Key cache capacity: 200000
		Key cache size: 88629
		Key cache hit rate: 0.9956045403129263
		Row cache: disabled
		Compacted row minimum size: 150
		Compacted row maximum size: 129557750
		Compacted row mean size: 25562
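
To put the compacted row sizes above in perspective, this small Python sketch computes how many times larger the biggest row is than the mean row in each column family. The numbers are copied from the cfstats output; the dictionary layout and function name are hypothetical.

```python
# Sketch: ratio of the largest compacted row to the mean compacted row.
# Values are copied from the cfstats output above (sizes in bytes).

CFSTATS = {
    "rollup1h": {"max_row": 52066354, "mean_row": 17404},
    "rollup5m": {"max_row": 129557750, "mean_row": 25562},
}


def max_to_mean_ratio(stats):
    """How many times larger the largest row is than the mean row."""
    return stats["max_row"] / stats["mean_row"]


ratios = {name: max_to_mean_ratio(stats) for name, stats in CFSTATS.items()}
for name, ratio in ratios.items():
    print(f"{name}: largest row is about {ratio:.0f}x the mean")
```

Ratios in the thousands suggest at least a few rows are far larger than typical, which may be worth checking against the earlier assumption that the supercolumns are not large.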

