incubator-cassandra-user mailing list archives

From Anton Winter <>
Subject performance problems on new cluster
Date Thu, 11 Aug 2011 15:16:36 GMT

I have recently been migrating to a small 12 node Cassandra cluster 
spanning 4 DCs and have been encountering various issues that I suspect 
come down to performance tuning for my data set.  I've learnt a few 
lessons along the way but I'm now at a bit of a roadblock: frequent 
OutOfMemory exceptions, various other exceptions, poor performance, and 
a ring that appears to become imbalanced during repairs.  I've tried a 
number of different configurations but haven't been able to get to the 
bottom of the performance issues.  I'm assuming this has something to do 
with my data and some tuning parameter that I'm merely overlooking.

My ring was created as documented in the wiki & various other 
performance tuning guides, calculating the tokens for each DC and 
incrementing by one when tokens conflict (a sketch of that calculation 
follows the ring output).  It is as follows:

Address   DC   Rack  Status  State   Load      Owns    Token
dc1host1  dc1  1a    Up      Normal  88.62 GB  33.33%  0
dc2host1  dc2  1     Up      Normal  14.76 GB  0.00%   1
dc3host1  dc3  1     Up      Normal  15.99 GB  0.00%   2
dc4host1  cd4  1     Up      Normal  14.52 GB  0.00%   3
dc1host2  dc1  1a    Up      Normal  18.02 GB  33.33%  56713727820156410577229101238628035242
dc2host2  dc2  1     Up      Normal  16.5 GB   0.00%   56713727820156410577229101238628035243
dc3host2  dc3  1     Up      Normal  16.37 GB  0.00%   56713727820156410577229101238628035244
dc4host2  dc4  1     Up      Normal  13.34 GB  0.00%   56713727820156410577229101238628035245
dc1host3  dc1  1a    Up      Normal  16.59 GB  33.33%  113427455640312821154458202477256070484
dc2host3  dc2  1     Up      Normal  15.22 GB  0.00%   113427455640312821154458202477256070485
dc3host3  dc3  1     Up      Normal  15.59 GB  0.00%   113427455640312821154458202477256070486
dc4host3  dc4  1     Up      Normal  8.84 GB   0.00%   113427455640312821154458202477256070487
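
The token values above were generated by dividing the RandomPartitioner 
range (2**127) evenly among the three nodes in each DC and then adding 
the DC's index as an offset so that no two nodes share a token.  A 
minimal Python sketch of that calculation (assuming RandomPartitioner):

    # Reproduces the tokens in the ring above: evenly spaced within each
    # DC, offset by the DC index to avoid exact collisions across DCs.
    RANGE = 2 ** 127          # RandomPartitioner token space
    NODES_PER_DC = 3
    DCS = ['dc1', 'dc2', 'dc3', 'dc4']

    for offset, dc in enumerate(DCS):
        for i in range(NODES_PER_DC):
            token = i * (RANGE // NODES_PER_DC) + offset
            print('%s node %d: %d' % (dc, i + 1, token))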

The above ring was freshly created and the load was fairly evenly 
distributed prior to a repair on dc1host1 (which was still running when 
the above output was taken), with the exception of dc4host3, where a 
previous bulk data load had timed out.  dc4host3 was responding poorly, 
was being marked as failed by other nodes, and judging from its heap 
usage was rather close to OOM'ing before it was restarted.

I'm also using NetworkTopologyStrategy (NTS) with RF=2.
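
For reference, the keyspace placement looks roughly like the following 
minimal pycassa sketch (the keyspace name and host are placeholders, and 
RF=2 is taken here as two replicas per DC):

    from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY

    sys_mgr = SystemManager('dc1host1:9160')    # placeholder host
    # DC names in strategy_options must match what the snitch reports.
    sys_mgr.create_keyspace(
        'MyKeyspace',                           # placeholder keyspace name
        replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
        strategy_options={'dc1': '2', 'dc2': '2', 'dc3': '2', 'dc4': '2'},
    )
    sys_mgr.close()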

The primary issues I'm experiencing are:

Light load against the nodes in dc1 was causing OutOfMemory exceptions 
across all of the Cassandra servers outside dc1, which were otherwise 
idle, and after several hours an OOM eventually occurred on one of the 
dc1 nodes as well.  This issue was produced using svn trunk r1153002 and 
an in-house snitch which effectively combined PropertyFileSnitch with 
some components of Ec2Snitch.  While trying to resolve these issues I 
have moved to an r1156490 snapshot, switched across to just the 
PropertyFileSnitch, and am simply utilising the broadcast_address 
configuration option available in trunk, which seems to work quite well.

Since moving to r1156490 we have stopped getting OOMs, but that may 
simply be because we have been unable to send enough traffic to the 
cluster to trigger one.

The most current issues I have been experiencing are the following:

1) Thrift timeouts & generally degraded response times
2) *lots* of exceptions, such as:

ERROR [ReadRepairStage:1076] 2011-08-11 13:33:41,266 (line 133) Fatal exception in thread 

3) ring imbalances during a repair (refer to the above nodetool ring output)
4) nodes regularly being marked as failed whenever they do something 
even moderately stressful, such as running a repair or serving light 
load, even though the node itself thinks it is fine.

My hosts all have 32GB of RAM and either 4 or 16 cores.  I've set the 
heaps to half of physical memory (16GB) and, for the sake of keeping the 
cluster simple, set the young generation to 400MB on all nodes.  JNA is 
in use, commitlogs and data have been split onto different filesystems, 
and so on.

My data set as described by a dev is essentially as follows:

3 column families (tables):

cf1.  The RowKey is the user id.  This is the primary column family 
queried and is always looked up by RowKey.  It has 1 supercolumn called 
"seg".  The column names in this supercolumn are the segment_ids that 
the user belongs to and the value is just "1".  This should have about 
150 million rows.  Each row will have an average of 2-3 columns in the 
"seg" supercolumn.  The column values have TTLs set on them.

cf2.  This is a CounterColumnFamily.  There's only a single "cnt" column, 
which stores a count of the cf1 rows having that segment.  This was only 
updated during the import and is not read at all.

cf3.  This is a lookup from a RowKey that is an external ID to the 
RowKey used to find the user in the cf1 CF.
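
To make the access pattern concrete, here is a minimal pycassa sketch of 
that model (the keyspace, host, IDs and TTL length are placeholders):

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['dc1host1:9160'])

    cf1 = pycassa.ColumnFamily(pool, 'cf1')   # super CF, row key = user id
    cf2 = pycassa.ColumnFamily(pool, 'cf2')   # counter CF, single 'cnt' column
    cf3 = pycassa.ColumnFamily(pool, 'cf3')   # external id -> cf1 row key

    # Write path (import only): tag a user with its segments (TTL'd values)
    # and bump the per-segment counter.  TTL length is a placeholder.
    cf1.insert('user123', {'seg': {'segment_42': '1', 'segment_99': '1'}},
               ttl=30 * 86400)
    cf2.add('segment_42', 'cnt', 1)
    cf3.insert('external-abc', {'user_key': 'user123'})

    # Read path (the hot one): external id -> user row key -> segments.
    user_key = cf3.get('external-abc')['user_key']
    segments = cf1.get(user_key, super_column='seg')   # {'segment_42': '1', ...}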

Does anyone have any ideas or suggestions about where I should be 
focusing my efforts to get to the bottom of these issues?

