From Alex Araujo <cassandra-us...@alex.otherinbox.com>
Subject Re: Ec2 Stress Results
Date Mon, 09 May 2011 22:58:14 GMT
On 5/6/11 9:47 PM, Jonathan Ellis wrote:
> On Fri, May 6, 2011 at 5:13 PM, Alex Araujo
> <cassandra-users@alex.otherinbox.com>  wrote:
>> I raised the default MAX_HEAP setting from the AMI to 12GB (~80% of
>> available memory).
> This is going to make GC pauses larger for no good reason.
Good point - only doing writes at the moment.  I will revert the change 
and raise this conservatively once I add reads to the mix.
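For reference, the override lives in conf/cassandra-env.sh on this AMI, so
reverting is just commenting it back out (a sketch -- when MAX_HEAP_SIZE is
left unset, the script calculates a default from system memory):

# conf/cassandra-env.sh
#MAX_HEAP_SIZE="12G"    # the ~80%-of-RAM experiment, now disabled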

>> raised
>> concurrent_writes to 300 based on a (perhaps arbitrary?) recommendation in
>> 'Cassandra: The Definitive Guide'
> That's never been a good recommendation.
That recommendation seemed to contradict the '8 * number of cores' rule of 
thumb anyway.  I set it back to the default of 32.
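In cassandra.yaml terms (a sketch; 32 lines up with 8 * 4 cores, assuming
these are 4-core instances):

# cassandra.yaml -- back to the shipped default
concurrent_writes: 32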

>> Based on the above, would I be correct in assuming that frequent memtable
>> flushes and/or commitlog I/O are the likely bottlenecks?
> Did I miss where you said what CPU usage was?
I observed consistent CPU usage of 200-350% initially, and 300-380% once 
'hot', across all runs.  Here is an average-case sample from top:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15108 cassandr  20   0 5406m 4.5g  15m S  331 30.4  89:32.50 jsvc

> How many replicas are you writing?

Replication factor is 3.
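That was set at keyspace creation; a sketch in 0.7 CLI syntax, with
Keyspace1 being the stress tool's default keyspace:

create keyspace Keyspace1 with replication_factor = 3
    and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';

With RF=3 each client write fans out to three replica writes, so ~21K
client writes/s works out to ~63K replica writes/s across the four nodes.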

> Recent testing suggests that putting the commitlog on the raid0 volume
> is better than on the root volume on ec2, since the root isn't really
> a separate device.
>
I migrated the commitlog to the raid0 volume and retested with the above 
changes (the cassandra.yaml change is shown after the iostat output 
below).  I/O appeared more consistent in iostat.  Here's an average 
case (%util in the teens):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          36.84    4.05   13.97    3.04   18.42   23.68

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00    0.00  222.00     0.00 18944.00    85.33    13.80   62.16   0.59  13.00
xvdc              0.00     0.00    0.00  231.00     0.00 19480.00    84.33     5.80   25.11   0.78  18.00
xvdd              0.00     0.00    0.00  228.00     0.00 19456.00    85.33    17.43   76.45   0.57  13.00
xvde              0.00     0.00    0.00  229.00     0.00 19464.00    85.00    10.41   45.46   0.44  10.00
md0               0.00     0.00    0.00  910.00     0.00 77344.00    84.99     0.00    0.00   0.00   0.00

and worst case (%util above 60):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          44.33    0.00   24.54    0.82   15.46   14.85

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.00     1.00    0.00    4.00     0.00    40.00    10.00     0.15   37.50  22.50   9.00
xvdb              0.00     0.00    0.00  427.00     0.00 36440.00    85.34    54.12  147.85   1.69  72.00
xvdc              0.00     0.00    1.00  295.00     8.00 25072.00    84.73    34.56   84.32   2.13  63.00
xvdd              0.00     0.00    0.00  355.00     0.00 30296.00    85.34    94.49  257.61   2.17  77.00
xvde              0.00     0.00    0.00  373.00     0.00 31768.00    85.17    68.50  189.33   1.88  70.00
md0               0.00     0.00    1.00 1418.00     8.00 120824.00   85.15     0.00    0.00   0.00   0.00

Overall, results were roughly the same.  The most noticeable difference 
was that no timeouts occurred until the number of client threads reached 
350 (previously 200):

+--------+--------+--------+----------+---------+---------+-----------+
| Server | Client | --keep | Columns  | Client  | Total   | Combined  |
| Nodes  | Nodes  | -going |          | Threads | Threads | Rate      |
|        |        |        |          |         |         | (writes/s)|
+========+========+========+==========+=========+=========+===========+
|   4    |   3    |   N    | 10000000 |   150   |   450   |   21241   |
+--------+--------+--------+----------+---------+---------+-----------+
|   4    |   3    |   N    | 10000000 |   200   |   600   |   21536   |
+--------+--------+--------+----------+---------+---------+-----------+
|   4    |   3    |   N    | 10000000 |   250   |   750   |   19451   |
+--------+--------+--------+----------+---------+---------+-----------+
|   4    |   3    |   N    | 10000000 |   300   |   900   |   19741   |
+--------+--------+--------+----------+---------+---------+-----------+
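For context, each client node runs something along these lines (a sketch
of the contrib/stress invocation; host names are placeholders, and -n/-t
correspond to the Columns and Client Threads columns above):

# insert 10M keys with 150 threads against the 4 server nodes;
# --keep-going (-k) omitted, matching the 'N' column above
bin/stress -d server1,server2,server3,server4 -o insert -n 10000000 -t 150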

Those results are after I compiled and deployed the latest cassandra-0.7 
branch with the patch for 
https://issues.apache.org/jira/browse/CASSANDRA-2578 applied.  Thoughts?
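In case it's useful, the build was essentially the following (the patch
filename here is hypothetical -- use whatever is attached to the ticket):

svn co https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7
cd cassandra-0.7
patch -p0 < CASSANDRA-2578.patch    # hypothetical name for the JIRA attachment
ant jar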


