cassandra-user mailing list archives

From Alex Araujo <cassandra-us...@alex.otherinbox.com>
Subject Re: Ec2 Stress Results
Date Sat, 07 May 2011 00:13:01 GMT
Pardon the long delay - went on holiday and got sidetracked before I 
could return to this project.

@Joaquin - The DataStax AMI uses a RAID0 configuration on an instance 
store's ephemeral drives.
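
For anyone reproducing this, that setup amounts to roughly the
following (device names and the choice of filesystem here are my
assumptions; the AMI's own init scripts may differ):

mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs.xfs /dev/md0
mkdir -p /raid0 && mount /dev/md0 /raid0

which matches the /raid0 data mount referenced in the observations below.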

@Jonathan - you were correct about the client node being the 
bottleneck.  I set up 3 XL client instances to run contrib/stress back 
against the 4-node XL Cassandra cluster and incrementally raised the 
number of threads on the clients until I started seeing timeouts.

I set the following mem settings for the client JVMs: -Xms2G -Xmx10G
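
Each client ran essentially the same stress invocation as the earlier
tests, along these lines (host names are placeholders; -t was varied
per run and --keep-going was added only on the runs marked below):

contrib/stress/bin/stress -d node1,node2,node3,node4 -n <keys per client> -t 200 [--keep-going]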

I raised the default MAX_HEAP setting from the AMI to 12GB (~80% of 
available memory).  I used the default AMI cassandra.yaml settings for 
the Cassandra nodes until timeouts started appearing (at 200 threads per 
client; 600 threads total), and then raised concurrent_writes to 300 
based on a (perhaps arbitrary?) suggestion in 'Cassandra: The Definitive 
Guide' to scale that setting with the number of client threads.  The 
client nodes were in the same AZ as the Cassandra nodes, and I set the 
--keep-going option on the clients for every other run >= 200 threads.
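
For reference, the only Cassandra-side changes from the AMI defaults
were these two settings (shown here as a sketch; the AMI keeps the
files under its own conf directory):

# conf/cassandra.yaml
concurrent_writes: 300

# conf/cassandra-env.sh
MAX_HEAP_SIZE="12G"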

Results
+--------+--------+--------------+----------+---------+---------+----------+
| Server | Client | --keep-going | Columns  | Client  | Total   | Combined |
| Nodes  | Nodes  |              |          | Threads | Threads | Rate     |
+========+========+==============+==========+=========+=========+==========+
|   4    |   3    |      N       | 10000000 |    25   |    75   |  13771   |
+--------+--------+--------------+----------+---------+---------+----------+
|   4    |   3    |      N       | 10000000 |    50   |   150   |  16853   |
+--------+--------+--------------+----------+---------+---------+----------+
|   4    |   3    |      N       | 10000000 |    75   |   225   |  18511   |
+--------+--------+--------------+----------+---------+---------+----------+
|   4    |   3    |      N       | 10000000 |   150   |   450   |  20013   |
+--------+--------+--------------+----------+---------+---------+----------+
|   4    |   3    |      N       | 7574241  |   200   |   600   |  22935   |
+--------+--------+--------------+----------+---------+---------+----------+
|   4    |   3    |      Y       | 10000000 |   200   |   600   |  19737   |
+--------+--------+--------------+----------+---------+---------+----------+
|   4    |   3    |      N       | 9843677  |   250   |   750   |  20869   |
+--------+--------+--------------+----------+---------+---------+----------+
|   4    |   3    |      Y       | 10000000 |   250   |   750   |  21217   |
+--------+--------+--------------+----------+---------+---------+----------+
|   4    |   3    |      N       | 5015711  |   300   |   900   |  24177   |
+--------+--------+--------------+----------+---------+---------+----------+
|   4    |   3    |      Y       | 10000000 |   300   |   900   |  206134  |
+--------+--------+--------------+----------+---------+---------+----------+

Other Observations (commands used are listed after these notes)
* `vmstat` showed no swapping during runs
* `iostat -x` always showed 0's for avgqu-sz, await, and %util on the 
/raid0 (data) partition; 0-150, 0-334ms, and 0-60% respectively for the 
/ (commitlog) partition
* %steal from iostat ranged from 8-26% every run (one node had an almost 
constant 26% while the others averaged closer to 10%)
* `nodetool tpstats` never showed more than 10's of Pending ops in 
RequestResponseStage; no more than 1-2K Pending ops in MutationStage.  
Usually a single node would register ops; the others would be 0's
* After all test runs, Memtable Switch Count was 1385 for 
Keyspace1.Standard1
* Load average on the Cassandra nodes was very high the entire time, 
especially for tests where each client ran > 100 threads.  Here's one 
sample @ 200 threads each (600 total):

[i-94e8d2fb] alex@cassandra-qa-1:~$ uptime
17:18:26 up 1 day, 19:04,  2 users,  load average: 20.18, 15.20, 12.87
[i-a0e5dfcf] alex@cassandra-qa-2:~$ uptime
17:18:26 up 1 day, 18:52,  2 users,  load average: 22.65, 25.60, 21.71
[i-92dde7fd] alex@cassandra-qa-3:~$ uptime
17:18:26 up 1 day, 18:44,  2 users,  load average: 24.19, 28.29, 20.17
[i-08caf067] alex@cassandra-qa-4:~$ uptime
17:18:26 up 1 day, 18:37,  2 users,  load average: 31.74, 20.99, 13.97

* Average resource utilization on the client nodes ranged from 10-80% 
CPU and 5-25% memory, depending on the number of threads.  Load average 
was always negligible (presumably because there was no I/O)
* After a few runs and truncate operations on Keyspace1.Standard1, the 
ring became unbalanced before runs:

[i-94e8d2fb] alex@cassandra-qa-1:~$ nodetool -h localhost ring
Address         Status State   Load            Owns    Token
                                                        127605887595351923798765477786913079296
10.240.114.143  Up     Normal  2.1 GB          25.00%  0
10.210.154.63   Up     Normal  330.19 MB       25.00%  42535295865117307932921825928971026432
10.110.63.247   Up     Normal  361.38 MB       25.00%  85070591730234615865843651857942052864
10.46.143.223   Up     Normal  1.6 GB          25.00%  127605887595351923798765477786913079296

and after runs:

[i-94e8d2fb] alex@cassandra-qa-1:~$ nodetool -h localhost ring
Address         Status State   Load            Owns    Token
                                                        127605887595351923798765477786913079296
10.240.114.143  Up     Normal  3.9 GB          25.00%  0
10.210.154.63   Up     Normal  2.05 GB         25.00%  42535295865117307932921825928971026432
10.110.63.247   Up     Normal  2.07 GB         25.00%  85070591730234615865843651857942052864
10.46.143.223   Up     Normal  3.33 GB         25.00%  127605887595351923798765477786913079296
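
The observations above came from watching each Cassandra node during 
runs with the usual tools, roughly:

vmstat 5
iostat -x 5
nodetool -h localhost tpstats
nodetool -h localhost ring
uptime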

Based on the above, would I be correct in assuming that frequent 
memtable flushes and/or commitlog I/O are the likely bottlenecks?  Could 
%steal be partially contributing to the low throughput numbers as well?  
If a single XL node can do ~12k writes/s, would it be reasonable to 
expect ~40k writes/s (i.e., something close to linear scaling of 
4 x ~12k = ~48k, minus some overhead) with the above workload and 
number of nodes?

Thanks for your help, Alex.

On 4/25/11 11:23 AM, Joaquin Casares wrote:
> Did the images have EBS storage or Instance Store storage?
>
> Typically EBS volumes aren't the best to be benchmarking against:
> http://www.mail-archive.com/user@cassandra.apache.org/msg11022.html
>
> Joaquin Casares
> DataStax
> Software Engineer/Support
>
>
>
> On Wed, Apr 20, 2011 at 5:12 PM, Jonathan Ellis <jbellis@gmail.com 
> <mailto:jbellis@gmail.com>> wrote:
>
>     A few months ago I was seeing 12k writes/s on a single EC2 XL. So
>     something is wrong.
>
>     My first suspicion is that your client node may be the bottleneck.
>
>     On Wed, Apr 20, 2011 at 2:56 PM, Alex Araujo
>     <cassandra-users@alex.otherinbox.com
>     <mailto:cassandra-users@alex.otherinbox.com>> wrote:
>     > Does anyone have any Ec2 benchmarks/experiences they can share? 
>     I am trying
>     > to get a sense for what to expect from a production cluster on
>     Ec2 so that I
>     > can compare my application's performance against a sane
>     baseline.  What I
>     > have done so far is:
>     >
>     > 1. Launched a 4-node cluster of m1.xlarge instances in the same
>     availability
>     > zone using PyStratus
>     (https://github.com/digitalreasoning/PyStratus).  Each
>     > node has the following specs (according to Amazon):
>     > 15 GB memory
>     > 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
>     > 1,690 GB instance storage
>     > 64-bit platform
>     >
>     > 2. Changed the default PyStratus directories in order to have
>     commit logs on
>     > the root partition and data files on ephemeral storage:
>     > commitlog_directory: /var/cassandra-logs
>     > data_file_directories: [/mnt/cassandra-data]
>     >
>     > 3. Gave each node 10GB of MAX_HEAP; 1GB HEAP_NEWSIZE in
>     > conf/cassandra-env.sh
>     >
>     > 4. Ran `contrib/stress/bin/stress -d node1,..,node4 -n 10000000
>     -t 100` on a
>     > separate m1.large instance:
>     > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
>     > ...
>     > 9832712,7120,7120,0.004948514851485148,842
>     > 9907616,7490,7490,0.0043189949802413755,852
>     > 9978357,7074,7074,0.004560353967289125,863
>     > 10000000,2164,2164,0.004065933558194335,867
>     >
>     > 5. Truncated Keyspace1.Standard1:
>     > # /usr/local/apache-cassandra/bin/cassandra-cli -host localhost
>     -port 9160
>     > Connected to: "Test Cluster" on x.x.x.x/9160
>     > Welcome to cassandra CLI.
>     >
>     > Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
>     > [default@unknown] use Keyspace1;
>     > Authenticated to keyspace: Keyspace1
>     > [default@Keyspace1] truncate Standard1;
>     > null
>     >
>     > 6. Expanded the cluster to 8 nodes using PyStratus and sanity
>     checked using
>     > nodetool:
>     > # /usr/local/apache-cassandra/bin/nodetool -h localhost ring
>     > Address         Status State   Load            Owns
>     > Token
>     > x.x.x.x  Up     Normal  1.3 GB          12.50%
>     > 21267647932558653966460912964485513216
>     > x.x.x.x   Up     Normal  3.06 GB         12.50%
>     > 42535295865117307932921825928971026432
>     > x.x.x.x     Up     Normal  1.16 GB         12.50%
>     > 63802943797675961899382738893456539648
>     > x.x.x.x   Up     Normal  2.43 GB         12.50%
>     > 85070591730234615865843651857942052864
>     > x.x.x.x   Up     Normal  1.22 GB         12.50%
>     > 106338239662793269832304564822427566080
>     > x.x.x.x    Up     Normal  2.74 GB         12.50%
>     > 127605887595351923798765477786913079296
>     > x.x.x.x    Up     Normal  1.22 GB         12.50%
>     > 148873535527910577765226390751398592512
>     > x.x.x.x   Up     Normal  2.57 GB         12.50%
>     > 170141183460469231731687303715884105728
>     >
>     > 7. Ran `contrib/stress/bin/stress -d node1,..,node8 -n 10000000
>     -t 100` on a
>     > separate m1.large instance again:
>     > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
>     > ...
>     > 9880360,9649,9649,0.003210443956226165,720
>     > 9942718,6235,6235,0.003206934154398794,731
>     > 9997035,5431,5431,0.0032615939761032457,741
>     > 10000000,296,296,0.002660033726812816,742
>     >
>     > In a nutshell, 4 nodes inserted at 11,534 writes/sec and 8 nodes
>     inserted at
>     > 13,477 writes/sec.
>     >
>     > Those numbers seem a little low to me, but I don't have anything
>     to compare
>     > to.  I'd like to hear others' opinions before I spin my wheels
>     with the
>     > number of nodes, threads, memtable, memory, and/or GC
>     settings.  Cheers,
>     > Alex.
>     >
>
>
>
>     --
>     Jonathan Ellis
>     Project Chair, Apache Cassandra
>     co-founder of DataStax, the source for professional Cassandra support
>     http://www.datastax.com
>
