Pardon the long delay - went on holiday and got sidetracked before I
could return to this project.
@Joaquin - The DataStax AMI uses a RAID0 configuration on an instance
store's ephemeral drives.
@Jonathan - you were correct about the client node being the
bottleneck. I set up 3 XL client instances to run contrib/stress against
the 4 node XL Cassandra cluster and incrementally raised the number of
threads on the clients until I started seeing timeouts.
I set the following mem settings for the client JVMs: -Xms2G -Xmx10G
I raised the default MAX_HEAP setting from the AMI to 12GB (~80% of
available memory). I used the default AMI cassandra.yaml settings for
the Cassandra nodes until timeouts started appearing, and then raised
concurrent_writes to 300 based on a (perhaps arbitrary?) recommendation
in 'Cassandra: The Definitive Guide' to scale that number with the
number of client threads (timeouts started appearing at
200 threads per client; 600 total threads). The client nodes were in
the same AZ as the Cassandra nodes, and I set the --keep-going option on
the clients for every other run >= 200 threads.
Results
+----------+----------+----------+----------+----------+----------+----------+
| Server   | Client   | --keep-  | Columns  | Client   | Total    | Combined |
| Nodes    | Nodes    | going    |          | Threads  | Threads  | Rate     |
+==========+==========+==========+==========+==========+==========+==========+
| 4        | 3        | N        | 10000000 | 25       | 75       | 13771    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 10000000 | 50       | 150      | 16853    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 10000000 | 75       | 225      | 18511    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 10000000 | 150      | 450      | 20013    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 7574241  | 200      | 600      | 22935    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | Y        | 10000000 | 200      | 600      | 19737    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 9843677  | 250      | 750      | 20869    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | Y        | 10000000 | 250      | 750      | 21217    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | N        | 5015711  | 300      | 900      | 24177    |
+----------+----------+----------+----------+----------+----------+----------+
| 4        | 3        | Y        | 10000000 | 300      | 900      | 206134   |
+----------+----------+----------+----------+----------+----------+----------+
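To make the trend in the table easier to see, here is a small Python sketch that divides the combined rate by the total thread count for the runs without --keep-going. The figures are copied from the table; the names `runs` and `writes_per_thread` are just illustrative.

```python
# (total_threads, combined_rate) for the runs without --keep-going,
# copied from the results table.
runs = [
    (75, 13771),
    (150, 16853),
    (225, 18511),
    (450, 20013),
    (600, 22935),
    (750, 20869),
    (900, 24177),
]


def writes_per_thread(runs):
    """Return (total_threads, combined_rate / total_threads) pairs."""
    return [(threads, rate / threads) for threads, rate in runs]


for threads, per_thread in writes_per_thread(runs):
    print(f"{threads:4d} threads: {per_thread:6.1f} writes/s per thread")
```

The per-thread figure falls with every step up in thread count, which is what I would expect if the extra threads are mostly queueing behind something on the server side rather than adding throughput.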
Other Observations
* `vmstat` showed no swapping during runs
* `iostat -x` always showed 0's for avgqu-sz, await, and %util on the
/raid0 (data) partition; 0-150, 0-334ms, and 0-60% respectively for the
/ (commitlog) partition
* %steal from iostat ranged from 8-26% every run (one node had an almost
constant 26% while the others averaged closer to 10%)
* `nodetool tpstats` never showed more than 10's of Pending ops in
RequestResponseStage; no more than 1-2K Pending ops in MutationStage.
Usually a single node would register ops; the others would be 0's
* After all test runs, Memtable Switch Count was 1385 for
Keyspace1.Standard1
* Load average on the Cassandra nodes was very high the entire time,
especially for tests where each client ran > 100 threads. Here's one
sample @ 200 threads each (600 total):
[i-94e8d2fb] alex@cassandra-qa-1:~$ uptime
17:18:26 up 1 day, 19:04, 2 users, load average: 20.18, 15.20, 12.87
[i-a0e5dfcf] alex@cassandra-qa-2:~$ uptime
17:18:26 up 1 day, 18:52, 2 users, load average: 22.65, 25.60, 21.71
[i-92dde7fd] alex@cassandra-qa-3:~$ uptime
17:18:26 up 1 day, 18:44, 2 users, load average: 24.19, 28.29, 20.17
[i-08caf067] alex@cassandra-qa-4:~$ uptime
17:18:26 up 1 day, 18:37, 2 users, load average: 31.74, 20.99, 13.97
* Average resource utilization on the client nodes was between 10-80%
CPU; 5-25% memory depending on # of threads. Load average was always
negligible (presumably because there was no I/O)
* After a few runs and truncate operations on Keyspace1.Standard1, the
ring became unbalanced before runs:
[i-94e8d2fb] alex@cassandra-qa-1:~$ nodetool -h localhost ring
Address         Status State   Load       Owns    Token
                                                  127605887595351923798765477786913079296
10.240.114.143  Up     Normal  2.1 GB     25.00%  0
10.210.154.63   Up     Normal  330.19 MB  25.00%  42535295865117307932921825928971026432
10.110.63.247   Up     Normal  361.38 MB  25.00%  85070591730234615865843651857942052864
10.46.143.223   Up     Normal  1.6 GB     25.00%  127605887595351923798765477786913079296
and after runs:
[i-94e8d2fb] alex@cassandra-qa-1:~$ nodetool -h localhost ring
Address         Status State   Load       Owns    Token
                                                  127605887595351923798765477786913079296
10.240.114.143  Up     Normal  3.9 GB     25.00%  0
10.210.154.63   Up     Normal  2.05 GB    25.00%  42535295865117307932921825928971026432
10.110.63.247   Up     Normal  2.07 GB    25.00%  85070591730234615865843651857942052864
10.46.143.223   Up     Normal  3.33 GB    25.00%  127605887595351923798765477786913079296
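To put the load averages sampled at 200 threads per client in context, this rough Python sketch normalizes the 1-minute figures by core count. It assumes 4 virtual cores per m1.xlarge, as in the instance specs from my original message; the dictionary and function names are only illustrative. Keep in mind that Linux load average also counts tasks blocked in uninterruptible I/O, so a high value is not purely CPU demand.

```python
VIRTUAL_CORES = 4  # assumption: m1.xlarge with 4 virtual cores

# 1-minute load averages from the uptime samples at 200 threads per client.
load_1min = {
    "cassandra-qa-1": 20.18,
    "cassandra-qa-2": 22.65,
    "cassandra-qa-3": 24.19,
    "cassandra-qa-4": 31.74,
}


def load_per_core(load_average, cores=VIRTUAL_CORES):
    """Load average divided by the number of cores."""
    return load_average / cores


for host, load in load_1min.items():
    print(f"{host}: {load_per_core(load):.2f} runnable or I/O-blocked tasks per core")
```

Every node comes out at more than 5 tasks per core, so the servers look saturated well before the clients are.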
Based on the above, would I be correct in assuming that frequent
memtable flushes and/or commitlog I/O are the likely bottlenecks? Could
%steal be partially contributing to the low throughput numbers as well?
If a single XL node can do ~12k writes/s, would it be reasonable to
expect ~40k writes/s with the above workload and number of nodes?
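The arithmetic behind that question, as a rough Python sketch. It assumes perfectly linear scaling from the ~12k writes/s single-node figure and ignores replication and coordination overhead, so treat the output as an upper bound rather than a prediction. The constant and function names are illustrative.

```python
SINGLE_NODE_RATE = 12_000  # writes/s on one XL node, per Jonathan's message
NODES = 4
TARGET_RATE = 40_000       # the rate I am asking about
BEST_OBSERVED = 24_177     # highest rate among the runs without --keep-going


def scaling_efficiency(cluster_rate, nodes=NODES, single_node_rate=SINGLE_NODE_RATE):
    """Cluster rate as a fraction of ideal linear scaling."""
    return cluster_rate / (nodes * single_node_rate)


print(f"ideal linear rate: {NODES * SINGLE_NODE_RATE} writes/s")
print(f"target efficiency: {scaling_efficiency(TARGET_RATE):.0%}")
print(f"observed efficiency: {scaling_efficiency(BEST_OBSERVED):.0%}")
```

Under those assumptions 40k writes/s would need roughly 83% of linear scaling, while the best run above reaches about half of it.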
Thanks for your help, Alex.
On 4/25/11 11:23 AM, Joaquin Casares wrote:
> Did the images have EBS storage or Instance Store storage?
>
> Typically EBS volumes aren't the best to be benchmarking against:
> http://www.mail-archive.com/user@cassandra.apache.org/msg11022.html
>
> Joaquin Casares
> DataStax
> Software Engineer/Support
>
>
>
> On Wed, Apr 20, 2011 at 5:12 PM, Jonathan Ellis <jbellis@gmail.com
> <mailto:jbellis@gmail.com>> wrote:
>
> A few months ago I was seeing 12k writes/s on a single EC2 XL. So
> something is wrong.
>
> My first suspicion is that your client node may be the bottleneck.
>
> On Wed, Apr 20, 2011 at 2:56 PM, Alex Araujo
> <cassandra-users@alex.otherinbox.com
> <mailto:cassandra-users@alex.otherinbox.com>> wrote:
> > Does anyone have any EC2 benchmarks/experiences they can share? I am trying
> > to get a sense for what to expect from a production cluster on EC2 so that I
> > can compare my application's performance against a sane baseline. What I
> > have done so far is:
> >
> > 1. Launched a 4 node cluster of m1.xlarge instances in the same availability
> > zone using PyStratus (https://github.com/digitalreasoning/PyStratus). Each
> > node has the following specs (according to Amazon):
> > 15 GB memory
> > 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
> > 1,690 GB instance storage
> > 64-bit platform
> >
> > 2. Changed the default PyStratus directories in order to have commit logs on
> > the root partition and data files on ephemeral storage:
> > commitlog_directory: /var/cassandra-logs
> > data_file_directories: [/mnt/cassandra-data]
> >
> > 2. Gave each node 10GB of MAX_HEAP; 1GB HEAP_NEWSIZE in
> > conf/cassandra-env.sh
> >
> > 3. Ran `contrib/stress/bin/stress -d node1,..,node4 -n 10000000 -t 100` on a
> > separate m1.large instance:
> > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> > ...
> > 9832712,7120,7120,0.004948514851485148,842
> > 9907616,7490,7490,0.0043189949802413755,852
> > 9978357,7074,7074,0.004560353967289125,863
> > 10000000,2164,2164,0.004065933558194335,867
> >
> > 4. Truncated Keyspace1.Standard1:
> > # /usr/local/apache-cassandra/bin/cassandra-cli -host localhost -port 9160
> > Connected to: "Test Cluster" on x.x.x.x/9160
> > Welcome to cassandra CLI.
> >
> > Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
> > [default@unknown] use Keyspace1;
> > Authenticated to keyspace: Keyspace1
> > [default@Keyspace1] truncate Standard1;
> > null
> >
> > 5. Expanded the cluster to 8 nodes using PyStratus and sanity checked using
> > nodetool:
> > # /usr/local/apache-cassandra/bin/nodetool -h localhost ring
> > Address  Status State   Load     Owns    Token
> > x.x.x.x  Up     Normal  1.3 GB   12.50%  21267647932558653966460912964485513216
> > x.x.x.x  Up     Normal  3.06 GB  12.50%  42535295865117307932921825928971026432
> > x.x.x.x  Up     Normal  1.16 GB  12.50%  63802943797675961899382738893456539648
> > x.x.x.x  Up     Normal  2.43 GB  12.50%  85070591730234615865843651857942052864
> > x.x.x.x  Up     Normal  1.22 GB  12.50%  106338239662793269832304564822427566080
> > x.x.x.x  Up     Normal  2.74 GB  12.50%  127605887595351923798765477786913079296
> > x.x.x.x  Up     Normal  1.22 GB  12.50%  148873535527910577765226390751398592512
> > x.x.x.x  Up     Normal  2.57 GB  12.50%  170141183460469231731687303715884105728
> >
> > 6. Ran `contrib/stress/bin/stress -d node1,..,node8 -n 10000000 -t 100` on a
> > separate m1.large instance again:
> > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> > ...
> > 9880360,9649,9649,0.003210443956226165,720
> > 9942718,6235,6235,0.003206934154398794,731
> > 9997035,5431,5431,0.0032615939761032457,741
> > 10000000,296,296,0.002660033726812816,742
> >
> > In a nutshell, 4 nodes inserted at 11,534 writes/sec and 8 nodes inserted at
> > 13,477 writes/sec.
> >
> > Those numbers seem a little low to me, but I don't have anything to compare
> > to. I'd like to hear others' opinions before I spin my wheels with
> > number of nodes, threads, memtable, memory, and/or GC settings. Cheers,
> > Alex.
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>