From: Alex Araujo
Date: Wed, 20 Apr 2011 14:56:44 -0500
To: Cassandra Users <user@cassandra.apache.org>
Subject: Ec2 Stress Results

Does anyone have any EC2 benchmarks/experiences they can share?  I am trying to get a sense for what to expect from a production cluster on EC2 so that I can compare my application's performance against a sane baseline.  What I have done so far is:

1. Launched a 4-node cluster of m1.xlarge instances in the same availability zone using PyStratus (https://github.com/digitalreasoning/PyStratus).  Each node has the following specs (according to Amazon):
15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform

2. Changed the default PyStratus directories in order to have commit logs on the root partition and data files on ephemeral storage:
commitlog_directory: /var/cassandra-logs
data_file_directories: [/mnt/cassandra-data]
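
For reference, a quick way to confirm the two paths really ended up on separate devices (just plain df on both directories; the mount layout here is PyStratus's and output will vary by machine):

# commit logs on the root partition, data files on the ephemeral /mnt volume
df -h /var/cassandra-logs /mnt/cassandra-data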

3. Gave each node 10GB of MAX_HEAP; 1GB HEAP_NEWSIZE in conf/cassandra-env.sh
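
(For anyone following along, that corresponds to something like the following in conf/cassandra-env.sh; the variable names are from the 0.7.x script, so adjust for your version:)

MAX_HEAP_SIZE="10G"
HEAP_NEWSIZE="1G"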

4. Ran `contrib/stress/bin/stress -d node1,..,node4 -n 10000000 -t 100` on a separate m1.large instance (flags and output columns broken down after the numbers below):
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9832712,7120,7120,0.004948514851485148,842
9907616,7490,7490,0.0043189949802413755,852
9978357,7074,7074,0.004560353967289125,863
10000000,2164,2164,0.004065933558194335,867
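
For reference, my reading of the stress flags and output columns (correct me if I have any of this wrong):

# -d : comma-separated list of nodes to spread the inserts across
# -n : total number of keys to insert
# -t : number of client threads
contrib/stress/bin/stress -d node1,..,node4 -n 10000000 -t 100

The output columns are cumulative operations, op rate and key rate over the last reporting interval (roughly 10 seconds, judging by the elapsed_time column), average latency in seconds, and total elapsed seconds.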

5. Truncated Keyspace1.Standard1:
# /usr/local/apache-cassandra/bin/cassandra-cli -host localhost -port 9160
Connected to: "Test Cluster" on x.x.x.x/9160
Welcome to cassandra CLI.

Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] truncate Standard1;
null

6. Expanded the cluster to 8 nodes using PyStratus and sanity checked the ring using nodetool (token spacing worked out below):
# /usr/local/apache-cassandra/bin/nodetool -h localhost ring
Address  Status  State   Load     Owns    Token
x.x.x.x  Up      Normal  1.3 GB   12.50%  21267647932558653966460912964485513216
x.x.x.x  Up      Normal  3.06 GB  12.50%  42535295865117307932921825928971026432
x.x.x.x  Up      Normal  1.16 GB  12.50%  63802943797675961899382738893456539648
x.x.x.x  Up      Normal  2.43 GB  12.50%  85070591730234615865843651857942052864
x.x.x.x  Up      Normal  1.22 GB  12.50%  106338239662793269832304564822427566080
x.x.x.x  Up      Normal  2.74 GB  12.50%  127605887595351923798765477786913079296
x.x.x.x  Up      Normal  1.22 GB  12.50%  148873535527910577765226390751398592512
x.x.x.x  Up      Normal  2.57 GB  12.50%  170141183460469231731687303715884105728
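
The tokens are evenly spaced across the RandomPartitioner range, i.e. token_i = i * (2^127 / 8):

2^127 / 8       = 21267647932558653966460912964485513216   (first token above)
8 * (2^127 / 8) = 170141183460469231731687303715884105728  (= 2^127, last token above)

which is why each node shows exactly 12.50% ownership.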

7. Ran `contrib/stress/bin/stress -d node1,..,node8 -n 10000000 -t 100` on a separate m1.large instance again:
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9880360,9649,9649,0.003210443956226165,720
9942718,6235,6235,0.003206934154398794,731
9997035,5431,5431,0.0032615939761032457,741
10000000,296,296,0.002660033726812816,742

In a nutshell, 4 nodes inserted at 11,534 writes/sec and 8 nodes inserted at 13,477 writes/sec.
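
(Those aggregate figures are just total keys over total elapsed time from the final rows above:

10,000,000 keys / 867 s ≈ 11,534 writes/sec for 4 nodes
10,000,000 keys / 742 s ≈ 13,477 writes/sec for 8 nodes)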

Those numbers seem a little low to me, but I don't have anything to compare to.  I'd like to hear others' opinions before I spin my wheels with the number of nodes, threads, memtable, memory, and/or GC settings.  Cheers, Alex.