From: Oren Benjamin <oren@clearspring.com>
To: user@cassandra.apache.org
Date: Fri, 16 Jul 2010 19:06:50 -0400
Subject: Cassandra benchmarking on Rackspace Cloud

I've been doing quite a bit of benchmarking of Cassandra in the cloud using stress.py. I'm working on a comprehensive spreadsheet of results, with a template that others can add to, but for now I thought I'd post some of the basic results here to get feedback from others.

The first goal was to reproduce the test described on spyced here: http://spyced.blogspot.com/2010/01/cassandra-05.html

Using Cassandra 0.6.3 on a 4GB/160GB cloud server (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with the default storage-conf.xml and cassandra.in.sh, here's what I got:

  Reads:  4,800/s
  Writes: 9,000/s

That's pretty close to the result posted on the blog, with slightly lower write performance (perhaps because only a single disk is available for both the commitlog and the data). That was with 1M keys (the blog used 700K).

As the number of keys grows, read performance degrades as you'd expect with no caching:

  1M     4,800 reads/s
  10M    4,600 reads/s
  25M      700 reads/s
  100M     200 reads/s

Using the row cache and an appropriate choice of --stdev to achieve a cache hit rate of >90% restores read performance to the 4,800 reads/s level in all cases. Also as expected, write performance is unaffected by writing more data.
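For reference, here's roughly what the cached-read setup looked like. This is a sketch from memory rather than a copy of my scripts: the node address is a placeholder, the 10% row cache figure is just the value I happened to use, and the exact py_stress flag spellings should be checked against stress.py --help:

  # storage-conf.xml: enable the row cache on the column family stress.py writes to
  <ColumnFamily Name="Standard1" CompareWith="BytesType" RowsCached="10%"/>

  # load 10M keys, then read them back with a narrow gaussian key
  # distribution so the bulk of reads land on cached rows
  stress.py -o insert -n 10000000 -d 10.0.0.1 -t 50
  stress.py -o read   -n 10000000 -d 10.0.0.1 -t 50 --stdev 0.001

The idea is that a smaller --stdev concentrates reads on a narrower band of keys, which is what pushes the hit rate over 90% for a fixed cache size.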
Scaling:

The above was single-node testing. I'd expect to be able to add nodes and scale throughput. Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster. To better understand this, I eliminated the factors of data size, caching, replication, etc. and ran read tests on empty clusters (every read is a miss - it bounces off the bloom filter and comes straight back). 1 node gives 24,000 reads/s while 2, 3, 4... give 21,000 (presumably the bump in single-node performance is due to the lack of the extra hop). With CPU, disk, and RAM all largely unused, I'm at a loss to explain the lack of additional throughput. I tried increasing the number of clients, but that just split the throughput down the middle, with each stress.py achieving roughly 10,000 reads/s. I'm running the clients (stress.py) on separate cloud servers.

I checked the ulimit file count, and I'm not limiting connections there. It seems like there's a problem with my test setup - a clear bottleneck somewhere - but I just don't see what it is. Any ideas?

Also:

The disk performance of the cloud servers has been extremely spotty. The results I posted above were reproducible whenever the servers were in their "normal" state. But for periods of as much as several consecutive hours, single servers or groups of servers in the cloud would suddenly have horrendous disk performance as measured by dstat and iostat. The "% steal" by the hypervisor on these nodes is also quite high (>30%). During these "bad" periods, performance drops from 4,800 reads/s in the single-node benchmark to just 200 reads/s - the node is effectively useless. Is this normal for the cloud? And if so, what's the solution as far as Cassandra is concerned? Obviously you can just keep adding nodes until the likelihood that there's at least one good server with every piece of the data is reasonable. However, Cassandra routes to the topologically nearest node, not to the best-performing one, so "bad" nodes will always result in high-latency reads. How are those of you running in the cloud dealing with this? Are you seeing it at all?

Thanks in advance for your feedback and advice,

-- Oren
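P.S. In case anyone wants to compare notes, this is roughly how I spot a "bad" node - nothing beyond stock sysstat/dstat, and the exact columns vary a bit by version:

  iostat -x 5    # the avg-cpu line includes %steal; per-device await/%util spike on bad nodes
  vmstat 5       # the "st" column is hypervisor steal time
  dstat 5        # disk throughput falls off a cliff during the bad periods

On a node in its "normal" state %steal stays low; on a degraded one it sits above 30% and the single-node benchmark drops to the ~200 reads/s I mentioned.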