cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Cassandra benchmarking on Rackspace Cloud
Date Sat, 17 Jul 2010 13:07:04 GMT
On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <oren@clearspring.com> wrote:
> The first goal was to reproduce the test described on spyced here: http://spyced.blogspot.com/2010/01/cassandra-05.html
>
> Using Cassandra 0.6.3, a 4GB/160GB cloud server (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing)
> with default storage-conf.xml and cassandra.in.sh, here's what I got:
>
> Reads: 4,800/s
> Writes: 9,000/s
>
> Pretty close to the result posted on the blog, with a slightly lower write performance
> (perhaps due to the availability of only a single disk for both commitlog and data).

You're getting as close as you are because you're comparing 0.6
numbers with 0.5.  For 0.6 on the test machine used in the blog post
(quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.

In our tests we saw a 5-15% performance penalty from adding a
virtualization layer.  Things like only having a single disk are going
to stack on top of that.
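
As a rough worked example, here is the arithmetic for applying that
5-15% range to the bare-metal 0.6 figures above.  The helper name
expected_range is made up for illustration, and this only models the
virtualization penalty, not the single-disk effect.

```python
# Bare-metal 0.6 figures quoted above (requests per second).
BARE_METAL = {"reads": 7_000, "writes": 14_000}

# Virtualization penalty range quoted above: 5% to 15%.
PENALTY_LOW, PENALTY_HIGH = 0.05, 0.15


def expected_range(baseline, low=PENALTY_LOW, high=PENALTY_HIGH):
    """Return (worst, best) throughput after applying the penalty range."""
    worst = round(baseline * (1 - high))
    best = round(baseline * (1 - low))
    return worst, best


for op, baseline in BARE_METAL.items():
    worst, best = expected_range(baseline)
    print(f"{op}: {worst} to {best} per second")
```

The measured 4,800 reads/s and 9,000 writes/s fall below both ranges,
which fits with other factors such as the single disk stacking on top
of the virtualization cost.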

> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.
> Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number
> of nodes in the cluster.

This is what I would expect if a single machine is handling all the
Thrift requests.  Are you spreading the client connections to all the
machines?
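
A minimal sketch of what spreading the connections could look like on
the client side, assuming a benchmark that starts several client
workers.  The node addresses and the assign_nodes helper are
hypothetical; the actual Thrift connection setup is left out and would
go where the comment indicates.

```python
import itertools
from collections import Counter

# Hypothetical node addresses; substitute the real cluster members.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]


def assign_nodes(num_clients, nodes=NODES):
    """Assign each client worker a node in round-robin order."""
    ring = itertools.cycle(nodes)
    return [next(ring) for _ in range(num_clients)]


assignments = assign_nodes(10)
for client_id, host in enumerate(assignments):
    # A real client worker would open its connection to `host` here
    # instead of every worker connecting to the same machine.
    print(f"client {client_id} -> {host}")

# Show how evenly the workers are spread across the nodes.
print(Counter(assignments))
```

With every worker pointed at one host, that host coordinates every
request and becomes the ceiling no matter how many nodes are added.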

> The disk performance of the cloud servers has been extremely spotty... Is this normal
> for the cloud?

Yes.

>  And if so, what's the solution re Cassandra?

The larger the instance you're using, the closer you are to having the
entire machine, meaning fewer other users are competing with you for
disk i/o.

Of course when you're renting the entire machine's worth, it can be
more cost-effective to just use dedicated hardware.

> However, Cassandra routes to the nearest node topologically and not to the best performing
> one, so "bad" nodes will always result in high latency reads.

Cassandra routes reads around nodes with temporarily poor performance
in 0.7, btw.
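
To illustrate the general idea only (this is not the 0.7 code, and the
LatencyTracker class and its parameters are invented for the example),
a coordinator could keep a moving average of each replica's recent
read latency and prefer the replica with the lowest score:

```python
class LatencyTracker:
    """Track a moving average of read latency per node (illustrative only)."""

    def __init__(self, alpha=0.5):
        # alpha weights the newest sample; an assumed value, not a Cassandra setting.
        self.alpha = alpha
        self.scores = {}

    def record(self, node, latency_ms):
        """Fold a new latency sample into the node's moving average."""
        prev = self.scores.get(node)
        if prev is None:
            self.scores[node] = float(latency_ms)
        else:
            self.scores[node] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def pick(self, replicas):
        """Return the replica with the lowest average latency seen so far."""
        # Nodes with no samples score 0.0 so they get tried at least once.
        return min(replicas, key=lambda n: self.scores.get(n, 0.0))


tracker = LatencyTracker()
tracker.record("node-a", 10)
tracker.record("node-b", 100)
print(tracker.pick(["node-a", "node-b"]))  # node-a is faster so far

# node-a hits a slow patch; its average rises above node-b's.
tracker.record("node-a", 500)
print(tracker.pick(["node-a", "node-b"]))
```

Because the score is an average over recent samples, a node that
recovers gets picked again once its latency comes back down.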

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
