incubator-cassandra-user mailing list archives

From Oren Benjamin <o...@clearspring.com>
Subject Re: Cassandra benchmarking on Rackspace Cloud
Date Mon, 19 Jul 2010 01:45:38 GMT
Thanks for the info.  Very helpful in validating what I've been seeing.  As for the scaling
limit...

>> The above was single node testing.  I'd expect to be able to add nodes and scale
>> throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of
>> the number of nodes in the cluster.
> 
> This is what I would expect if a single machine is handling all the
> Thrift requests.  Are you spreading the client connections to all the
> machines?

Yes - in all tests I add all nodes in the cluster to the --nodes list.  The client requests
are in fact being dispersed among all the nodes as evidenced by the intermittent TimedOutExceptions
in the log which show up against the various nodes in the input list.  Could it be a result
of all the virtual nodes being hosted on the same physical hardware?  Am I running into some
connection limit?  I don't see anything pegged in the JMX stats.
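For concreteness, the client-side dispersal is simple round-robin over the node list. A toy sketch of the idea (the addresses are placeholders, and where the real client would open a Thrift connection this just records the assignment):

```python
from itertools import cycle

# Placeholder addresses standing in for the hosts passed via --nodes.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def dispatch(num_requests, nodes):
    """Assign each request to the next node in round-robin order, so
    client connections are spread evenly across the cluster."""
    ring = cycle(nodes)
    return [next(ring) for _ in range(num_requests)]

# With 9 requests over 3 nodes, each node handles exactly 3.
assignments = dispatch(9, NODES)
```

So unless the round-robin itself is broken, every node should be fielding an equal share of the Thrift traffic.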



On Jul 17, 2010, at 9:07 AM, Jonathan Ellis wrote:

> On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <oren@clearspring.com> wrote:
>> The first goal was to reproduce the test described on spyced here: http://spyced.blogspot.com/2010/01/cassandra-05.html
>> 
>> Using Cassandra 0.6.3, a 4GB/160GB cloud server (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing)
>> with default storage-conf.xml and cassandra.in.sh, here's what I got:
>> 
>> Reads: 4,800/s
>> Writes: 9,000/s
>> 
>> Pretty close to the result posted on the blog, with a slightly lower write performance
>> (perhaps due to the availability of only a single disk for both commitlog and data).
> 
> You're getting as close as you are because you're comparing 0.6
> numbers with 0.5.  For 0.6 on the test machine used in the blog post
> (quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.
> 
> In our tests we saw a 5-15% performance penalty from adding a
> virtualization layer.  Things like only having a single disk are going
> to stack on top of that.
> 
>> The above was single node testing.  I'd expect to be able to add nodes and scale
>> throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of
>> the number of nodes in the cluster.
> 
> This is what I would expect if a single machine is handling all the
> Thrift requests.  Are you spreading the client connections to all the
> machines?
> 
>> The disk performance of the cloud servers has been extremely spotty... Is this normal
>> for the cloud?
> 
> Yes.
> 
>>  And if so, what's the solution re Cassandra?
> 
> The larger the instance you're using, the closer you are to having the
> entire machine, meaning fewer other users are competing with you for
> disk i/o.
> 
> Of course when you're renting the entire machine's worth, it can be
> more cost-effective to just use dedicated hardware.
> 
>>  However, Cassandra routes to the nearest node topologically and not to the best
>>  performing one, so "bad" nodes will always result in high latency reads.
> 
> Cassandra routes reads around nodes with temporarily poor performance
> in 0.7, btw.
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

