incubator-cassandra-user mailing list archives

From Oren Benjamin <o...@clearspring.com>
Subject Re: Cassandra benchmarking on Rackspace Cloud
Date Mon, 19 Jul 2010 15:39:43 GMT
Yes, as the size of the data on disk increases and the OS can no longer avoid disk seeks, read
performance degrades.  You can see this in the results from the original post: as the number
of keys in the test goes from 10M to 100M, reads drop from 4,600/s to 200/s.  10M keys
in the stress.py test corresponds to roughly 5GB on disk.  If the frequently used records
are small enough to fit into cache, you can restore read performance by lifting them into
the row cache and largely avoiding seeks during reads.
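(In 0.6 the row cache is configured per column family via the RowsCached attribute in storage-conf.xml,
if I remember the config correctly.)  For ballpark sizing, here's a quick sketch in Python using the
figures above; the hot-row count is made up for illustration, the per-row size comes from the
10M keys / ~5GB ratio in the test, and the in-memory footprint of a cached row will be somewhat
larger than its on-disk size:

    # Rough check: will the frequently read rows fit in the row cache?
    # The "10M keys ~ 5GB on disk" figure is from the stress.py run discussed
    # in this thread; substitute your own measurements.
    bytes_on_disk = 5.0 * 1024**3              # ~5GB for 10M stress.py keys
    total_keys = 10 * 1000 * 1000
    avg_row_bytes = bytes_on_disk / total_keys # ~540 bytes per row on disk

    hot_rows = 2 * 1000 * 1000                 # hypothetical count of frequently read rows
    cache_gb = hot_rows * avg_row_bytes / 1024**3
    print("avg row size on disk: %.0f bytes" % avg_row_bytes)
    print("row cache for %d hot rows: at least ~%.1f GB" % (hot_rows, cache_gb))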

On Jul 17, 2010, at 1:05 AM, Schubert Zhang wrote:

I think your read throughput is very high, and it may not be realistic.

For random reads, disk seeks will always be the bottleneck (100% util).
There will be about 3 random disk seeks for each random read, at about 10ms per seek, so
a random read will take roughly 30ms.

If you have only one disk, the read throughput will be about 40 reads/s.

The high measured throughput may be due to the Linux FS cache if your dataset is small (for example,
only 1GB).
Try testing random read throughput on 100GB or 1TB of data; you may get a different result.
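In numbers, that back-of-the-envelope estimate works out as follows (the 3-seeks-per-read and
~10ms-per-seek figures are the rough assumptions above, not measurements):

    # Seek-bound estimate for uncached random reads on one spinning disk.
    seeks_per_read = 3            # rough assumption from above
    seek_time_s = 0.010           # ~10ms per seek
    read_latency_s = seeks_per_read * seek_time_s    # ~30ms per uncached read
    print("uncached read latency: ~%.0f ms" % (read_latency_s * 1000))
    print("per-disk throughput:   ~%.0f reads/s" % (1.0 / read_latency_s))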


On Sat, Jul 17, 2010 at 7:06 AM, Oren Benjamin <oren@clearspring.com> wrote:
I've been doing quite a bit of benchmarking of Cassandra in the cloud using stress.py.  I'm
working on a comprehensive spreadsheet of results with a template that others can add to,
but for now I thought I'd post some of the basic results here to get some feedback from others.

The first goal was to reproduce the test described on spyced here: http://spyced.blogspot.com/2010/01/cassandra-05.html

Using Cassandra 0.6.3, a 4GB/160GB cloud server (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing)
with default storage-conf.xml and cassandra.in.sh, here's what I got:

Reads: 4,800/s
Writes: 9,000/s

Pretty close to the result posted on the blog, with a slightly lower write performance (perhaps
due to the availability of only a single disk for both commitlog and data).

That was with 1M keys (the blog used 700K).
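For anyone trying to reproduce this: it's just stress.py pointed at the node, something along
these lines (the flag spellings are from memory of the 0.6 py_stress options and the thread count
is only an example; stress.py -h is authoritative):

    python stress.py -d <node-ip> -o insert -n 1000000 -t 50
    python stress.py -d <node-ip> -o read   -n 1000000 -t 50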

As the number of keys scales, read performance degrades as would be expected with no caching:
1M 4,800 reads/s
10M 4,600 reads/s
25M 700 reads/s
100M 200 reads/s

Using row cache and an appropriate choice of --stdev to achieve a cache hit rate of >90%
restores the read performance to the 4,800 reads/s level in all cases.  Also as expected,
write performance is unaffected by writing more data.
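
For picking --stdev, a rough hit-rate estimate helps.  The sketch below assumes stress.py draws
keys from a Gaussian centred on the middle of the key range when --stdev is used (that's my reading
of the script; I don't recall offhand whether --stdev is in absolute keys or a fraction of the key
range, but the arithmetic is the same once you have a sigma in keys) and that the row cache ends up
holding the hottest rows around that centre:

    import math

    def expected_hit_rate(cached_rows, sigma_keys):
        # Fraction of a Gaussian key distribution that falls within the
        # cached_rows hottest keys centred on the mean,
        # i.e. P(|key - mean| <= cached_rows / 2).
        return math.erf(cached_rows / (2.0 * sigma_keys * math.sqrt(2)))

    # Illustrative numbers: 10M rows cached out of 100M keys, sigma of 3M keys.
    print("expected hit rate: %.1f%%" % (100 * expected_hit_rate(10e6, 3e6)))  # ~90%

At that hit rate most reads never touch the disk, which matches the recovery to the ~4,800 reads/s
level above.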

Scaling:
The above was single node testing.  I'd expect to be able to add nodes and scale throughput.
 Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number
of nodes in the cluster.  In order to better understand this, I eliminated the factors of
data size, caching, replication etc. and ran read tests on empty clusters (every read a miss
- bouncing off the bloom filter and straight back).  1 node gives 24,000 reads/s while 2,3,4...
give 21,000 (presumably the bump in single node performance is due to the lack of the extra
hop).  With CPU, disk, and RAM all largely unused, I'm at a loss to explain the lack of additional
throughput.  I tried increasing the number of clients but that just split the throughput down
the middle with each stress.py achieving roughly 10,000 reads/s.  I'm running the clients
(stress.py) on separate cloud servers.

I checked the ulimit file count and I'm not limiting connections there.  It seems like there's
a problem with my test setup - a clear bottleneck somewhere, but I just don't see what it
is.  Any ideas?
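On the per-client numbers: a synchronous client tops out at roughly threads / per-request latency,
so ~10,000 reads/s from one stress.py process is what you'd expect from, say, 50 threads at ~5ms
per round trip (both numbers are illustrative; the thread count depends on the -t you ran with).
A quick sketch:

    # Little's-law style bound for a client doing synchronous requests:
    # throughput <= concurrent requests / per-request round-trip time.
    # The numbers below are illustrative, not measurements from this thread.
    def max_client_throughput(threads, round_trip_s):
        return threads / round_trip_s

    print("~%.0f reads/s" % max_client_throughput(50, 0.005))    # 10,000 reads/s

Given that two clients together still topped out around 21,000, the per-client bound doesn't look
like the limit, which points at something shared on the server or network side.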

Also:
The disk performance of the cloud servers has been extremely spotty.  The results I posted
above were reproducible whenever the servers were in their "normal" state.  But for periods
of as much as several consecutive hours, single servers or groups of servers in the cloud
would suddenly have horrendous disk performance as measured by dstat and iostat.  The "% steal"
reported by the hypervisor on these nodes is also quite high (>30%).  The performance during these
"bad" periods drops from 4,800 reads/s in the single-node benchmark to just 200 reads/s.  The
node is effectively useless.  Is this normal for the cloud?  And if so, what's the solution
re Cassandra?  Obviously you can just keep adding more nodes until the likelihood that there
is at least one good server with every piece of the data is reasonable.  However, Cassandra
routes to the nearest node topologically and not to the best performing one, so "bad" nodes
will always result in high latency reads.  How are you guys that are running in the cloud
dealing with this?  Are you seeing this at all?

Thanks in advance for your feedback and advice,

  -- Oren


