From: Oren Benjamin <oren@clearspring.com>
To: user@cassandra.apache.org
Date: Fri, 16 Jul 2010 19:06:50 -0400
Subject: Cassandra benchmarking on Rackspace Cloud

I've been doing quite a bit of benchmarking of Cassandra in the cloud using stress.py. I'm working on a comprehensive spreadsheet of results, with a template that others can add to, but for now I thought I'd post some of the basic results here to get feedback from others.

The first goal was to reproduce the test described on spyced here: http://spyced.blogspot.com/2010/01/cassandra-05.html

Using Cassandra 0.6.3 on a 4GB/160GB cloud server (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with the default storage-conf.xml and cassandra.in.sh, here's what I got:

  Reads:  4,800/s
  Writes: 9,000/s

That's pretty close to the result posted on the blog, with slightly lower write performance (perhaps because only a single disk is available for both the commitlog and the data). That was with 1M keys (the blog used 700K).

As the number of keys grows, read performance degrades as you'd expect with no caching:

  1M     4,800 reads/s
  10M    4,600 reads/s
  25M      700 reads/s
  100M     200 reads/s

Using the row cache and an appropriate choice of --stdev to achieve a cache hit rate of >90% restores read performance to the 4,800 reads/s level in all cases. Also as expected, write performance is unaffected by writing more data.
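For reference, here's roughly what the cached-read setup looked like. This is a sketch from memory rather than a copy of my scripts: the node address is a placeholder, the 10% row cache figure is just the value I happened to use, and the exact py_stress flag spellings should be checked against stress.py --help:

  # storage-conf.xml: enable the row cache on the column family stress.py writes to
  <ColumnFamily Name="Standard1" CompareWith="BytesType" RowsCached="10%"/>

  # load 10M keys, then read them back with a narrow gaussian key
  # distribution so the bulk of reads land on cached rows
  stress.py -o insert -n 10000000 -d 10.0.0.1 -t 50
  stress.py -o read   -n 10000000 -d 10.0.0.1 -t 50 --stdev 0.001

The idea is that a smaller --stdev concentrates reads on a narrower band of keys, which is what pushes the hit rate over 90% for a fixed cache size.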
Scaling:

The above was single-node testing. I'd expect to be able to add nodes and scale throughput. Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster. To better understand this, I eliminated the factors of data size, caching, replication, etc. and ran read tests on empty clusters (every read is a miss - it bounces off the bloom filter and comes straight back). 1 node gives 24,000 reads/s while 2, 3, 4... give 21,000 (presumably the bump in single-node performance is due to the lack of the extra hop). With CPU, disk, and RAM all largely unused, I'm at a loss to explain the lack of additional throughput. I tried increasing the number of clients, but that just split the throughput down the middle, with each stress.py achieving roughly 10,000 reads/s. I'm running the clients (stress.py) on separate cloud servers.

I checked the ulimit file count, and I'm not limiting connections there. It seems like there's a problem with my test setup - a clear bottleneck somewhere - but I just don't see what it is. Any ideas?

Also:

The disk performance of the cloud servers has been extremely spotty. The results I posted above were reproducible whenever the servers were in their "normal" state. But for periods of as much as several consecutive hours, single servers or groups of servers in the cloud would suddenly have horrendous disk performance as measured by dstat and iostat. The "% steal" by the hypervisor on these nodes is also quite high (>30%). During these "bad" periods, performance drops from 4,800 reads/s in the single-node benchmark to just 200 reads/s - the node is effectively useless. Is this normal for the cloud? And if so, what's the solution as far as Cassandra is concerned? Obviously you can just keep adding nodes until the likelihood that there's at least one good server with every piece of the data is reasonable. However, Cassandra routes to the topologically nearest node, not to the best-performing one, so "bad" nodes will always result in high-latency reads. How are those of you running in the cloud dealing with this? Are you seeing it at all?

Thanks in advance for your feedback and advice,

-- Oren
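P.S. In case anyone wants to compare notes, this is roughly how I spot a "bad" node - nothing beyond stock sysstat/dstat, and the exact columns vary a bit by version:

  iostat -x 5    # the avg-cpu line includes %steal; per-device await/%util spike on bad nodes
  vmstat 5       # the "st" column is hypervisor steal time
  dstat 5        # disk throughput falls off a cliff during the bad periods

On a node in its "normal" state %steal stays low; on a degraded one it sits above 30% and the single-node benchmark drops to the ~200 reads/s I mentioned.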