Subject: Re: Cassandra benchmarking on Rackspace Cloud
From: Peter Schuller
To: user@cassandra.apache.org
Date: Tue, 20 Jul 2010 15:41:59 +0200

> But what's then the point with adding nodes into the ring?

Disk speed! Well, it may also be cheaper to service an RPC request than to service a full read or write, even in terms of CPU.

But: even taking into account that requests are distributed randomly, the cluster should still scale. You will approach the overhead of a full level of RPC indirection for 100% of requests, but it won't become worse than that. That overhead is still distributed across the entire cluster, and you should still see throughput increase as nodes are added.

That said, given that the test in this case is probably the cheapest possible test to make, even in terms of CPU, since it hits non-existent values, maybe the RPC overhead is simply big enough relative to this type of request that moving from 1 to 4 nodes doesn't show an improvement. Suppose, for example, that the cost of forwarding an RPC request is comparable to servicing a read request for a non-existent key. Under those conditions, going from 1 to 2 nodes would not be expected to affect throughput at all. Going from 2 to 3 should start to show an improvement, etc. If the RPC overhead is higher than servicing the read, you'd see performance drop from 1 to 2 nodes (but it should still eventually start scaling with node count).
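To make that back-of-envelope concrete, here is a rough sketch (purely illustrative, not Cassandra code; the function and parameter names relative_throughput, read_cost and rpc_cost are made up for the example). It assumes replication factor 1 and uniformly random key placement, so a fraction (1 - 1/N) of requests pays one extra RPC hop while capacity grows linearly with node count:

  # Toy model, not Cassandra code: RF=1, keys and coordinators chosen
  # uniformly at random, so a fraction (1 - 1/nodes) of requests pays
  # one extra RPC hop; capacity grows linearly with node count.
  def relative_throughput(nodes, read_cost=1.0, rpc_cost=1.0):
      # Average work per request: the read itself, plus one hop of RPC
      # overhead for the forwarded fraction of requests.
      forwarded = 1.0 - 1.0 / nodes
      work_per_request = read_cost + forwarded * rpc_cost
      # Capacity is proportional to node count; normalize to one node.
      return (nodes * read_cost) / work_per_request

  if __name__ == "__main__":
      for ratio in (0.5, 1.0, 2.0, 4.0):  # rpc_cost relative to read_cost
          print("rpc/read = %.1f: " % ratio + ", ".join(
              "%dn %.2fx" % (n, relative_throughput(n, 1.0, ratio))
              for n in (1, 2, 3, 4, 8)))

Where exactly the 1-to-2-node break-even lands depends on how the hop's cost is split between coordinator and owner, so treat the numbers as illustrative only; but the shape is the same either way: a possible dip or plateau at small cluster sizes when the hop dominates, and roughly linear scaling once nearly every request is already paying the one hop.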
What seems inconsistent with this hypothesis is that in the numbers reported by David, there is an initial drop in performance going from 1 to 2 nodes, after which throughput flattens completely rather than changing as more nodes are added. Other than at the point of equilibrium between additional RPC overhead and additional capacity, I'd expect to see either an increase or a decrease in performance with each added node.

Additionally, at the very beginning of this thread, before the move to testing non-existent keys, they were hitting the performance 'roof' even with "real" read traffic. Presuming such "real" read traffic is more expensive to process than key misses on an empty cluster, that is even more inconsistent with the hypothesis.

(I'm hoping to have time to run my test on EC2 tonight; we'll see.)

--
/ Peter Schuller