From: Julie
To: user@cassandra.apache.org
Subject: Re: Quick help on Cassandra please: cluster access and performance
Date: Fri, 11 Jun 2010 15:22:01 +0000 (UTC)

li wei writes:
>
> Thanks you very much, Per!
>
> ----- Original Message ----
> From: Per Olesen
> To: user@cassandra.apache.org
> Sent: Wed, June 9, 2010 4:02:52 PM
> Subject: Re: Quick help on Cassandra please: cluster access and performance
>
> On Jun 9, 2010, at 9:47 PM, li wei wrote:
>
> > Thanks a lot.
> > We are set READ one, WRITE ANY. Is this better than QUORUM in performance.
>
> Yes, but less consistency safe.
>
> > Do you think the cassandra Cluster (with 2 or nodes) should be always faster than Single one node in the
> > reality and theory?
> > Or it depends?
>
> It depends
>
> I think the idea with cassandra is that it scales linearly. So, if you have obtained some performance
> numbers X for read performance. And you get lots of new users and data amounts, you can keep having X simply
> by adding new nodes.
>
> But I think there are others on this list with much more insight into this than mine!
>
> /Per

We have done a lot of work trying to get performance to scale as we enlarge our cluster and found that there is a single server bottleneck if all of your clients talk to one server, no matter how many server nodes you add to your cluster.

The best scaling we saw (quite linear, actually) came from having our clients use a round-robin scheme to spread their requests evenly across all the server nodes in the cluster. This avoids the single-server bottleneck. This is interesting, since for most reads or writes the contacted server will most likely have to forward the row to another server anyway.

In our testing we actually used x clients and x servers (x = 1, 2, 4, 8, and 16), with each client talking to one particular server (for example, client1 contacts server1, client2 contacts server2, etc.). We saw excellent performance scaling this way. A round-robin approach is probably the right way to do this in an actual system.

We tried MANY things but did not see good scaling until we started distributing our communications evenly among all the servers in the cluster.
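
To make the two schemes concrete, here is a minimal Python sketch of how a client might pick which node to contact. It is only an illustration, not the code used in the tests described here: the host names, the RoundRobinHosts class, and the static_assignment helper are all hypothetical, and the sketch only chooses a host. Opening the actual connection to Cassandra is left out.

```python
from collections import Counter
from itertools import cycle


class RoundRobinHosts:
    """Hand out server addresses in a fixed rotation (hypothetical helper)."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._rotation = cycle(self._hosts)

    def next_host(self):
        # Each call returns the next node in the list, wrapping around at the end.
        return next(self._rotation)


def static_assignment(client_index, hosts):
    """Pin client i to server (i mod n), like client1 -> server1, client2 -> server2."""
    return hosts[client_index % len(hosts)]


# Hypothetical node names; substitute the real addresses of the cluster.
HOSTS = ["server1", "server2", "server3", "server4"]

picker = RoundRobinHosts(HOSTS)

# Simulate 12 requests from one client and count how many land on each node.
load = Counter(picker.next_host() for _ in range(12))
print(load)

# The one-client-per-server layout, with clients numbered from 0.
for i in range(len(HOSTS)):
    print("client", i + 1, "->", static_assignment(i, HOSTS))
```

With round robin a single client spreads its own requests over every node, while the static layout only balances the cluster when the number of clients is a multiple of the number of servers and the clients are about equally busy.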