cassandra-user mailing list archives

From Julie <julie.su...@nextcentury.com>
Subject Re: Quick help on Cassandra please: cluster access and performance
Date Fri, 11 Jun 2010 14:50:38 GMT

li wei <liwei_6v <at> yahoo.com> writes:

> 
> Thank you very much, Per!
> 
> ----- Original Message ----
> From: Per Olesen <pol <at> trifork.com>
> To: "user <at> cassandra.apache.org" <user <at> cassandra.apache.org>
> Sent: Wed, June 9, 2010 4:02:52 PM
> Subject: Re: Quick help on Cassandra please: cluster access and performance
> 
> On Jun 9, 2010, at 9:47 PM, li wei wrote:
> 
> > Thanks a lot.
> > We have set READ to ONE and WRITE to ANY. Is this better than QUORUM for performance?
> 
> Yes, but it gives weaker consistency guarantees.
> 
> > Do you think a Cassandra cluster (with 2 or  nodes) should always be
> > faster than a single node, in practice and in theory?
> > Or does it depend?
> 
> It depends 
> 
> I think the idea with Cassandra is that it scales linearly. So, if you have
> obtained some performance number X for reads, and you then get lots of new
> users and more data, you can keep getting X simply by adding new nodes.
> 
> But I think there are others on this list with much more insight into this
> than me!
> 
> /Per
> 
> 

We have done a lot of work trying to get performance to scale as we enlarge our
cluster. We found that if all of your clients talk to one server, that server
becomes a bottleneck, no matter how many server nodes you add to your cluster.
The best scaling we experienced (quite linear, actually) came from having our
clients use a round-robin scheme to distribute their requests evenly across all
the server nodes in the cluster.

This is interesting, since for most writes or reads the server being contacted
does not own the row and will most likely have to forward the request to
another server anyway.
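As a rough illustration of why the contacted server usually has to forward the
request, here is a minimal sketch. It assumes rows are spread evenly over the
nodes and that the client picks a node independently of the row key. The
function name and the replication factors used are hypothetical and are not
taken from our test setup.

```python
from fractions import Fraction


def prob_contacted_node_is_not_replica(num_nodes, replication_factor):
    """Chance that the node a client contacts holds no copy of the row.

    Assumes evenly distributed rows and a node chosen independently of
    the row key (an idealisation, not a measurement).
    """
    if num_nodes < 1 or replication_factor < 1:
        raise ValueError("num_nodes and replication_factor must be >= 1")
    # A row cannot have more replicas than there are nodes.
    replicas = min(replication_factor, num_nodes)
    return 1 - Fraction(replicas, num_nodes)


if __name__ == "__main__":
    # Cluster sizes from our tests; the replication factor of 1 is assumed.
    for n in (1, 2, 4, 8, 16):
        print(n, prob_contacted_node_is_not_replica(n, 1))
```

Under these assumptions the share of requests that must be forwarded grows
with the cluster size, which is why it matters that the forwarding work is
spread over all nodes rather than concentrated on one.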

In our testing, we actually had x clients and x servers (for x = 1, 2, 4, 8,
and 16), with each client talking to one particular server: client1 contacts
server1, client2 contacts server2, and so on.  We saw excellent performance
scaling this way.  A round-robin approach is probably the right way to do this
in an actual system.  We tried MANY things but did not see good scaling until
we started distributing our requests evenly among all the servers in the
cluster.
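The two schemes described above can be sketched as follows. This is only a
minimal illustration, not our actual test harness: the class name, the helper
name, and the host strings are hypothetical, and the code only picks an
address. Opening the connection to Cassandra is left out.

```python
import itertools
import threading


class RoundRobinHosts:
    """Hand out server addresses in rotation so that no single node
    receives all of the client traffic."""

    def __init__(self, hosts):
        hosts = list(hosts)
        if not hosts:
            raise ValueError("at least one host is required")
        self._cycle = itertools.cycle(hosts)
        # The lock lets several client threads share one picker.
        self._lock = threading.Lock()

    def next_host(self):
        with self._lock:
            return next(self._cycle)


def pinned_host(hosts, client_index):
    """Fixed mapping used in our tests: client i always talks to server i
    (wrapping around if there are more clients than servers)."""
    return hosts[client_index % len(hosts)]


if __name__ == "__main__":
    # Placeholder addresses, not real machines.
    hosts = ["server1:9160", "server2:9160", "server3:9160", "server4:9160"]
    picker = RoundRobinHosts(hosts)
    for _ in range(6):
        print("next request goes to", picker.next_host())
    print("client 2 is pinned to", pinned_host(hosts, 2))
```

The pinned mapping is enough for a benchmark where clients and servers are
equal in number. The rotating picker is the more general choice, since it keeps
the load even when the number of clients does not match the number of servers.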



