From: ZFabrik Subscriber <subscriber@zfabrik.de>
Date: Mon, 4 Jul 2011 21:01:38 +0200
To: user@cassandra.apache.org
Subject: Re: How to scale Cassandra?

Let's assume you have 50 nodes and their workload grows simultaneously. You discover that the nodes are about to reach their limits (by the way, what is the actual limit of a Cassandra node? 100 GB? 500 GB? 1 TB?)
You decide to add another 50 nodes. Do you do this in one step? One after the other? Or in several rounds, every RF-th node at a time?
Or you add 20 nodes and move the token ranges. Again, in one step? 20 steps? 4 steps of 5 nodes each?
This could take a while (days, if not weeks) in larger clusters!
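
To make the token arithmetic concrete, here is a minimal sketch (Python; it assumes RandomPartitioner's 2**127 token space and a perfectly balanced ring - an illustration, not a recipe) of where the new tokens land when you double from 50 to 100 nodes:

    # Sketch: doubling a balanced 50-node ring to 100 nodes.
    # RandomPartitioner tokens live in [0, 2**127).
    RING = 2 ** 127

    old_tokens = {i * RING // 50 for i in range(50)}
    new_tokens = {i * RING // 100 for i in range(100)}

    # Every second new token coincides with an existing one, so the 50
    # existing nodes keep their positions; each added node simply
    # bisects one old range (e.g. via initial_token at bootstrap).
    for t in sorted(new_tokens - old_tokens):
        print(t)

Whether you then bootstrap those 50 nodes in one round or in several is a question of streaming load, not of token math.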

Does anybody have experience with real-life scale-outs?

Regards
Udo

On 04.07.2011 at 16:21, Paul Loy wrote:

> Well, by issuing a nodetool move when a node is under high load, you basically make that node unresponsive. That's fine, but a nodetool move on one node also means that that node's replica data needs to move around the ring, and possibly some replica data from the next (or previous) node in the ring. So how does this affect other nodes w.r.t. RF and quorum? Will quorum fail until the replicas have moved as well?

> On Mon, Jul 4, 2011 at 3:08 PM, Dan Hendry <dan.hendry.junk@gmail.com> wrote:

> Moving nodes does not result in downtime provided you use proper replication factors and read/write consistencies. The typical recommendation is RF=3 and QUORUM reads/writes.
>
> Dan
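
As a back-of-the-envelope check on that claim (a sketch of the standard quorum arithmetic, not anything Cassandra-specific):

    # QUORUM needs floor(RF/2) + 1 replicas to answer.
    def quorum(rf: int) -> int:
        return rf // 2 + 1

    RF = 3
    needed = quorum(RF)                  # 2 of 3 replicas
    available = RF - 1                   # one replica busy with a move
    print(needed, available >= needed)   # -> 2 True: requests still succeed

    # Consistency holds because read and write quorums overlap: 2 + 2 > 3.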

 

> From: Paul Loy [mailto:keteracel@gmail.com]
> Sent: July-04-11 5:59
> To: user@cassandra.apache.org
> Subject: Re: How to scale Cassandra?

 

> That's basically how I understand it.

> However, I think it gets better with larger clusters, as the proportion of the ring you move around at any time is much lower.
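
A rough way to see this point, assuming a balanced ring where each node owns about 1/N of the data (primary range only; replica copies multiply this by roughly RF):

    # Sketch: share of data a single bootstrap or move touches.
    for n in (10, 50, 100, 500):
        print(f"{n} nodes -> about {1 / n:.1%} of the data per moved node")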

> On Mon, Jul 4, 2011 at 10:54 AM, Subscriber <subscriber@zfabrik.de> wrote:

> Hi there,
>
> I read a lot about Cassandra's high-scalability features: seamless addition of nodes, no downtime, etc.
> But I wonder how one does this in practice in an operational system.
>
> In the system we're going to implement, we're expecting a huge number of writes with uniformly distributed keys
> (the keys are given and cannot be generated). That means using RandomPartitioner will (more or less) result in
> the same work-load per node as an OrderPreservingPartitioner - right?
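
(For what it's worth, RandomPartitioner derives the token from an MD5 hash of the row key, so given keys spread as evenly as generated ones. A rough sketch of the idea - not Cassandra's exact code:)

    import hashlib

    # Token = magnitude of the key's 128-bit MD5 hash, i.e. a value
    # in [0, 2**127]; uniform regardless of the keys' own structure.
    def token(key: bytes) -> int:
        return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))

    print(token(b"some-given-key"))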

> But how do you scale a (more or less) balanced Cassandra cluster? I think that in the end
> you always have to double the number of nodes (adding just a handful of nodes relieves only the split regions; the
> work-load of the untouched regions keeps growing at the same rate).
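
To illustrate (a sketch assuming a balanced 100-node ring, 5 new nodes each bisecting one range, and no other tokens moving):

    # 5 new nodes bisect 5 ranges; the other 95 ranges are untouched
    # and keep their full share of a uniformly growing write load.
    RING = 2 ** 127
    tokens = sorted(i * RING // 100 for i in range(100))
    tokens += [(tokens[i] + tokens[i + 1]) // 2 for i in range(5)]
    tokens.sort()

    shares = [(b - a) / RING
              for a, b in zip(tokens, tokens[1:] + [tokens[0] + RING])]
    print(f"min share {min(shares):.2%}, max share {max(shares):.2%}")
    # -> bisected ranges now hold ~0.5% each; untouched ones still ~1%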

> This seems to be OK for small clusters. But what do you do when you have several hundred nodes in your cluster?
> It seems to me that a balanced cluster is a blessing for performance but a curse for scalability...

> What are the alternatives? One could redistribute the token ranges, but this would cause
> downtime (AFAIK); not an option!

> Is there anything that I didn't understand, or am I missing something else? Is the only remaining strategy to make sure that
> the cluster grows unbalanced, so one can add nodes to the hotspots? However, in this case you have to make sure
> that this strategy lasts. That could be too optimistic...

> Best Regards
> Udo




> --
> ---------------------------------------------
> Paul Loy
> paul@keteracel.com
> http://uk.linkedin.com/in/paulloy





> --
> ---------------------------------------------
> Paul Loy
> paul@keteracel.com
> http://uk.linkedin.com/in/paulloy
