Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of gdusbabek@gmail.com designates
 209.85.215.172 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:reply-to:in-reply-to:references:date:message-id
         :subject:from:to:content-type:content-transfer-encoding;
        b=gn1BSddR+Ecf/9xlA+IlSx8MZsCwfA+A3JvrX43SunuDIaxcbVwaSQKqhiJDLDikyk
         wMlVl38GHJ0a9zhmldQmKoUmlFQmdwA/ssLdpc+uBYkM5HFu3FmlGYQST0fp0itCSqdD
         cViSRd9LEXFGbWGUifSYDEQzp2zzrk3ex0lYU=
MIME-Version: 1.0
Reply-To: gdusbabek@gmail.com
In-Reply-To: <AANLkTino3q+g6V8_CttW9eZK7+sNh-ut-ba8xmjdAdCB@mail.gmail.com>
References: <986CB10F20A14445A4C0C95D4E491B320EAD022E@ExchMBX104.netflix.com>
	<1288115100.81515077@192.168.2.228>
	<AANLkTino3q+g6V8_CttW9eZK7+sNh-ut-ba8xmjdAdCB@mail.gmail.com>
Date: Tue, 26 Oct 2010 15:08:31 -0500
Message-ID: <AANLkTik4zvrpWRuYVbYAbH=AUROYqxbUvTCVS-sD2k3o@mail.gmail.com>
Subject: Re: Best practice for adding new nodes to ring
From: Gary Dusbabek <gdusbabek@gmail.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Tue, Oct 26, 2010 at 14:56, Edward Capriolo <edlinuxguru@gmail.com> wrot=
e:
> On Tue, Oct 26, 2010 at 1:45 PM, Stu Hood <stu.hood@rackspace.com> wrote:
>> While the "adding virtual tokens/nodes to Cassandra" discussion is a goo=
d one, there are a few factors that might delay (or remove?) the necessity =
of adding that complexity:
>>
>> * In Cassandra 0.7, removing load from a node is fairly cheap: a bounded=
 number of reads are used to determine which portions of the large sorted d=
ata files (sstables) to stream, followed by "sendfile" calls to deliver the=
 data to the destination
>> * For a replication factor RF, RF nodes can send data to a new node: thi=
s means that to have all existing N nodes in your cluster participate in ad=
ding K nodes, you only need to add N / RF =3D K nodes per expansion: this i=
s a much easier factor to achieve than a power of 2.
>>
>> While the added nodes will not be immediately balanced, there are some p=
ossible improvements to our existing load-balancing facilities to better ha=
ndle unbalanced cases: see https://issues.apache.org/jira/browse/CASSANDRA-=
1418
>>
>> Finally, virtual nodes are not a panacea: reviewing the papers on https:=
//issues.apache.org/jira/browse/CASSANDRA-192 suggests that they are signif=
icantly more difficult to implement than our current solution.
>>
>> We haven't ruled virtual nodes out, but I think many of us are leaning t=
oward exploring improvements to our current architecture.
>>
>> Thanks,
>> Stu
>>
>> -----Original Message-----
>> From: "Greg Kim" <gkim@netflix.com>
>> Sent: Tuesday, October 26, 2010 12:21pm
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Best practice for adding new nodes to ring
>>
>> Hi,
>>
>> I have a question regarding the best practices for adding new nodes to a=
n existing cluster. =A0From reading the following wiki: http://wiki.apache.=
org/cassandra/Operations =A0-- I understand that when creating a brand new =
cluster -- we can use the following to calculate the initial token for each=
 node to achieve balance in the ring:
>> =A0def tokens(nodes):
>> =A0 =A0 for i in range(1, nodes + 1):
>> =A0 =A0 =A0 =A0 print (i * (2 ** 127 - 1) / nodes)
>>
>>
>> My question is on the best practice for adding new nodes to an existing =
cluster. =A0There is a recommendation in the wiki which is to basically to =
compute new tokens for every node and assign them manually using the nodeto=
ol command. =A0We're planning on running either 16GB or 32GB heaps on each =
of our nodes, so token re-assignment for each node in the cluster sounds li=
ke a very expensive operation especially in situations where we're adding n=
ew nodes to handle scaling issues w/ the existing cluster.
>>
>> I'm bit of a noob to cassandra, so wanted to see how others are currentl=
y coping w/ this. =A0One option can be to grow the cluster in the power of =
2 and use bootstraping w/ automatic token generation. =A0Is this an option =
that people are using? (but this gets exponentially expensive when you alre=
ady have a large # of nodes)
>>
>> Does anyone know why cassandra doesn't use virtual tokens (e.g. one node=
 token - creating 256 virtual node tokens in the ring)? =A0This way adding =
new nodes to an existing cluster will significantly mitigate the unbalance =
issue in the ring.
>>
>>
>> Thanks
>> gkim
>>
>>
>
> One could implement "Virtual nodes" by running multiple instances of
> cassandra on a single machine, each binding to a different IP,
> possibly each using a different physical disk.
>
> I can imagine this would cause some overhead and waste. However since
> current JVM's do not manage large heap sizes well this would be the
> way I would imagine running cassandra on a "Big iron/mainframe"
> machine with 128GB RAM 4 processors and 48 disks

You'd just want to make sure you have the IO capacity to handle it.
Personally, I think 8- or possibly 4- way systems would be up to the
task CPU-wise, but you'd have to think long and hard about how you
would manage disk IO.

Gary.