incubator-cassandra-user mailing list archives

From Gary Dusbabek <gdusba...@gmail.com>
Subject Re: Best practice for adding new nodes to ring
Date Tue, 26 Oct 2010 20:08:31 GMT
On Tue, Oct 26, 2010 at 14:56, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> On Tue, Oct 26, 2010 at 1:45 PM, Stu Hood <stu.hood@rackspace.com> wrote:
>> While the "adding virtual tokens/nodes to Cassandra" discussion is a good
>> one, there are a few factors that might delay (or remove?) the necessity of
>> adding that complexity:
>>
>> * In Cassandra 0.7, removing load from a node is fairly cheap: a bounded
>>   number of reads is used to determine which portions of the large sorted
>>   data files (sstables) to stream, followed by "sendfile" calls to deliver
>>   the data to the destination.
>> * For a replication factor RF, RF nodes can send data to a new node. This
>>   means that to have all N existing nodes in your cluster participate in
>>   adding K nodes, you only need to add K = N / RF nodes per expansion, which
>>   is a much easier factor to achieve than a power of 2.
>>
>> While the added nodes will not be immediately balanced, there are some
>> possible improvements to our existing load-balancing facilities to better
>> handle unbalanced cases: see
>> https://issues.apache.org/jira/browse/CASSANDRA-1418
>>
>> Finally, virtual nodes are not a panacea: reviewing the papers on
>> https://issues.apache.org/jira/browse/CASSANDRA-192 suggests that they are
>> significantly more difficult to implement than our current solution.
>>
>> We haven't ruled virtual nodes out, but I think many of us are leaning
>> toward exploring improvements to our current architecture.
>>
>> Thanks,
>> Stu
>>
>> -----Original Message-----
>> From: "Greg Kim" <gkim@netflix.com>
>> Sent: Tuesday, October 26, 2010 12:21pm
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Best practice for adding new nodes to ring
>>
>> Hi,
>>
>> I have a question regarding the best practices for adding new nodes to an
>> existing cluster. From reading the wiki page
>> http://wiki.apache.org/cassandra/Operations, I understand that when creating
>> a brand new cluster we can use the following to calculate the initial token
>> for each node to achieve balance in the ring:
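>>  def tokens(nodes):
>>      # Print one evenly spaced initial token per node.
>>      for i in range(1, nodes + 1):
>>          # Integer division keeps the tokens exact on Python 3 as well.
>>          print(i * (2 ** 127 - 1) // nodes)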
>>
>>
>> My question is on the best practice for adding new nodes to an existing
>> cluster. The wiki recommends computing new tokens for every node and
>> assigning them manually using the nodetool command. We're planning on
>> running either 16GB or 32GB heaps on each of our nodes, so token
>> re-assignment for each node in the cluster sounds like a very expensive
>> operation, especially in situations where we're adding new nodes to handle
>> scaling issues w/ the existing cluster.
>>
>> I'm a bit of a noob to cassandra, so I wanted to see how others are
>> currently coping w/ this. One option is to grow the cluster in powers of 2
>> and use bootstrapping w/ automatic token generation. Is this an option that
>> people are using? (It gets exponentially expensive once you already have a
>> large # of nodes.)
>>
>> Does anyone know why cassandra doesn't use virtual tokens (e.g. one node
>> token creating 256 virtual node tokens in the ring)? That way, the
>> imbalance caused by adding new nodes to an existing cluster would be
>> significantly mitigated.
>>
>>
>> Thanks
>> gkim
>>
>>
>
> One could implement "Virtual nodes" by running multiple instances of
> cassandra on a single machine, each binding to a different IP,
> possibly each using a different physical disk.
>
> I can imagine this would cause some overhead and waste. However, since
> current JVMs do not manage large heap sizes well, this would be the
> way I would imagine running cassandra on a "Big iron/mainframe"
> machine with 128GB RAM, 4 processors, and 48 disks.

You'd just want to make sure you have the IO capacity to handle it.
Personally, I think 8- or possibly 4-way systems would be up to the
task CPU-wise, but you'd have to think long and hard about how you
would manage disk IO.

Gary.
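
For anyone following the token arithmetic in this thread, here is a rough
sketch of why growing by a power of 2 avoids moving existing nodes. It assumes
the evenly spaced token formula from the wiki quoted above and Python 3 integer
division. The names RING, balanced_tokens and tokens_kept are made up for this
illustration and are not part of Cassandra or nodetool.

```python
# Token range taken from the wiki formula quoted in the thread.
RING = 2 ** 127 - 1


def balanced_tokens(nodes):
    """Evenly spaced tokens for a ring of `nodes` nodes."""
    return [i * RING // nodes for i in range(1, nodes + 1)]


def tokens_kept(old_nodes, new_nodes):
    """Count existing tokens that are still valid in the larger balanced ring.

    Hypothetical helper: every token NOT counted here belongs to a node that
    would have to be moved to keep the ring perfectly balanced.
    """
    new = set(balanced_tokens(new_nodes))
    return sum(1 for token in balanced_tokens(old_nodes) if token in new)


if __name__ == "__main__":
    for old, new in [(4, 5), (4, 6), (4, 8)]:
        kept = tokens_kept(old, new)
        print(f"{old} -> {new} nodes: {kept} of {old} existing tokens unchanged")
```

Under these assumptions, doubling the ring keeps every existing token in place,
while adding a single node leaves only one existing token on a balanced
position. That is the trade-off behind the "recompute every token" versus
"grow in powers of 2" question raised earlier in the thread.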
