incubator-cassandra-dev mailing list archives

From Zhu Han <schumi....@gmail.com>
Subject Re: RFC: Cassandra Virtual Nodes
Date Thu, 22 Mar 2012 05:48:28 GMT
On Tue, Mar 20, 2012 at 11:24 PM, Jeremiah Jordan <
JEREMIAH.JORDAN@morningstar.com> wrote:

> So taking a step back, if we want "vnodes" why can't we just give every
> node 100 tokens instead of only one?  Seems to me this would have less
> impact on the rest of the code.  It would just look like you had a 500 node
> cluster, instead of a 5 node cluster.  Your replication strategy would have
> to know about the physical machines so that data gets replicated right, but
> there is already some concept of this with the data center aware and rack
> aware stuff.
>
> From what I see I think you could get most of the benefits of vnodes by
> implementing a new Placement Strategy that did something like this, and you
> wouldn't have to touch (and maybe break) code in other places.
>
> Am I crazy? Naive?
>
> Once you had this set up, you could start to implement the vnode-like stuff
> on top of it, like bootstrapping nodes in one token at a time, and taking
> them on from the whole cluster, not just your neighbor, etc.
>

I second it.

Are there any goals we missed that cannot be achieved by assigning
multiple tokens to a single node?
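
To make the multiple-tokens-per-node idea concrete, here is a minimal sketch in Python. It is not Cassandra code: the class `MultiTokenRing`, the helper `token_for`, and the use of MD5 to place tokens are all assumptions made for illustration. It shows the one thing Jeremiah points out the placement strategy must do, which is to know about physical machines, so that walking the ring skips tokens owned by a node that already holds a replica.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Hash a string onto the ring (MD5 is only an illustrative stand-in)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class MultiTokenRing:
    """Hypothetical ring in which each physical node owns many tokens."""

    def __init__(self, nodes, tokens_per_node=100):
        # The ring is a sorted list of (token, physical_node) pairs.
        self.ring = sorted(
            (token_for(f"{node}#{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self.tokens = [token for token, _ in self.ring]

    def replicas(self, key: str, rf: int = 3):
        """Walk clockwise from the key's token, keeping distinct physical nodes."""
        start = bisect.bisect_left(self.tokens, token_for(key)) % len(self.ring)
        chosen = []
        for offset in range(len(self.ring)):
            node = self.ring[(start + offset) % len(self.ring)][1]
            if node not in chosen:  # skip tokens of a machine already chosen
                chosen.append(node)
                if len(chosen) == rf:
                    break
        return chosen


ring = MultiTokenRing(["node1", "node2", "node3", "node4", "node5"])
print(len(ring.ring))               # 5 machines look like a 500-token ring
print(ring.replicas("some-key"))    # three distinct physical machines
```

Rack and data center awareness would add further filters to the same walk; they are left out here to keep the sketch short.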


>
> -Jeremiah Jordan
>
> ________________________________________
> From: Rick Branson [rbranson@datastax.com]
> Sent: Monday, March 19, 2012 5:16 PM
> To: dev@cassandra.apache.org
> Subject: Re: RFC: Cassandra Virtual Nodes
>
> I think if we could go back and rebuild Cassandra from scratch, vnodes
> would likely be implemented from the beginning. However, I'm concerned that
> implementing them now could be a big distraction from more productive uses
> of all of our time and introduce major potential stability issues into what
> is becoming a business critical piece of infrastructure for many people.
> However, instead of just complaining and being pedantic, I'd like to offer
> a feasible alternative:
>
> Has there been consideration given to the idea of supporting a single
> token range for a node?
>
> While not theoretically as capable as vnodes, it seems to me to be more
> practical as it would have a significantly lower impact on the codebase and
> provides a much clearer migration path. It also seems to solve a majority
> of complaints regarding operational issues with Cassandra clusters.
>
> Each node would have a lower and an upper token, which would form a range
> that would be actively distributed via gossip. Read and replication
> requests would only be routed to a replica when the key of these operations
> matched the replica's token range in the gossip tables. Each node would
> locally store its own current active token range as well as a target token
> range it's "moving" towards.
>
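
The routing rule in the paragraph above can be sketched roughly as follows. This is only an illustration in Python: `TokenRange`, `eligible_replicas`, and the dictionary standing in for the gossip tables are hypothetical names, and the small integer tokens are made up. The ranges use the usual (lower, upper] convention and may wrap around the ring.

```python
from dataclasses import dataclass


@dataclass
class TokenRange:
    lower: int  # exclusive bound
    upper: int  # inclusive bound

    def contains(self, token: int) -> bool:
        if self.lower < self.upper:
            return self.lower < token <= self.upper
        # The range wraps past the end of the ring
        # (lower == upper is treated as the whole ring).
        return token > self.lower or token <= self.upper


def eligible_replicas(token: int, gossip_ranges: dict) -> list:
    """Return the nodes whose gossiped active range covers the token."""
    return [node for node, rng in gossip_ranges.items() if rng.contains(token)]


# Hypothetical gossip state: node name -> active token range.
gossip_ranges = {
    "n1": TokenRange(0, 50),
    "n2": TokenRange(50, 100),
    "n3": TokenRange(90, 10),  # wraps around the ring
}
print(eligible_replicas(5, gossip_ranges))   # ['n1', 'n3']
print(eligible_replicas(60, gossip_ranges))  # ['n2']
```

A request for a key is only sent to a node when that node's active range contains the key's token, regardless of where the node's target range will eventually end up.
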
> As a new node undergoes bootstrap, the bounds would be gradually expanded
> to allow it to handle requests for a wider range of the keyspace as it
> moves towards its target token range. This idea boils down to a move from
> hard cutovers to smoother operations by gradually adjusting active token
> ranges over a period of time. It would apply to token change operations
> (nodetool 'move' and 'removetoken') as well.
>
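
A rough sketch of the gradual expansion described above, again in Python and with hypothetical names (`BootstrappingNode`, `step`, `serves`). It assumes, purely for simplicity, that only the upper bound grows, that the range does not wrap around the ring, and that each step happens after the matching slice of data has been streamed.

```python
from dataclasses import dataclass


@dataclass
class BootstrappingNode:
    lower: int         # fixed lower bound (exclusive)
    active_upper: int  # upper bound currently served and gossiped
    target_upper: int  # upper bound the node is "moving" towards

    def serves(self, token: int) -> bool:
        """True if the token falls inside the current active range."""
        return self.lower < token <= self.active_upper

    def step(self, chunk: int) -> bool:
        """Widen the active range by at most `chunk` tokens.

        Returns True once the active range has reached the target.
        """
        self.active_upper = min(self.active_upper + chunk, self.target_upper)
        return self.active_upper == self.target_upper


node = BootstrappingNode(lower=100, active_upper=100, target_upper=200)
print(node.serves(150))  # False: nothing is served before the first step
node.step(40)            # active range is now (100, 140]
node.step(40)            # active range is now (100, 180]
print(node.serves(150))  # True
print(node.step(40))     # True: clamped at the target, (100, 200]
```

The same stepping would apply to 'move' and 'removetoken', with the active bound shifting or shrinking instead of growing.
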
> Failure during streaming could be recovered at the bounds instead of
> restarting the whole process as the active bounds would effectively track
> the progress for bootstrap & target token changes. Implicitly these
> operations would be throttled to some degree. Node repair (AES) could also
> be modified using the same overall ideas to provide a more gradual impact
> on the cluster overall, similar to the ideas given in CASSANDRA-3721.
>
> While this doesn't spread the load over the cluster for these operations
> evenly like vnodes do, this is likely an issue that could be worked
> around by performing concurrent (throttled) bootstrap & node repair (AES)
> operations. It does allow some kind of "active" load balancing, though
> clearly this is not as flexible or as useful as vnodes. But you should be
> using RandomPartitioner or sort-of-randomized keys with OPP, right? ;)
>
> As a side note: vnodes fail to provide solutions to node-based limitations
> that seem to me to cause a substantial portion of operational issues such
> as impact of node restarts / upgrades, GC and compaction induced latency. I
> think some progress could be made here by allowing a "pack" of independent
> Cassandra nodes to be run on a single host; somewhat (but nowhere near
> entirely) similar to a pre-fork model used by some UNIX-based servers.
>
> Input?
>
> --
> Rick Branson
> DataStax
>
