cassandra-dev mailing list archives

From Edward Capriolo <>
Subject Re: RFC: Cassandra Virtual Nodes
Date Wed, 21 Mar 2012 14:24:37 GMT
On Wed, Mar 21, 2012 at 9:50 AM, Eric Evans <> wrote:
> On Tue, Mar 20, 2012 at 9:53 PM, Jonathan Ellis <> wrote:
>> It's reasonable that we can attach different levels of importance to
>> these things.  Taking a step back, I have two main points:
>> 1) vnodes add enormous complexity to *many* parts of Cassandra.  I'm
>> skeptical of the cost:benefit ratio here.
>> 1a) The benefit is lower in my mind because many of the problems
>> solved by vnodes can be solved "well enough" for "most people," for
>> some value of those two phrases, without vnodes.
>> 2) I'm not okay with a "commit something half-baked and sort it out
>> later" approach.
> I must admit I find this a little disheartening.  The discussion has
> barely started.  No one has had a chance to discuss implementation
> specifics so that the rest of us could understand *how* disruptive it
> would be (a necessary requirement in weighing cost:benefit), or what
> an incremental approach would look like, and yet work has already
> begun on shutting this down.
> Unless I'm reading you wrong, your mandate (I say mandate because you
> hinted at a veto elsewhere), is No to anything complex or invasive
> (for some value of each).  The only alternative would seem to be a
> phased or incremental approach, but you seem to be saying No to that
> as well.
> There seems to be quite a bit of interest in having virtual nodes (and
> there has been for as long as I can remember), the only serious
> reservations relate to the difficulty/complexity.  Is there really no
> way to put our heads together and figure out how to properly manage
> that aspect?
>> On Tue, Mar 20, 2012 at 11:10 AM, Richard Low <> wrote:
>>> On 20 March 2012 14:55, Jonathan Ellis <> wrote:
>>>> Here's how I see Sam's list:
>>>> * Even load balancing when growing and shrinking the cluster
>>>> Nice to have, but post-bootstrap load balancing works well in practice
>>>> (and is improved by TRP).
>>> Post-bootstrap load balancing without vnodes necessarily streams more
>>> data than is necessary.  Vnodes streams the minimal amount.
>>> In fact, post-bootstrap load balancing currently streams a constant
>>> fraction of your data - the network traffic involved in a rebalance
>>> increases linearly with the size of your cluster.  With vnodes it
>>> decreases linearly.
>>> Including removing the ops overhead of running the load balance and
>>> calculating new tokens, this makes removing post-bootstrap load
>>> balancing a pretty big deal.
>>>> * Greater failure tolerance in streaming
>>>> Directly addressed by TRP.
>>> Agreed.
>>>> * Evenly distributed impact of streaming operations
>>>> Not a problem in practice with stream throttling.
>>> Throttling slows them down, increasing rebuild times so increasing downtime.
>>>> * Possibility for active load balancing
>>>> Not really a feature of vnodes per se, but as with the other load
>>>> balancing point, this is also improved by TRP.
>>> Again with the caveat that more data is streamed with TRP.  Vnodes
>>> removes the need for any load balancing with RP.
>>>> * Distributed rebuild
>>>> This is the 20% that TRP does not address.  Nice to have?  Yes.  Can I
>>>> live without it?  I have so far.  Is this alone worth the complexity
>>>> of vnodes?  No, it is not.  Especially since there are probably other
>>>> approaches that we can take to mitigate this, one of which Rick has
>>>> suggested in a separate sub-thread.
>>> Distributed rebuild means you can store more data per node with the
>>> same failure probabilities.  This is frequently a limiting factor on
>>> how much data you can store per node, increasing cluster sizes
>>> unnecessarily.  I'd argue that this alone is worth the complexity of
>>> vnodes.
>>> Richard.
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
> --
> Eric Evans
> Acunu | | @acunu

I have also thought about how I would like vnodes to work from an
operational perspective rather than a software one. I would like these:
1) No more RAID 0. If a machine is responsible for 4 vnodes, they
should correspond to four JBOD disks.

2) Vnodes should be hot-pluggable. My normal Cassandra chassis
would be a 2U with 6 drive bays. Imagine I have 10 nodes. Now if one
chassis dies, I should be able to take the disks out and physically
plug them into another chassis. Then in Cassandra I should be able to
run a command like:
nodetool attach '/mnt/disk6'
disk6 should contain all of its data and its vnode information.

Now this would be awesome for upgrades/migrations/etc.
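
To make the idea concrete, here is a rough Python sketch of the bookkeeping I have in mind. It is only a toy model and not how Cassandra works today: the Disk and Chassis classes, the token values and the attach/detach methods are all made up for illustration, and "nodetool attach" is a command I am proposing, not one that exists. The point is that if each disk carries its own vnode tokens, moving the disk moves the ownership with it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Disk:
    """One JBOD disk holding the data and tokens of its own vnodes (hypothetical)."""
    mount: str
    vnode_tokens: List[int]


@dataclass
class Chassis:
    """A physical machine that owns whatever vnodes live on its attached disks."""
    name: str
    disks: Dict[str, Disk] = field(default_factory=dict)

    def attach(self, disk: Disk) -> None:
        # Stand-in for the proposed `nodetool attach '/mnt/disk6'`.
        if disk.mount in self.disks:
            raise ValueError(f"{disk.mount} is already attached to {self.name}")
        self.disks[disk.mount] = disk

    def detach(self, mount: str) -> Disk:
        # Pulling the disk out of a dead chassis.
        return self.disks.pop(mount)

    def owned_tokens(self) -> List[int]:
        # Ownership is derived purely from the disks that are plugged in.
        return sorted(t for d in self.disks.values() for t in d.vnode_tokens)


if __name__ == "__main__":
    dead = Chassis("chassis-a")
    spare = Chassis("chassis-b")
    dead.attach(Disk("/mnt/disk6", vnode_tokens=[100, 200]))

    # The chassis dies: move the disk, and its vnodes follow it.
    spare.attach(dead.detach("/mnt/disk6"))
    print(spare.owned_tokens())
```

Nothing is streamed in this model; the receiving chassis just starts answering for the tokens found on the disk, which is what would make it attractive for upgrades and migrations.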
