incubator-cassandra-dev mailing list archives

From Jonathan Ellis <>
Subject Re: RFC: Cassandra Virtual Nodes
Date Tue, 20 Mar 2012 14:55:11 GMT
On Tue, Mar 20, 2012 at 9:08 AM, Eric Evans <> wrote:
> On Tue, Mar 20, 2012 at 8:39 AM, Jonathan Ellis <> wrote:
>> I like this idea.  It feels like a good 80/20 solution -- 80% of the
>> benefits, 20% of the effort.  More like 5% of the effort.  I can't
>> even enumerate all the places full vnode support would change, but an
>> "active token range" concept would be relatively limited in scope.
> It only addresses 1 of Sam's original 5 points, so I wouldn't call it
> an "80% solution".

I guess a more accurate way to put this is, "only 20% of Sam's list is
an actual pain point that doesn't get addressed by The Rick Proposal" (TRP).

Here's how I see Sam's list:

* Even load balancing when growing and shrinking the cluster

Nice to have, but post-bootstrap load balancing works well in practice
(and is improved by TRP).

* Greater failure tolerance in streaming

Directly addressed by TRP.

* Evenly distributed impact of streaming operations

Not a problem in practice with stream throttling.

* Possibility for active load balancing

Not really a feature of vnodes per se, but as with the other load
balancing point, this is also improved by TRP.

* Distributed rebuild

This is the 20% that TRP does not address.  Nice to have?  Yes.  Can I
live without it?  I have so far.  Is this alone worth the complexity
of vnodes?  No, it is not.  Especially since there are probably other
approaches that we can take to mitigate this, one of which Rick has
suggested in a separate sub-thread.

>> Full vnodes feels a lot more like the counters quagmire, where
>> Digg/Twitter worked on it for... 8? months, and then DataStax worked
>> on it for about 6 months post-commit, and we're still finding
>> the occasional bug-since-0.7 there.  With the benefit of hindsight, as
>> bad as maintaining that patchset was out of tree, committing it as
>> early as we did was a mistake.  We won't do that again.  (On the
>> bright side, git makes maintaining such a patchset easier now.)
> And yet counters have become a very important feature for Cassandra;
> we're better off with them than without.

False dichotomy (we could have waited for a better counter design),
but that's mostly irrelevant to my point that jamming incomplete code
in-tree to sort out later is a bad idea.

> I think there were a number of problems with how counters went down
> that could be avoided here.  For one, we can take a phased,
> incremental approach, rather than waiting 8 months to drop a large
> patchset.

If there are incremental improvements to be made that justify
themselves independently, then I agree.  Small, self-contained steps
are a good thing.  A good example is, a product of The
Grand Storage Engine Redesign of 674 fame.

But, when things don't naturally break down into such mini-features,
then I'm -1 on committing code that has no purpose other than to be a
foundation for later commits.  I've seen people get bored or assigned
to other projects too often to just trust that those later commits
will indeed be forthcoming.  Or even if Sam [for instance] is still
working hard on it, it's very easy for unforeseen difficulties to come
up that invalidate the original approach.  Since we were talking about
counters, the original vector clock approach -- that we ended up
ripping out, painfully -- is a good example.  Once bitten, twice shy.

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
