incubator-cassandra-dev mailing list archives

From Rick Branson <>
Subject Re: RFC: Cassandra Virtual Nodes
Date Tue, 20 Mar 2012 03:17:48 GMT
On Mon, Mar 19, 2012 at 4:45 PM, Peter Schuller wrote:
> > As a side note: vnodes fail to provide solutions to node-based limitations
> > that seem to me to cause a substantial portion of operational issues such
> > as impact of node restarts / upgrades, GC and compaction induced latency. I
> Actually, it does. At least assuming DF > RF (as in the original
> proposal, and mine). The impact of a node suffering from a performance
> degradation is mitigated because the effects are spread out over DF-1
> (N-1 in the original post) nodes instead of just RF nodes.

You've got me on one of those after some re-thought. For any node
outage (an upgrade/restart), vnodes definitely have a big impact by
distributing the load more evenly, but (and correct me if I'm wrong)
for things like additional latency caused by GC/compaction, those
requests will just be slower rather than timing out or getting
redirected via the dynamic snitch.
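
As a back-of-envelope illustration of the outage case, here is a small
sketch. The RF and DF values are made-up numbers, and the even split of
the failed node's load across its peers is a simplifying assumption,
not something measured:

```python
def extra_load_per_peer(peers_sharing_data: int) -> float:
    """Fraction of one node's normal load that each surviving peer
    absorbs when that node goes down, assuming an even split."""
    if peers_sharing_data < 1:
        raise ValueError("need at least one peer to absorb the load")
    return 1.0 / peers_sharing_data


# Hypothetical cluster parameters, chosen only for illustration.
RF = 3    # replication factor
DF = 12   # distribution factor, with DF > RF as in the proposal

# Without vnodes the impact lands on roughly RF nodes;
# with vnodes it is spread over DF - 1 nodes.
without_vnodes = extra_load_per_peer(RF)
with_vnodes = extra_load_per_peer(DF - 1)

print(f"extra load per peer without vnodes: {without_vnodes:.1%}")
print(f"extra load per peer with vnodes:    {with_vnodes:.1%}")
```

This only models a node that is fully out. It says nothing about the
GC/compaction case, where the node still answers, just slowly.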

> > think some progress could be made here by allowing a "pack" of independent
> > Cassandra nodes to be run on a single host; somewhat (but nowhere near
> > entirely) similar to a pre-fork model used by some UNIX-based servers.
> I have pretty significant knee-jerk negative reactions to that idea to
> be honest, even if the pack is limited to a handful of instances. In
> order for vnodes to be useful with random placement, we'd need much
> more than a handful of vnodes per node (cassandra instances in a
> "pack" in that model).

Fair enough, I'm not super fond of the idea personally, but I don't
see a way around the limitations of the current JVM GC without
multiple processes.

After rethinking my ideas a bit, what I've settled on is to keep the
existing node tokens, but add an additional "active token" that would
be used to determine the data range that a node is ready to receive
reads for. This should gain all of the benefits highlighted in my
earlier post, but with less implementation complexity. Node repair
(AES) would still allow ranges to be specified.
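
One possible reading of the active-token idea, as a rough sketch only.
The `Node` class, its field names, and the rule that a node serves
reads for (prev_token, active_token] are all assumptions made for
illustration; none of this is existing Cassandra code:

```python
from dataclasses import dataclass


def in_range(key: int, start: int, end: int) -> bool:
    """True if key lies in the ring interval (start, end], with wraparound."""
    if start < end:
        return start < key <= end
    # The interval wraps past the top of the token space.
    return key > start or key <= end


@dataclass
class Node:
    prev_token: int    # token of the preceding node on the ring
    token: int         # existing node token: end of the owned range
    active_token: int  # hypothetical: end of the sub-range ready for reads

    def owns(self, key: int) -> bool:
        return in_range(key, self.prev_token, self.token)

    def can_serve_read(self, key: int) -> bool:
        # active_token == prev_token is taken to mean "nothing ready yet".
        if self.active_token == self.prev_token:
            return False
        return in_range(key, self.prev_token, self.active_token)


# A node that owns (10, 50] but is only ready to serve reads for (10, 30].
node = Node(prev_token=10, token=50, active_token=30)
print(node.owns(40), node.can_serve_read(40))
print(node.owns(20), node.can_serve_read(20))
```

The sketch assumes the active token always falls inside the node's
owned range; ownership itself is still decided by the existing token.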
