cassandra-user mailing list archives

From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: distribution of token ranges with virtual nodes
Date Fri, 02 Nov 2012 03:05:39 GMT
> it will migrate you to virtual nodes by splitting the existing partition
> 256 ways.


Out of curiosity, is it for the purpose of avoiding streaming?

> the former would require you to perform a shuffle to achieve that.


Is there a nodetool option or are there other ways "shuffle" could be done
automatically?


On Thu, Nov 1, 2012 at 2:17 AM, Eric Evans <eevans@acunu.com> wrote:

> On Wed, Oct 31, 2012 at 11:38 AM, John Sanda <john.sanda@gmail.com> wrote:
> > Can/should I assume that I will get even range distribution, or close to
> > it, with random token selection?
>
> The short answer is: If you're using virtual nodes, random token
> selection will give you even range distribution.
>
> The somewhat longer answer is that this is really a function of the
> total number of tokens.  The more randomly generated tokens a cluster
> has, the more distribution will even out.  The reason this can work
> for virtual nodes, where it has not for the older 1-token-per-node
> model, is that (assuming a reasonable num_tokens value) virtual
> nodes give you a much higher token count for a given number of nodes.
>
> That wiki page you cite wasn't really intended to be documentation
> (expect some of that soon though), but what that section was trying to
> convey was that while random distribution is quite good, it may not be
> 100% perfect, especially when the number of nodes is low (remember,
> the number of tokens scales with the number of nodes).  I think this
> is (or may be) a problem for some.  If you're forced to manually
> calculate tokens then you are quite naturally going to calculate a
> perfect distribution, and if you've grown accustomed to this, seeing
> the ownership values off by a few percent could really bring out your
> inner OCD. :)
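
To put rough numbers on the point above, here is a small simulation sketch. It is not Cassandra code: the ring size of 2**127, the helper names `ownership` and `max_deviation`, and the 6-node cluster are all assumptions chosen for illustration. It assigns random tokens to each node and measures how far any node's share of the ring strays from the ideal 1/N.

```python
import random

RING_SIZE = 2**127  # assumed token space, for illustration only


def ownership(num_nodes, num_tokens, seed=0):
    """Return the fraction of the ring owned by each node.

    Every node gets `num_tokens` random tokens; a token owns the range
    (previous token on the ring, token].
    """
    rng = random.Random(seed)
    tokens = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * num_nodes
    # The first token's range wraps around from the last token on the ring.
    prev = tokens[-1][0] - RING_SIZE
    for token, node in tokens:
        owned[node] += token - prev
        prev = token
    return [o / RING_SIZE for o in owned]


def max_deviation(fractions):
    """Largest absolute gap between any node's share and the ideal 1/N."""
    ideal = 1.0 / len(fractions)
    return max(abs(f - ideal) for f in fractions)


if __name__ == "__main__":
    for num_tokens in (1, 16, 256):
        dev = max_deviation(ownership(6, num_tokens))
        print(f"num_tokens={num_tokens:>3}: max deviation {dev:.4f}")
```

Running it for a few values of num_tokens should show the deviation shrinking as the total token count grows, which is the "more tokens, more even" effect described above. The exact figures depend on the seed.
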
>
> > For the sake of discussion, what is a reasonable default to start
> > with for num_tokens assuming nodes are homogenous? That wiki page
> > mentions a default of 256 which I see commented out in cassandra.yaml;
> > however, Config.num_tokens is set to 1.
>
> The (unconfigured) default is 1.  That is to say, virtual nodes are
> not enabled.  The current recommendation when setting this
> (documented in the config) is 256.
>
> > Maybe I missed where the default of 256 is
> > used. From some initial testing though, it looks like 1 token per node is
> > being used. Using defaults in cassandra.yaml, I see this in my logs,
>
> Right.  And it's worth noting that if you uncomment num_tokens *after*
> starting a node with it commented (i.e. num_tokens: 1), then it will
> migrate you to virtual nodes by splitting the existing partition 256
> ways.  This is *not* the equivalent of starting a node with num_tokens
> = 256 for the first time.  The latter would leave you with randomized
> placement, the former would require you to perform a shuffle to
> achieve that.
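
A toy model of the difference between the two cases, as I understand it. This is only a sketch: the assumption that each existing range is split into equal contiguous pieces, the ring size of 2**127, and the names `split_tokens`, `random_tokens` and `owner_changes` are mine, not taken from the Cassandra source. It counts how often the owner changes as you walk around the ring.

```python
import random

RING_SIZE = 2**127  # assumed token space, for illustration only


def split_tokens(initial_tokens, parts):
    """Split each node's single range (prev, token] into `parts` pieces.

    Nodes are identified by the index of their token in sorted order.
    Assumes an even split, which is a simplification.
    """
    ordered = sorted(initial_tokens)
    result = []
    for node, token in enumerate(ordered):
        # The first node's range wraps around from the last token.
        prev = ordered[node - 1] if node > 0 else ordered[-1] - RING_SIZE
        width = token - prev
        for k in range(1, parts + 1):
            result.append(((prev + width * k // parts) % RING_SIZE, node))
    return sorted(result)


def random_tokens(num_nodes, num_tokens, seed=0):
    """Give every node `num_tokens` random tokens (fresh vnode start)."""
    rng = random.Random(seed)
    return sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )


def owner_changes(tokens):
    """Count ring positions whose owner differs from the previous one."""
    return sum(
        1 for i, (_, node) in enumerate(tokens) if node != tokens[i - 1][1]
    )


if __name__ == "__main__":
    initial = [i * RING_SIZE // 4 for i in range(4)]  # 4 nodes, 1 token each
    print("after split :", owner_changes(split_tokens(initial, 256)))
    print("random start:", owner_changes(random_tokens(4, 256)))
```

In this model the split leaves each node holding one contiguous run of 256 small ranges, so the owner changes only once per node around the ring, whereas random placement interleaves the nodes' ranges. Moving from the first layout to the second is what a shuffle would have to accomplish.
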
>
>
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>
