cassandra-user mailing list archives

From Richard Low <>
Subject Re: Why so many vnodes?
Date Tue, 11 Jun 2013 09:05:24 GMT
On 11 June 2013 09:54, Theo Hultberg <> wrote:

> But in the paragraph just before Richard said that finding the node that
> owns a token becomes slower on large clusters with lots of token ranges, so
> increasing it further seems contradictory.

I do mean increase for larger clusters, but I guess it depends on what you
are optimizing for.  If you care about maintaining an even load, where
differences are measured relative to the amount of data each node has, then
you need T >> N.
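
A rough way to see the T >> N point is to simulate it. The sketch below is not Cassandra code: it assumes T means tokens per node, places tokens uniformly at random on a unit ring, ignores replication, and uses a made-up helper name (ownership_spread). It reports how much more of the ring the most loaded node owns than the average node.

```python
import random


def ownership_spread(num_nodes, tokens_per_node, seed=0):
    """Return (largest node share) / (mean node share) for random tokens.

    Hypothetical helper: tokens are uniform on a [0, 1) ring and
    replication is ignored, so this is only a rough model.
    """
    rng = random.Random(seed)
    # Each entry is (token position, owning node), sorted around the ring.
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    )
    owned = [0.0] * num_nodes
    for i, (token, node) in enumerate(ring):
        # A token owns the range from the previous token up to itself.
        # ring[i - 1] wraps to the last token when i == 0.
        prev = ring[i - 1][0]
        owned[node] += (token - prev) % 1.0
    mean_share = 1.0 / num_nodes
    return max(owned) / mean_share


if __name__ == "__main__":
    for t in (1, 16, 256):
        print(t, round(ownership_spread(50, t), 2))
```

Under these assumptions the ratio moves towards 1 as tokens per node grows, which is the "even load" effect described above.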

However, you're right, this can slow down some operations.  Repair has a
fixed cost for each token, so it gets a bit slower with higher T.  Finding
which node owns a range gets harder as T grows, but this code was optimized,
so I don't think it will become a practical issue.
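
To illustrate why the lookup cost grows slowly, here is a minimal sketch of finding a token's owner with a binary search over the sorted ring. It is not Cassandra's implementation; build_ring and owner_of are hypothetical names, tokens are plain integers, and each token is assumed to own the range ending at itself (inclusive).

```python
import bisect


def build_ring(assignments):
    """Turn {node: [tokens]} into parallel sorted lists of tokens and owners."""
    pairs = sorted(
        (token, node)
        for node, tokens in assignments.items()
        for token in tokens
    )
    tokens = [token for token, _ in pairs]
    owners = [node for _, node in pairs]
    return tokens, owners


def owner_of(tokens, owners, key_token):
    """Return the node owning key_token.

    The owner is the node holding the first token >= key_token; keys past
    the largest token wrap around to the first token on the ring.
    """
    i = bisect.bisect_left(tokens, key_token)
    return owners[i % len(tokens)]


if __name__ == "__main__":
    tokens, owners = build_ring({"a": [10, 40], "b": [20, 50]})
    print(owner_of(tokens, owners, 15))
```

With this layout a lookup takes a number of steps logarithmic in the total token count, so going from 1 to 256 tokens per node adds only about eight comparisons.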

> Is this a correct interpretation: finding the node that owns a particular
> token becomes slower as the number of nodes (and therefore total token
> ranges) increases, but for large clusters you also need to take the time
> for bootstraps into account, which will become slower if each node has
> fewer token ranges. The speed referred to in the two cases are the speeds
> of different operations, and there will be a trade off, and 256 initial
> tokens is a trade off that works for most cases.

Yes, this is right.  Bootstraps may become slower because the joining node
streams from fewer existing nodes (although this may only show on very busy
clusters, since otherwise bootstrap is limited by the joining node).  More
importantly, I think, new nodes won't take an even share of the data if T
is too small.
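
Both effects can be sketched with a small simulation of one node joining. This is an illustration under simplifying assumptions, not Cassandra's bootstrap logic: tokens are uniform random on a unit ring, replication is ignored, T is tokens per node, and join_stats is a hypothetical helper.

```python
import random


def join_stats(num_nodes, tokens_per_node, seed=0):
    """Simulate one node joining a ring of num_nodes existing nodes.

    Returns (share taken / ideal share, number of nodes streamed from).
    Requires num_nodes >= 1. Replication is ignored.
    """
    rng = random.Random(seed)
    new_node = num_nodes  # id of the joining node
    ring = [
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    ]
    ring += [(rng.random(), new_node) for _ in range(tokens_per_node)]
    ring.sort()

    share = 0.0
    sources = set()
    size = len(ring)
    for i, (token, node) in enumerate(ring):
        if node != new_node:
            continue
        # The new token takes over the range back to the previous token.
        share += (token - ring[i - 1][0]) % 1.0
        # That range used to belong to the next existing token clockwise.
        j = (i + 1) % size
        while ring[j][1] == new_node:
            j = (j + 1) % size
        sources.add(ring[j][1])

    ideal = 1.0 / (num_nodes + 1)
    return share / ideal, len(sources)


if __name__ == "__main__":
    for t in (1, 16, 256):
        print(t, join_stats(20, t))
```

In this model a node joining with a single token always streams from exactly one existing node and takes whatever that one range happens to contain, while a node joining with many tokens draws from most of the cluster and lands close to the ideal share.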

