cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-1418) Automatic, online load balancing
Date Tue, 12 Apr 2011 17:05:06 GMT


Jonathan Ellis updated CASSANDRA-1418:

    Fix Version/s:     (was: 1.0)
           Labels: ponies  (was: )

> Automatic, online load balancing
> --------------------------------
>                 Key: CASSANDRA-1418
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>              Labels: ponies
> h2. Goal
> CASSANDRA-192 began with the intention of implementing full cluster load balancing, but
ended up being (wisely) limited to a manual load balancing operation. This issue is an umbrella
ticket for finishing the job of implementing automatic, always-on load balancing.
> It is possible to implement very efficient load balancing operations with a single process
directing the rebalancing of all nodes, but avoiding such a central process and allowing individual
nodes to make their own movement decisions would be ideal.
> h2. Components
> h3. Optimal movements for individual nodes
> h4. Ruhl
> One such approach is the Ruhl algorithm described on 192:
. But as described, it performs excessive movement for large hotspots, and can take a long
time to reach equilibrium. Consider the following ring:
> ||token||load||
> |a|5|
> |c|5|
> |e|5|
> |f|40|
> |k|5|
> Assuming that node 'a' is the first to discover that 'f' is overloaded: it will apply
Case 2, and assume half of 'f's load by moving to 'i', leaving both with 20 units. But this
is not a optimal movement, because both 'f' and 'a/i' will still be holding data that they
will need to give away. Additionally, 'a/i' can't begin giving the data away until it has
finished receiving it.
> If node 'e' is the first to discover that 'f' is overloaded, it will apply Case 1, and
'f' will give half of its load to 'e' by moving to 'i'. Again, this is a non-optimal movement,
because it will result in both 'e' and 'f/i' holding data that they need to give away.
> h4. Adding load awareness to Ruhl
> Luckily, there appears to be a simple adjustment to the Ruhl algorithm that solves this
problem by taking advantage of the fact that Cassandra knows the total load of a cluster,
and can use it to calculate the average/ideal load ω. Once node j has decided it should take
load from node i (based on the ε value in Ruhl), rather than node j taking 1/2 of the load
on node i, it should chose a token such that either i or j ends up with a load within ε*ω
of ω.
> Again considering the ring described above, and assuming ε == 1.0, the total load for
the 5 nodes is 60, giving a ω of 12. If node 'a' is the first to discover 'f', it will choose
to move to 'j' (a token that takes 12 or ω load units from 'f'), leaving 'f' with a load
of 28. When combined with the improvement in the next section, this is closer to being an
optimal movement, because 'a/j' will at worst have ε*ω of load to give away, and 'f' is
in a position to start more movements.
> h3. Automatic load balancing
> Since the Ruhl algorithm only requires a node to make a decision based on itself and
one other node, it should be relatively straightforward to add a timer on each node that periodically
wakes up and executes the modifiied Ruhl algorithm if it is not already in the process of
moving (based on pending ranges).
> Automatic balancing should probably be enabled by default, and should have a configurable
per-node bandwidth cap.
> h3. Allowing concurrent moves on a node
> Allowing a node to give away multiple ranges at once allows for the type of quick balancing
that is typically only attributed to vnodes. If a node is a hotspot, such as in the example
above, the node should be able to quickly dump the load in a manner that causes minimal load
on the rest of the cluster. Rather than transferring to 1 target at 10 MB/s, a hotspot can
give to 5 targets at 2 MB/s each.

This message is automatically generated by JIRA.
For more information on JIRA, see:

View raw message