I agree with your idea about a tool that could 'propose' good token choices for optimal load-balancing.

If I were going to write such a tool: do you think the Thrift API provides the necessary information? I think with the RandomPartitioner you cannot scan all your rows to actually find out how big certain ranges of rows are. And even with the OPP (which is surely the main target for this kind of tool) you would have to fetch each row's full content just to find out how large it is, right?
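For what it's worth, however the row sizes end up being sampled, the bookkeeping side seems simple. A rough Python sketch (the (token, size) pairs and the ring layout are my assumptions for illustration, not something the Thrift API hands you directly):

```python
from bisect import bisect_left

def estimate_range_loads(ring_tokens, sampled_rows):
    """Attribute sampled row sizes to the node owning each token range.

    ring_tokens:  sorted list of the nodes' tokens.
    sampled_rows: iterable of (row_token, row_size_bytes) pairs -- i.e.
                  the expensive part, obtained by actually fetching rows.
    Returns a dict mapping each node token to the bytes in its range.
    """
    loads = {t: 0 for t in ring_tokens}
    for tok, size in sampled_rows:
        # A row belongs to the first node token >= its own token,
        # wrapping around the ring past the largest token.
        i = bisect_left(ring_tokens, tok)
        owner = ring_tokens[i % len(ring_tokens)]
        loads[owner] += size
    return loads
```

So the hard part really is getting the (token, size) samples cheaply, not turning them into per-range loads.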


On 25.03.2010 at 22:28, "Jonathan Ellis" <> wrote:

One problem is if the heaviest node is next to a node that is
lighter than average, instead of heavier.  Then if the new node takes
extra from the heaviest, say 75% instead of just 1/2, and we then take
1/2 of the heaviest's neighbor and put it on the heaviest, we have made
that lighter-than-average node even lighter.

Could you move 1/2, 1/4, etc. only until you get to a node lighter
than average?  Probably.  But I'm not sure it's a big enough win to
justify the complexity.
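To make that stopping rule concrete, here is a rough Python sketch of the cascade (the 75% take and the halving steps are illustrative numbers of mine, not a worked-out policy):

```python
def cascade_split(loads, heaviest, take_frac=0.75):
    """Sketch of 'cascade moves until you reach a node lighter than average'.

    loads:    per-node loads in ring order.
    heaviest: index of the most loaded node; the new node takes
              take_frac of its load.
    Each successive neighbor then donates half of its load back to the
    previous node (as in the 75%/refill scenario above), but the cascade
    stops as soon as the next neighbor is already below the ring average,
    so a lighter-than-average node is never made lighter.
    Returns (new_node_load, resulting loads list).
    """
    n = len(loads)
    avg = sum(loads) / n
    loads = loads[:]                      # work on a copy
    new_node = loads[heaviest] * take_frac
    loads[heaviest] -= new_node
    i = heaviest
    while True:
        nxt = (i + 1) % n
        if nxt == heaviest or loads[nxt] <= avg:
            break                         # stop before a light node
        transfer = loads[nxt] / 2
        loads[nxt] -= transfer
        loads[i] += transfer
        i = nxt
    return new_node, loads
```

For example, with loads [100, 80, 40, 60] (average 70) the cascade stops after the second node, since the third is already below average.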

Probably a better solution would be a tool where you tell it "I want
to add N nodes to my cluster," and it analyzes the load factors and
tells you what tokens to add them with, and what additional moves to
make to get within M% of equal loads, with the minimum amount of data
moved.
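A naive core for such a tool might just keep bisecting the most loaded range. A hedged Python sketch (assuming, unrealistically, that splitting a range at its token midpoint halves its load; the input format is my own invention):

```python
import heapq

def propose_tokens(ranges, n_new):
    """Propose tokens for n_new bootstrapping nodes.

    ranges: list of (load, lo_token, hi_token) tuples, one per existing
            node's range, with integer tokens and measured loads.
    Repeatedly splits the most loaded range at its token midpoint,
    assuming each split halves that range's load.  Returns the list of
    proposed tokens in the order they should be assigned.
    """
    # Max-heap via negated loads.
    heap = [(-load, lo, hi) for load, lo, hi in ranges]
    heapq.heapify(heap)
    proposed = []
    for _ in range(n_new):
        neg_load, lo, hi = heapq.heappop(heap)
        mid = (lo + hi) // 2
        proposed.append(mid)
        heapq.heappush(heap, (neg_load / 2, lo, mid))
        heapq.heappush(heap, (neg_load / 2, mid, hi))
    return proposed
```

With real data the "splitting halves the load" assumption fails (especially under the OPP), which is exactly why the tool would need measured per-range sizes rather than token arithmetic.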


On Thu, Mar 25, 2010 at 1:52 PM, Jeremy Dunck <> wrote:
> On Thu, Mar 25, 2010 at 1...