Jonathan,

I agree with your idea about a tool that could 'propose' good token choices for optimal load-balancing.

If I were going to write such a tool: do you think the Thrift API provides the necessary information? I think with the RandomPartitioner you cannot scan all your rows to actually find out how big certain ranges of rows are. And even with the OPP (which is surely the major target for this kind of tool) you would have to fetch each row's full content just to find out how large it is, right?
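For what it's worth, the token-proposal part seems straightforward once you have per-node loads. Here's a rough sketch of the greedy idea, assuming a numeric token ring like the RandomPartitioner's; the function name, input shape, and the "split the heaviest range in half" heuristic are all my own illustration, not anything the API provides:

```python
# Hypothetical sketch: given each node's current token and its load,
# greedily propose tokens for N new nodes by repeatedly splitting the
# most heavily loaded range at its midpoint.
RING_SIZE = 2 ** 127  # RandomPartitioner token space

def propose_tokens(nodes, n_new):
    """nodes: list of (token, load_bytes), sorted by token.
    Returns a list of proposed tokens for the new nodes."""
    # Build the ranges each node owns: (previous token, own token].
    ranges = []
    for i, (token, load) in enumerate(nodes):
        prev = nodes[i - 1][0]  # wraps around the ring for i == 0
        ranges.append({"start": prev, "end": token, "load": load})
    proposed = []
    for _ in range(n_new):
        heaviest = max(ranges, key=lambda r: r["load"])
        width = (heaviest["end"] - heaviest["start"]) % RING_SIZE
        mid = (heaviest["start"] + width // 2) % RING_SIZE
        proposed.append(mid)
        # The new node takes the first half of the heaviest range;
        # assume (crudely) that load splits evenly with the range.
        half = heaviest["load"] // 2
        ranges.append({"start": heaviest["start"], "end": mid, "load": half})
        heaviest["start"] = mid
        heaviest["load"] -= half
    return proposed
```

Of course this assumes load is uniform within a range, which is exactly the information I'm not sure the Thrift API can give us cheaply.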

Greetings,

Roland

On 25.03.2010 at 22:28, "Jonathan Ellis" <jbellis@gmail.com> wrote:

One problem is if the heaviest node is next to a node that is
lighter than average, instead of heavier. Then if the new node takes
extra from the heaviest, say 75% instead of just 1/2, and then we take
1/2 of the heaviest's neighbor and put it on the heaviest, you made
that lighter-than-average node even lighter.

Could you move 1/2, 1/4, etc. only until you get to a node lighter
than average? Probably. But I'm not sure if it's a big enough win to
justify the complexity.

Probably a better solution would be a tool where you tell it "I want
to add N nodes to my cluster; analyze the load factors and tell me
what tokens to add them with, and what additional moves to make to get
me within M% of equal loads, with the minimum amount of data
movement."

-Jonathan

On Thu, Mar 25, 2010 at 1:52 PM, Jeremy Dunck <jdunck@gmail.com> wrote:

> On Thu, Mar 25, 2010 at 1...