helix-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: Getting auto_rebalance right
Date Tue, 15 Oct 2013 08:31:47 GMT
Thanks for your prompt answers!

I used the latest version from the master branch and applied the code changes suggested by
Jason.

The good news is that:
	- the update was trivial, at least for the small code example I provided.
	- I always get 3 leader states for the 3 partitions

The bad news is that:
	- either I don't get enough replicas (I want 1 replica for each partition, and initially I
only have replicas for 2 partitions)
	- or I simply get no replicas at all (after removing 1 node from the cluster, I have 3 leaders,
0 replicas)

I updated my simple example https://github.com/matthieumorel/helix-balancing so you can reproduce
that behavior.

// with only 1 node, I have 3 leaders, 0 replicas:

Starting instance Node:myhost:10000
Assigning MY_RESOURCE_1 to Node:myhost:10000
Assigning MY_RESOURCE_0 to Node:myhost:10000
Assigning MY_RESOURCE_2 to Node:myhost:10000
OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_2)
OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_1)
OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_0)
REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)
REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_1)
REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_2)


// adding 1 node adds a replica:

Starting instance Node:myhost:10001
Assigning MY_RESOURCE_1 to Node:myhost:10001
OFFLINE -> REPLICA (Node:myhost:10001, MY_RESOURCE_1)


// adding another node adds a new replica:

Starting instance Node:myhost:10002
Assigning MY_RESOURCE_0 to Node:myhost:10002
OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_0)


// removing a node rebalances things, but we end up with 3 leaders, 0 replicas:

Stopping instance Node:myhost:10000
Assigning MY_RESOURCE_2 to Node:myhost:10002
REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_0)
OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_2)
REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_2)
REPLICA -> LEADER (Node:myhost:10001, MY_RESOURCE_1)


I would like to get 1 leader and 1 replica for each partition, regardless of the number of
nodes. Is that possible?
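
A minimal sketch for checking that expectation against the console output above. It only replays log lines in the format printed by the example application; it does not call Helix. The class and method names (ReplicaTally, replay, violations) are hypothetical, and treating a "Stopping instance" line as dropping all of that node's states is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: replays the example's console log; it does not call Helix. */
final class ReplicaTally {
    // Matches e.g. "REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)".
    private static final Pattern TRANSITION =
        Pattern.compile("(\\w+) -> (\\w+) \\(([^,\\s]+), ([^)\\s]+)\\)");
    // Matches e.g. "Stopping instance Node:myhost:10000".
    private static final Pattern STOP = Pattern.compile("Stopping instance (\\S+)");

    /** Returns partition -> (node -> last state reached), dropping stopped nodes. */
    static Map<String, Map<String, String>> replay(List<String> lines) {
        Map<String, Map<String, String>> states = new TreeMap<>();
        for (String line : lines) {
            Matcher stop = STOP.matcher(line);
            if (stop.find()) {
                // Assumption: a stopped node no longer holds any partition state.
                for (Map<String, String> byNode : states.values()) {
                    byNode.remove(stop.group(1));
                }
                continue;
            }
            Matcher t = TRANSITION.matcher(line);
            if (t.find()) {
                // group(4) = partition, group(3) = node, group(2) = target state.
                states.computeIfAbsent(t.group(4), k -> new TreeMap<>())
                      .put(t.group(3), t.group(2));
            }
        }
        return states;
    }

    /** Lists partitions that do not have exactly 1 LEADER and 1 REPLICA. */
    static List<String> violations(Map<String, Map<String, String>> states) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : states.entrySet()) {
            long leaders = e.getValue().values().stream().filter("LEADER"::equals).count();
            long replicas = e.getValue().values().stream().filter("REPLICA"::equals).count();
            if (leaders != 1 || replicas != 1) {
                out.add(e.getKey() + ": leaders=" + leaders + ", replicas=" + replicas);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Transition and stop lines copied from the log in this message.
        List<String> log = Arrays.asList(
            "REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)",
            "REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_1)",
            "REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_2)",
            "OFFLINE -> REPLICA (Node:myhost:10001, MY_RESOURCE_1)",
            "OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_0)",
            "Stopping instance Node:myhost:10000",
            "REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_0)",
            "OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_2)",
            "REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_2)",
            "REPLICA -> LEADER (Node:myhost:10001, MY_RESOURCE_1)");
        // Prints one line per partition that misses the 1 leader + 1 replica target.
        violations(replay(log)).forEach(System.out::println);
    }
}
```

Fed the log above, this reports all three partitions after the node is stopped, which matches the "3 leaders, 0 replicas" outcome described here.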

Thanks!

Matthieu



On Oct 15, 2013, at 02:30, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:

> Hi Matthieu,
> 
> I have just pushed a patch to the master branch (i.e. trunk) that should fix the issue.
> Please let me know if the problem persists.
> 
> Thanks,
> Kanak
> 
> ________________________________
>> From: zzhang@linkedin.com 
>> To: user@helix.incubator.apache.org 
>> Subject: Re: Getting auto_rebalance right 
>> Date: Mon, 14 Oct 2013 21:32:41 +0000 
>> 
>> Hi Matthieu, this is a known bug in the 0.6.1 release. We have fixed it in 
>> trunk. If you are building from trunk, change this call in ClusterConfigInit#init(): 
>> 
>> admin.addResource(DEFAULT_CLUSTER_NAME, 
>>     RESOURCE, 
>>     PARTITIONS, 
>>     "LEADER_REPLICA", 
>>     IdealStateModeProperty.AUTO_REBALANCE.toString()); 
>> 
>> to 
>> 
>> admin.addResource(DEFAULT_CLUSTER_NAME, 
>>     RESOURCE, 
>>     PARTITIONS, 
>>     "LEADER_REPLICA", 
>>     RebalanceMode.FULL_AUTO.toString()); 
>> 
>> 
>> It should work. We are planning to make a 0.6.2 release with a few fixes, including this
>> one. 
>> 
>> 
>> Thanks, 
>> 
>> Jason 
>> 
>> 
>> From: Matthieu Morel <mmorel@apache.org> 
>> Reply-To: user@helix.incubator.apache.org 
>> Date: Monday, October 14, 2013 12:09 PM 
>> To: user@helix.incubator.apache.org 
>> Subject: Getting auto_rebalance right 
>> 
>> Hi, 
>> 
>> I'm trying to use the auto-rebalance mode in Helix. 
>> 
>> The use case is the following (standard leader-standby scenario, a bit 
>> like the rsync example in the helix codebase): 
>> - the dataspace is partitioned 
>> - for a given partition, we have 
>>   - a leader that is responsible for writing and serving data, logging 
>>     operations into a journal 
>>   - a replica that fetches updates from a journal and applies them 
>>     locally but it does not serve data 
>> Upon failure, the replica becomes leader, applies pending updates and 
>> can write and serve data. Ideally we also get a new replica assigned. 
>> 
>> We'd like to use the auto_rebalance mode in Helix so that partitions 
>> are automatically assigned and re-assigned, and so that leaders are 
>> automatically elected. 
>> 
>> 
>> Unfortunately, I can't really get the balancing right. I might be doing 
>> something wrong, so I uploaded an example here: 
>> https://github.com/matthieumorel/helix-balancing 
>> 
>> 
>> In this application I would like to get exactly 1 leader and 1 replica 
>> for each of the partitions 
>> 
>> In this example we don't reach that result, and when removing a node, 
>> we even get to a situation where there is no leader for a given 
>> partition. 
>> 
>> 
>> Do I have wrong expectations? Is there something wrong with the code, 
>> or is it something with Helix? 
>> 
>> 
>> Thanks! 
>> 
>> Matthieu

