helix-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: Getting auto_rebalance right
Date Tue, 15 Oct 2013 15:06:27 GMT
Hi Kishore,

On Oct 15, 2013, at 16:48, kishore g <g.kishore@gmail.com> wrote:

> Hi Matthieu,
> 
> I think the code avoids placing more than one replica of a partition on the same node.
> So if you have only 1 node, it will create only LEADERs and no replicas. We can add a
> configuration option to allow replicas on the same node.

Actually, preventing the leader and replica of a partition from being on the same node
makes sense: such a placement defeats the purpose of the replica.

> But I do see something weird: when you add a node, only 1 additional replica gets
> created, which does not make sense. I will take a look at that.

Yes, with 3 nodes and 3 partitions we should expect 3 leaders and 3 replicas. An
additional requirement, related to the comment above, would be that the leader and
replica of a partition are never colocated. Should I open a JIRA for that?
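
For reference, both per-partition constraints (at most 1 leader, and leaders preferred
over replicas) can be expressed in the state model definition itself. Here is a minimal
sketch using StateModelDefinition.Builder from helix-core; the state names follow this
thread, but the priorities and bounds are illustrative assumptions, not the exact model
from my example repo:

import org.apache.helix.model.StateModelDefinition;

StateModelDefinition.Builder builder =
    new StateModelDefinition.Builder("LEADER_REPLICA");

// States, by priority: Helix fills LEADER slots before REPLICA slots.
builder.addState("LEADER", 1);
builder.addState("REPLICA", 2);
builder.addState("OFFLINE", 3);
builder.addState("DROPPED", 4);
builder.initialState("OFFLINE");

// Legal transitions.
builder.addTransition("OFFLINE", "REPLICA");
builder.addTransition("REPLICA", "LEADER");
builder.addTransition("LEADER", "REPLICA");
builder.addTransition("REPLICA", "OFFLINE");
builder.addTransition("OFFLINE", "DROPPED");

// Per-partition bounds: at most 1 LEADER; REPLICA capped by the
// resource's configured replica count ("R").
builder.upperBound("LEADER", 1);
builder.dynamicUpperBound("REPLICA", "R");

StateModelDefinition leaderReplica = builder.build();

The co-location constraint itself, though, is a placement decision made by the
rebalancer, not something the state model can express - hence the JIRA.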

Let me know if you need more feedback.

Thanks!

Matthieu



> 
> thanks,
> Kishore G
> 
> 
> On Tue, Oct 15, 2013 at 1:31 AM, Matthieu Morel <mmorel@apache.org> wrote:
> Thanks for your prompt answers!
> 
> I used the latest version from the master branch and applied the code changes
> suggested by Jason.
> 
> The good news is that:
> 	- the update was trivial - at least for the small code example I provided.
> 	- I always get 3 leader states for the 3 partitions
> 
> The bad news is that:
> 	- I either don't get enough replicas (I want 1 replica for each partition, and
> 	  initially I only have replicas for 2 partitions)
> 	- or I simply get no replicas at all (after removing 1 node from the cluster,
> 	  I have 3 leaders, 0 replicas)
> 
> I updated my simple example https://github.com/matthieumorel/helix-balancing so you
> can reproduce that behavior.
> 
> // with only 1 node, I have 3 leaders, 0 replicas:
> 
> Starting instance Node:myhost:10000
> Assigning MY_RESOURCE_1 to Node:myhost:10000
> Assigning MY_RESOURCE_0 to Node:myhost:10000
> Assigning MY_RESOURCE_2 to Node:myhost:10000
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_2)
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_1)
> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_0)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_1)
> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_2)
> 
> 
> // adding 1 node adds a replica:
> 
> Starting instance Node:myhost:10001
> Assigning MY_RESOURCE_1 to Node:myhost:10001
> OFFLINE -> REPLICA (Node:myhost:10001, MY_RESOURCE_1)
> 
> 
> // adding another node adds a new replica:
> 
> Starting instance Node:myhost:10002
> Assigning MY_RESOURCE_0 to Node:myhost:10002
> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_0)
> 
> 
> // removing a node rebalances things, but we end up with 3 leaders, 0 replicas
> 
> Stopping instance Node:myhost:10000
> Assigning MY_RESOURCE_2 to Node:myhost:10002
> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_0)
> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_2)
> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_2)
> REPLICA -> LEADER (Node:myhost:10001, MY_RESOURCE_1)
> 
> 
> I would like to get 1 leader and 1 replica for each partition, regardless of the
> number of nodes. Is that possible?
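> 
> (For what it's worth, this is roughly how the resource is set up - a sketch rather
> than the exact code from the repo, and ZK_ADDRESS is a placeholder:)
> 
> HelixAdmin admin = new ZKHelixAdmin(ZK_ADDRESS);
> admin.addResource(DEFAULT_CLUSTER_NAME, RESOURCE, PARTITIONS,
>                   "LEADER_REPLICA",
>                   RebalanceMode.FULL_AUTO.toString());
> // replica count of 2 = 1 leader + 1 replica per partition
> admin.rebalance(DEFAULT_CLUSTER_NAME, RESOURCE, 2);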
> 
> Thanks!
> 
> Matthieu
> 
> 
> 
> On Oct 15, 2013, at 02:30 , Kanak Biscuitwala <kanak.b@hotmail.com> wrote:
> 
>> Hi Matthieu,
>> 
>> I have just pushed a patch to the master branch (i.e. trunk) that should fix the
>> issue. Please let me know if the problem persists.
>> 
>> Thanks,
>> Kanak
>> 
>> ________________________________
>>> From: zzhang@linkedin.com 
>>> To: user@helix.incubator.apache.org 
>>> Subject: Re: Getting auto_rebalance right 
>>> Date: Mon, 14 Oct 2013 21:32:41 +0000 
>>> 
>>> Hi Matthieu, this is a known bug in the 0.6.1 release. We have fixed it in
>>> trunk. If you are building from trunk, change ClusterConfigInit#init() from
>>>
>>> admin.addResource(DEFAULT_CLUSTER_NAME,
>>>                   RESOURCE,
>>>                   PARTITIONS,
>>>                   "LEADER_REPLICA",
>>>                   IdealStateModeProperty.AUTO_REBALANCE.toString());
>>> to 
>>>
>>> admin.addResource(DEFAULT_CLUSTER_NAME, RESOURCE, PARTITIONS,
>>>                   "LEADER_REPLICA",
>>>                   RebalanceMode.FULL_AUTO.toString());
>>> 
>>> 
>>> It should work. We are planning to make a 0.6.2 release with a few fixes,
>>> including this one.
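>>>
>>> (A note on imports for anyone applying this change - a sketch; both constants
>>> are nested in org.apache.helix.model.IdealState:)
>>>
>>> // old mode property, as used in the 0.6.1-era code:
>>> import org.apache.helix.model.IdealState.IdealStateModeProperty;
>>> // its replacement on trunk, used above:
>>> import org.apache.helix.model.IdealState.RebalanceMode;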
>>> 
>>> 
>>> Thanks, 
>>> 
>>> Jason 
>>> 
>>> 
>>> From: Matthieu Morel <mmorel@apache.org>
>>> Reply-To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
>>> Date: Monday, October 14, 2013 12:09 PM
>>> To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
>>> Subject: Getting auto_rebalance right 
>>> 
>>> Hi, 
>>> 
>>> I'm trying to use the auto-rebalance mode in Helix. 
>>> 
>>> The use case is the following (a standard leader-standby scenario, a bit
>>> like the rsync example in the Helix codebase):
>>> - the dataspace is partitioned
>>> - for a given partition, we have:
>>>   - a leader that is responsible for writing and serving data, logging
>>>     operations into a journal
>>>   - a replica that fetches updates from the journal and applies them
>>>     locally, but does not serve data
>>> Upon failure, the replica becomes leader, applies pending updates, and
>>> can write and serve data (see the callback sketch below). Ideally we also
>>> get a new replica assigned.
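>>>
>>> To make the state machine concrete, here is a rough sketch of the
>>> participant-side callbacks (standard Helix transition annotations; the
>>> class name is hypothetical and the journal logic is elided):
>>>
>>> import org.apache.helix.NotificationContext;
>>> import org.apache.helix.model.Message;
>>> import org.apache.helix.participant.statemachine.StateModel;
>>> import org.apache.helix.participant.statemachine.StateModelInfo;
>>> import org.apache.helix.participant.statemachine.Transition;
>>>
>>> @StateModelInfo(initialState = "OFFLINE", states = { "LEADER", "REPLICA" })
>>> public class LeaderReplicaStateModel extends StateModel {
>>>
>>>   @Transition(from = "OFFLINE", to = "REPLICA")
>>>   public void onBecomeReplicaFromOffline(Message msg, NotificationContext ctx) {
>>>     // start tailing the journal and applying updates locally
>>>   }
>>>
>>>   @Transition(from = "REPLICA", to = "LEADER")
>>>   public void onBecomeLeaderFromReplica(Message msg, NotificationContext ctx) {
>>>     // apply pending journal updates, then begin serving reads and writes
>>>   }
>>>
>>>   @Transition(from = "LEADER", to = "REPLICA")
>>>   public void onBecomeReplicaFromLeader(Message msg, NotificationContext ctx) {
>>>     // stop serving; go back to following the journal
>>>   }
>>> }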
>>> 
>>> We'd like to use the auto_rebalance mode in Helix so that partitions 
>>> are automatically assigned and re-assigned, and so that leaders are 
>>> automatically elected. 
>>> 
>>> 
>>> Unfortunately, I can't really get the balancing right. I might be doing
>>> something wrong, so I uploaded an example here:
>>> https://github.com/matthieumorel/helix-balancing
>>> 
>>> 
>>> In this application I would like to get exactly 1 leader and 1 replica
>>> for each of the partitions.
>>> 
>>> In this example we don't reach that result, and when removing a node, 
>>> we even get to a situation where there is no leader for a given 
>>> partition. 
>>> 
>>> 
>>> Do I have wrong expectations? Is there something wrong with the code, or
>>> is it something with Helix?
>>> 
>>> 
>>> Thanks! 
>>> 
>>> Matthieu
> 
> 

