helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Getting auto_rebalance right
Date Tue, 15 Oct 2013 18:48:13 GMT
Yes, we will also add some additional info on the ERROR and DROPPED states here:
http://helix.incubator.apache.org/tutorial_state.html

Here is the JIRA: https://issues.apache.org/jira/browse/HELIX-144

Kyle came up with the validation check since he also ran into similar issues.
Unfortunately it's in Clojure, but porting it to Java should be trivial.
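As a rough idea of what such a port might look like (a minimal self-contained sketch with made-up names, not Kyle's actual code and not the Helix API), a check like this flags states that transitions reference but the state model never declares, which is exactly how a missing DROPPED declaration would surface:

```java
import java.util.*;

public class StateModelCheck {
    // Returns states referenced by a transition but never declared.
    static Set<String> undeclaredStates(Set<String> declared, List<String[]> transitions) {
        Set<String> missing = new TreeSet<>();
        for (String[] t : transitions) {              // t = {fromState, toState}
            if (!declared.contains(t[0])) missing.add(t[0]);
            if (!declared.contains(t[1])) missing.add(t[1]);
        }
        return missing;
    }

    public static void main(String[] args) {
        // A definition that forgets to declare DROPPED, even though the
        // controller needs a path into it to drop partitions.
        Set<String> declared = new HashSet<>(Arrays.asList("OFFLINE", "REPLICA", "LEADER"));
        List<String[]> transitions = Arrays.asList(
                new String[]{"OFFLINE", "REPLICA"},
                new String[]{"REPLICA", "LEADER"},
                new String[]{"LEADER", "REPLICA"},
                new String[]{"REPLICA", "OFFLINE"},
                new String[]{"OFFLINE", "DROPPED"});
        System.out.println("undeclared: " + undeclaredStates(declared, transitions));
        // prints: undeclared: [DROPPED]
    }
}
```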


On Tue, Oct 15, 2013 at 11:23 AM, Matthieu Morel <mmorel@apache.org> wrote:

> OK so it seems I was missing the declaration of the "DROPPED" state.
>
> I added that and things look OK now.
>
> So it seems the issue was mostly related to correctly defining the state
> model, and I'm glad you are adding checks for that.
>
> Thanks!
>
> Matthieu
>
>
> On Oct 15, 2013, at 19:50 , Matthieu Morel <mmorel@apache.org> wrote:
>
> Thanks Kanak,
>
> Note that it's probably also wrong with a replication factor of 1, because
> we actually get both leaders and replicas when we should only get 3
> leaders. Probably related to the other issue.
>
> E.g. in my example, with 3 nodes deployed, 3 partitions, replication
> factor 1, we actually get 3 leaders and 2 replicas:
>
> MY_RESOURCE_0: {
>   Node:myhost:10000: "LEADER",
>   Node:myhost:10002: "REPLICA"
> },
> MY_RESOURCE_1: {
>   Node:myhost:10000: "LEADER",
>   Node:myhost:10001: "REPLICA"
> },
> MY_RESOURCE_2: {
>   Node:myhost:10000: "LEADER"
> }
>
> Regards,
>
> Matthieu
>
> On Oct 15, 2013, at 19:39 , Kanak Biscuitwala <kbiscuitwala@linkedin.com>
> wrote:
>
>  Hi Matthieu,
>
>  Please change line 39 in ClusterConfigInit to:
>
>  admin.rebalance(DEFAULT_CLUSTER_NAME, RESOURCE, 2);
>
>  Basically, the leader counts as a replica, so if you want a replica in
> addition to the leader, you need to specify 2 for the replica count.
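To make that counting explicit (a trivial self-contained sketch of the arithmetic, not the Helix API): the leader is included in the replica count, and no two replicas of a partition share a node, so the standbys you can expect per partition are bounded by both the configured count and the node count:

```java
public class ReplicaCount {
    // Standby replicas per partition: the configured replica count minus
    // the leader, capped by the remaining nodes (no two replicas of the
    // same partition are placed on one node).
    static int expectedStandbys(int replicaCount, int nodes) {
        int standbysWanted = replicaCount - 1;          // leader counts as one replica
        int standbysPossible = Math.max(0, nodes - 1);  // leader occupies one node
        return Math.min(standbysWanted, standbysPossible);
    }

    public static void main(String[] args) {
        // rebalance(..., 2) with 3 nodes: 1 leader + 1 standby per partition
        System.out.println(expectedStandbys(2, 3)); // prints 1
        // with a single node there is nowhere to put a standby
        System.out.println(expectedStandbys(2, 1)); // prints 0
    }
}
```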
>
>  There is a bug where, when there are 3 nodes, partition 0 has 2 replicas
> in REPLICA state even though one of them should be dropped. I'll keep
> investigating that.
>
>  Kanak
>
>   From: Matthieu Morel <mmorel@apache.org>
> Reply-To: "user@helix.incubator.apache.org" <
> user@helix.incubator.apache.org>
> Date: Tuesday, October 15, 2013 8:06 AM
> To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
> Subject: Re: Getting auto_rebalance right
>
>   Hi Kishore,
>
>  On Oct 15, 2013, at 16:48 , kishore g <g.kishore@gmail.com> wrote:
>
>  Hi Matthieu,
>
>  I think the code avoids placing more than one replica of a partition on
> the same node. So if you have only 1 node, it will create only LEADERS.
> We can add a configuration to allow this to happen.
>
>
>  Actually, preventing the leader and replica of a partition from being on
> the same node makes sense: such a placement defeats the purpose of the replica.
>
>  But I do see something weird: when you add a node, only 1 additional
> replica gets created, which does not make sense. I will take a look at that.
>
>
>  Yes, with 3 nodes and 3 partitions we should expect 3 leaders and 3
> replicas. An additional requirement, related to the above comment, would be
> that leaders and replicas are never colocated. Should I open a JIRA for that?
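That colocation requirement is easy to state as a check. A small self-contained sketch (hypothetical representation, not Helix data structures): model an assignment as partition-to-placements and flag any partition hosted twice on one node:

```java
import java.util.*;

public class ColocationCheck {
    // assignment: partition -> list of {node, state} placements.
    // Returns partitions that place two replicas (e.g. LEADER and REPLICA)
    // on the same node, defeating the purpose of replication.
    static List<String> colocatedPartitions(Map<String, List<String[]>> assignment) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : assignment.entrySet()) {
            Set<String> nodes = new HashSet<>();
            for (String[] placement : e.getValue()) {
                if (!nodes.add(placement[0])) { // node already hosts this partition
                    bad.add(e.getKey());
                    break;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> assignment = new LinkedHashMap<>();
        assignment.put("MY_RESOURCE_0", Arrays.asList(
                new String[]{"myhost:10000", "LEADER"},
                new String[]{"myhost:10002", "REPLICA"}));  // fine: different nodes
        assignment.put("MY_RESOURCE_1", Arrays.asList(
                new String[]{"myhost:10001", "LEADER"},
                new String[]{"myhost:10001", "REPLICA"}));  // violates the constraint
        System.out.println(colocatedPartitions(assignment)); // prints [MY_RESOURCE_1]
    }
}
```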
>
>  Let me know if you need more feedback.
>
>  Thanks!
>
>  Matthieu
>
>
>
>
>  thanks,
> Kishore G
>
>
> On Tue, Oct 15, 2013 at 1:31 AM, Matthieu Morel <mmorel@apache.org> wrote:
>
>> Thanks for your prompt answers!
>>
>>  I used the latest version from the master branch and applied the code
>> changes suggested by Jason.
>>
>>  The good news is that:
>> - the update was trivial, at least for the small code example I
>> provided
>> - I always get 3 leader states for the 3 partitions
>>
>>  The bad news is that:
>> - I either don't get enough replicas (I want 1 replica for each partition,
>> and initially I only have replicas for 2 partitions)
>> - or I get no replicas at all (after removing 1 node from the
>> cluster, I have 3 leaders, 0 replicas)
>>
>>  I updated my simple example
>> https://github.com/matthieumorel/helix-balancing so you can reproduce
>> that behavior.
>>
>>  // with only 1 node, I have 3 leaders, 0 replicas:
>>
>>  Starting instance Node:myhost:10000
>> Assigning MY_RESOURCE_1 to Node:myhost:10000
>> Assigning MY_RESOURCE_0 to Node:myhost:10000
>> Assigning MY_RESOURCE_2 to Node:myhost:10000
>> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_2)
>> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_1)
>> OFFLINE -> REPLICA (Node:myhost:10000, MY_RESOURCE_0)
>> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_0)
>> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_1)
>> REPLICA -> LEADER (Node:myhost:10000, MY_RESOURCE_2)
>>
>>
>>  // adding 1 node adds a replica:
>>
>>  Starting instance Node:myhost:10001
>> Assigning MY_RESOURCE_1 to Node:myhost:10001
>> OFFLINE -> REPLICA (Node:myhost:10001, MY_RESOURCE_1)
>>
>>
>>  // adding another node adds a new replica:
>>
>>  Starting instance Node:myhost:10002
>> Assigning MY_RESOURCE_0 to Node:myhost:10002
>> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_0)
>>
>>
>>  // removing a node rebalances things but we end up with 3 leaders, 0
>> replicas
>>
>>  Stopping instance Node:myhost:10000
>> Assigning MY_RESOURCE_2 to Node:myhost:10002
>> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_0)
>> OFFLINE -> REPLICA (Node:myhost:10002, MY_RESOURCE_2)
>> REPLICA -> LEADER (Node:myhost:10002, MY_RESOURCE_2)
>> REPLICA -> LEADER (Node:myhost:10001, MY_RESOURCE_1)
>>
>>
>>  I would like to get 1 leader and 1 replica for each partition,
>> regardless of the number of nodes. Is that possible?
>>
>>  Thanks!
>>
>>  Matthieu
>>
>>
>>
>>  On Oct 15, 2013, at 02:30 , Kanak Biscuitwala <kanak.b@hotmail.com>
>> wrote:
>>
>> Hi Matthieu,
>>
>> I have just pushed a patch to the master branch (i.e. trunk) that should
>> fix the issue. Please let me know if the problem persists.
>>
>> Thanks,
>> Kanak
>>
>> ________________________________
>>
>> From: zzhang@linkedin.com
>> To: user@helix.incubator.apache.org
>> Subject: Re: Getting auto_rebalance right
>> Date: Mon, 14 Oct 2013 21:32:41 +0000
>>
>> Hi Matthieu, this is a known bug in the 0.6.1 release. We have fixed it in
>> trunk. If you are building from trunk, change ClusterConfigInit#init() from
>>
>> admin.addResource(DEFAULT_CLUSTER_NAME,
>>                   RESOURCE,
>>                   PARTITIONS,
>>                   "LEADER_REPLICA",
>>                   IdealStateModeProperty.AUTO_REBALANCE.toString());
>>
>> to
>>
>> admin.addResource(DEFAULT_CLUSTER_NAME,
>>                   RESOURCE,
>>                   PARTITIONS,
>>                   "LEADER_REPLICA",
>>                   RebalanceMode.FULL_AUTO.toString());
>>
>> It should work. We are planning to make a 0.6.2 release with a few fixes,
>> including this one.
>>
>>
>> Thanks,
>>
>> Jason
>>
>>
>> From: Matthieu Morel <mmorel@apache.org>
>> Reply-To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
>> Date: Monday, October 14, 2013 12:09 PM
>> To: "user@helix.incubator.apache.org" <user@helix.incubator.apache.org>
>>
>> Subject: Getting auto_rebalance right
>>
>> Hi,
>>
>> I'm trying to use the auto-rebalance mode in Helix.
>>
>> The use case is the following (standard leader-standby scenario, a bit
>> like the rsync example in the Helix codebase):
>> - the dataspace is partitioned
>> - for a given partition, we have:
>>   - a leader that is responsible for writing and serving data, logging
>>     operations into a journal
>>   - a replica that fetches updates from the journal and applies them
>>     locally, but does not serve data
>> Upon failure, the replica becomes leader, applies pending updates, and
>> can write and serve data. Ideally we also get a new replica assigned.
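The lifecycle described above maps onto a simple state machine. As a sketch (the state names follow the LEADER_REPLICA model used in this thread; the exact transition table is an assumption, including DROPPED so partitions can be removed from a node):

```java
import java.util.*;

public class LeaderReplicaModel {
    // Assumed allowed transitions in a leader-standby model.
    static final Map<String, Set<String>> TRANSITIONS = new HashMap<>();
    static {
        TRANSITIONS.put("OFFLINE", new HashSet<>(Arrays.asList("REPLICA", "DROPPED")));
        TRANSITIONS.put("REPLICA", new HashSet<>(Arrays.asList("LEADER", "OFFLINE")));
        TRANSITIONS.put("LEADER",  new HashSet<>(Arrays.asList("REPLICA")));
        TRANSITIONS.put("DROPPED", new HashSet<>()); // terminal
    }

    static boolean canTransition(String from, String to) {
        return TRANSITIONS.getOrDefault(from, Collections.emptySet()).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canTransition("REPLICA", "LEADER")); // true: failover promotes the standby
        System.out.println(canTransition("OFFLINE", "LEADER")); // false: must pass through REPLICA
    }
}
```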
>>
>> We'd like to use the auto_rebalance mode in Helix so that partitions
>> are automatically assigned and re-assigned, and so that leaders are
>> automatically elected.
>>
>>
>> Unfortunately, I can't really get the balancing right. I might be doing
>> something wrong, so I uploaded an example here:
>> https://github.com/matthieumorel/helix-balancing
>>
>>
>> In this application I would like to get exactly 1 leader and 1 replica
>> for each of the partitions.
>>
>> In this example we don't reach that result, and when removing a node,
>> we even get to a situation where there is no leader for a given
>> partition.
>>
>>
>> Do I have wrong expectations? Is there something wrong with the code, or
>> is it something with Helix?
>>
>>
>> Thanks!
>>
>> Matthieu
>>
>>
>>
>
>
>
>
