helix-user mailing list archives

From Ming Fang <mingf...@mac.com>
Subject Re: Potential bug in manual partition placement
Date Fri, 22 Feb 2013 12:18:25 GMT
Thanks for the detailed explanation.
I've learned a lot as a result. 

Sent from my iPad

On Feb 22, 2013, at 1:12 AM, kishore g <g.kishore@gmail.com> wrote:

> Hi Ming,
> 
> It is easier to understand if you look at the transition order in the first email you sent.
> localhost_12000 transitioning from OFFLINE to SLAVE for MyResource_0
> localhost_12002 transitioning from OFFLINE to SLAVE for MyResource_1
> localhost_12000 transitioning from OFFLINE to SLAVE for MyResource_1
> localhost_12002 transitioning from OFFLINE to SLAVE for MyResource_0
> 
> At this point Helix has not sent any OFFLINE to SLAVE transition to localhost_12001. This is because the state model definition constrains the number of nodes that can be in the SLAVE state to at most 2 (replicas=2).
> 
> For MyResource_1, since localhost_12002 and localhost_12000 are already slaves, localhost_12001 can never become a slave, because that would violate the constraint of at most 2 slaves. Since it cannot become a slave, it cannot become master.
> 
> For MyResource_0, you can see that Helix first made localhost_12000 master, which freed a slave slot, and hence it could send a message to localhost_12001 to become a slave.
> 
> localhost_12000 transitioning from SLAVE to MASTER for MyResource_0
> localhost_12001 transitioning from OFFLINE to SLAVE for MyResource_0
> 
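The reasoning above can be sketched with a toy model of per-partition state upper bounds. This is not Helix code: the `UPPER_BOUND` table and the `can_enter` and `apply` helpers are hypothetical names, and the bounds (one MASTER, REPLICAS slaves) are assumed from the explanation in this thread.

```python
# Toy model of per-partition state upper bounds; this is not the Helix API.
REPLICAS = 2
UPPER_BOUND = {"MASTER": 1, "SLAVE": REPLICAS}  # assumed bounds, for illustration


def can_enter(states, target):
    """Return True if one more node may enter `target` without exceeding its bound."""
    current = sum(1 for state in states.values() if state == target)
    return current < UPPER_BOUND[target]


def apply(states, node, target):
    """Move `node` to `target` if the bound allows it; report whether it happened."""
    if not can_enter(states, target):
        return False
    states[node] = target
    return True


nodes = ["localhost_12000", "localhost_12001", "localhost_12002"]

# MyResource_1: 12002 and 12000 became SLAVE first, so 12001 is blocked.
res1 = {node: "OFFLINE" for node in nodes}
apply(res1, "localhost_12002", "SLAVE")
apply(res1, "localhost_12000", "SLAVE")
blocked = not apply(res1, "localhost_12001", "SLAVE")

# MyResource_0: promoting 12000 to MASTER frees a SLAVE slot for 12001.
res0 = {node: "OFFLINE" for node in nodes}
apply(res0, "localhost_12000", "SLAVE")
apply(res0, "localhost_12002", "SLAVE")
apply(res0, "localhost_12000", "MASTER")
admitted = apply(res0, "localhost_12001", "SLAVE")

print(blocked, admitted)  # prints: True True
```

In this model, localhost_12001 stays OFFLINE for MyResource_1 and ends up SLAVE for MyResource_0, matching the cluster state table in the original report.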
> HELIX-50 fixes the random selection of nodes by sorting messages based on the preference list.
> 
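A minimal sketch of what ordering by preference list could look like, assuming the candidates for a partition are plain node names. The `order_by_preference` helper is hypothetical and is not taken from the Helix code base or from the HELIX-50 patch.

```python
def order_by_preference(candidates, preference_list):
    """Order candidate nodes by their position in the preference list.

    Nodes missing from the preference list sort last (an assumption of this sketch).
    """
    rank = {node: index for index, node in enumerate(preference_list)}
    return sorted(candidates, key=lambda node: rank.get(node, len(rank)))


# Preference list for MyResource_1 from the original report.
preference = ["localhost_12001", "localhost_12000", "localhost_12002"]

# A set has no guaranteed order, standing in for the random pick.
candidates = {"localhost_12002", "localhost_12000", "localhost_12001"}

REPLICAS = 2
chosen = order_by_preference(candidates, preference)[:REPLICAS]
print(chosen)  # prints: ['localhost_12001', 'localhost_12000']
```

With this ordering the first REPLICAS nodes always include the head of the preference list, so the intended master is never the node left without a slot.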
> Thanks,
> Kishore G
> 
> 
> On Thu, Feb 21, 2013 at 4:42 PM, Zhen Zhang <nehzgnahz@gmail.com> wrote:
>> Hi Ming, thanks for the feedback. With REPLICAS set to 2, the behavior is random: the Helix controller will pick any two of the hosts in the preference list and perform the transitions. In your case it happens to work. We have updated the JIRA accordingly and will fix it soon.
>> https://issues.apache.org/jira/browse/HELIX-50
>> 
>> Thanks,
>> Zhen
>> 
>> On Thu, Feb 21, 2013 at 4:34 PM, Ming Fang <mingfang@mac.com> wrote:
>>> Thanks for pointing that out.
>>> It does work as expected after I set REPLICAS to 3.
>>> 
>>> But the strange thing is even with REPLICAS set to 2 and placement configured as below, everything works.
>>>         "MyResource_0" : [ "localhost_12000", "localhost_12001", "localhost_12002" ],
>>>         "MyResource_1" : [ "localhost_12000", "localhost_12001", "localhost_12002" ]
>>> 
>>> On Feb 20, 2013, at 2:02 AM, kishore g <g.kishore@gmail.com> wrote:
>>> 
>>>> https://github.com/mingfang/apache-helix/blob/master/helix-core/src/main/resources/manual.json has replicas set to 2 but the preference list for each partition is of size 3. If you set the number of REPLICAS to 3, it should work.
>>>> 
>>>> We do some validation of the ideal state, but we don't validate that the number of replicas is the same as the preference list size for all partitions. Created JIRA https://issues.apache.org/jira/browse/HELIX-50
>>>> 
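The missing check described above could look roughly like the following. This is a sketch under assumptions: the ideal state is represented as a plain replica count plus a dict of preference lists, and `mismatched_partitions` is a hypothetical helper, not a Helix function.

```python
def mismatched_partitions(replicas, preference_lists):
    """Return {partition: list size} for partitions whose preference list
    size differs from the configured replica count."""
    return {
        partition: len(nodes)
        for partition, nodes in preference_lists.items()
        if len(nodes) != replicas
    }


# Preference lists from the original report.
preference_lists = {
    "MyResource_0": ["localhost_12000", "localhost_12001", "localhost_12002"],
    "MyResource_1": ["localhost_12001", "localhost_12000", "localhost_12002"],
}

print(mismatched_partitions(2, preference_lists))  # prints: {'MyResource_0': 3, 'MyResource_1': 3}
print(mismatched_partitions(3, preference_lists))  # prints: {}
```

With replicas set to 2 both partitions are flagged, and with 3 the check passes, which lines up with the advice to set REPLICAS to 3.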
>>>> 
>>>> Thanks,
>>>> Kishore G  
>>>> 
>>>> On Tue, Feb 19, 2013 at 7:08 PM, Ming Fang <mingfang@mac.com> wrote:
>>>>> I've "repurposed" the Quickstart example in an attempt to implement manual placement of partitions.
>>>>> I'm using a JSON file, and the relevant section is below.
>>>>> 
>>>>>         "MyResource_0" : [ "localhost_12000", "localhost_12001", "localhost_12002" ],
>>>>>         "MyResource_1" : [ "localhost_12001", "localhost_12000", "localhost_12002" ]
>>>>> 
>>>>> The goal is to make _12000 the MASTER for MyResource_0 and _12001 the MASTER of MyResource_1.
>>>>> The last instance, _12002, will serve as the last-resort backup for both partitions in the event the other two die.
>>>>> This is a small example of what I was hoping to implement as part of a larger system.
>>>>> 
>>>>> You may run the example here
>>>>> https://github.com/mingfang/apache-helix/blob/master/helix-core/src/main/java/org/apache/helix/examples/ManualPlacementTest.java
>>>>> 
>>>>> using the JSON file here
>>>>> https://github.com/mingfang/apache-helix/blob/master/helix-core/src/main/resources/manual.json
>>>>> 
>>>>> The problem is when I run this, the output looks like this
>>>>> 
>>>>> STARTING Zookeeper at localhost:2199
>>>>> Creating cluster: HELIX_QUICKSTART
>>>>> Adding 3 participants to the cluster
>>>>>          Added participant: localhost_12000
>>>>>          Added participant: localhost_12001
>>>>>          Added participant: localhost_12002
>>>>> Starting Participants
>>>>>          Started Participant: localhost_12000
>>>>>          Started Participant: localhost_12001
>>>>>          Started Participant: localhost_12002
>>>>> Starting Helix Controller
>>>>> localhost_12000 transitioning from OFFLINE to SLAVE for MyResource_0
>>>>> localhost_12002 transitioning from OFFLINE to SLAVE for MyResource_1
>>>>> localhost_12000 transitioning from OFFLINE to SLAVE for MyResource_1
>>>>> localhost_12002 transitioning from OFFLINE to SLAVE for MyResource_0
>>>>> localhost_12000 transitioning from SLAVE to MASTER for MyResource_0
>>>>> localhost_12001 transitioning from OFFLINE to SLAVE for MyResource_0
>>>>> CLUSTER STATE: After starting 3 nodes
>>>>>                 localhost_12000 localhost_12001 localhost_12002
>>>>>         MyResource_0    M               S               S
>>>>>         MyResource_1    S               -               S
>>>>> ###################################################################
>>>>> 
>>>>> Notice there is no MASTER for MyResource_1.
>>>>> I've been trying to debug this for a day now with no success.
>>>>> 
>>>>> Did I stumble onto an actual bug?
>>>> 
>>> 
>> 
> 
