helix-user mailing list archives

From Hang Qi <hangq.1...@gmail.com>
Subject Re: Message throttling of controller behavior unexpectedly when there are multiple constraints
Date Mon, 18 May 2015 04:46:32 GMT
Hi Kishore,

Thanks. Should I go ahead and create a JIRA issue, add a test case, and
propose a patch for the fix?

Thanks
Hang Qi

On Sun, May 17, 2015 at 6:19 AM, kishore g <g.kishore@gmail.com> wrote:

> Got it, that should be fixed. Would be great to get a patch to fix it.
> Good find.
>
> Thanks
> Kishore G
> On May 16, 2015 11:50 PM, "Hang Qi" <hangq.1985@gmail.com> wrote:
>
>> Hi Kishore,
>>
>> Thanks for your reply.
>>
>> I am not saying I want Offline->Slave to have higher priority than
>> Slave->Master. I agree with you that one master is more important than two
>> slaves, and that priority only applies within a single partition. What I
>> am saying is that while p0, p1, p2 are performing the Offline->Slave
>> transition on node A, I also want p3, p4, p5 to perform the Offline->Slave
>> transition on node B at the same time, rather than waiting until p0, p1,
>> p2 become Master on node A before any transition starts on node B; that
>> wait is wasted time.
>>
>> The reason for having one transition per partition at a time is summarized
>> in the following thread.
>>
>> http://mail-archives.apache.org/mod_mbox/helix-user/201503.mbox/%3CCAJ2%3DoXxBWF1VoCm%3DjjyhuFCWHuxw3wYPotGz8VRkEnzVhrmgwQ%40mail.gmail.com%3E
>>
>> Thanks
>> Hang Qi
>>
>> On Sat, May 16, 2015 at 8:23 PM, kishore g <g.kishore@gmail.com> wrote:
>>
>>> Thanks Hang for the detailed explanation.
>>>
>>> Before the MessageSelectionStage, there is a stage that orders the
>>> messages according to the state transition priority list. I think
>>> Slave->Master always has higher priority than Offline->Slave, which makes
>>> sense because, in general, having a master is probably more important than
>>> having two slaves.
>>>
>>> Can you provide the state transition priority list in your state model
>>> definition? If you think it is important to get node B to the Slave state
>>> before promoting node A from Slave to Master, you can change the priority
>>> order. Note: this can be changed dynamically and does not require
>>> restarting the servers.
>>>
>>> Another question: what is the reason for constraint #2, i.e. only one
>>> transition per partition at a time?
>>>
>>> thanks,
>>> Kishore G
>>>
>>>
>>>
>>> On Sat, May 16, 2015 at 4:48 PM, Hang Qi <hangq.1985@gmail.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> We found a very strange behavior in the controller's message throttling
>>>> when there are multiple constraints. Here is our setup (we are using
>>>> helix-0.6.4 with only one resource):
>>>>
>>>>    - constraint 1: per-node constraint, we only allow 3 state
>>>>    transitions to happen on one node concurrently.
>>>>    - constraint 2: per-partition constraint, we define the state
>>>>    transition priorities in the state model and only allow one state
>>>>    transition to happen on a single partition concurrently.
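The two constraints can be pictured as independent quota pools that every state-transition message is charged against. A minimal Python model of that bookkeeping (illustrative only; the names and shapes here are not Helix APIs):

```python
# Illustrative model of the two constraints (not Helix code): each constraint
# scope gets a quota counter, and a state-transition message is charged
# against every counter whose scope it matches.

PER_NODE_QUOTA = 3       # constraint 1: at most 3 concurrent transitions per node
PER_PARTITION_QUOTA = 1  # constraint 2: at most 1 concurrent transition per partition

def init_quotas(nodes, partitions):
    """One counter per constraint scope (per node, per partition)."""
    node_quota = {node: PER_NODE_QUOTA for node in nodes}
    partition_quota = {part: PER_PARTITION_QUOTA for part in partitions}
    return node_quota, partition_quota

node_quota, partition_quota = init_quotas(
    nodes=["A", "B"],
    partitions=[f"p{i}" for i in range(8)],
)

# A message (partition, node, transition) matches both constraint scopes,
# so it is charged against both counters:
part, node, _transition = ("p0", "A", "Offline->Slave")
node_quota[node] -= 1       # charge constraint 1 (per node)
partition_quota[part] -= 1  # charge constraint 2 (per partition)
```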
>>>>
>>>> We are using the MasterSlave state model. Suppose we have two nodes A
>>>> and B, each hosting 8 partitions (p0-p7); initially both A and B are
>>>> shut down, and we start them at the same time (say A slightly earlier
>>>> than B).
>>>>
>>>> The expected behavior would be:
>>>>
>>>>    1. p0, p1, p2 on A start from Offline -> Slave; p3, p4, p5 on B
>>>>    start from Offline -> Slave
>>>>
>>>> But the real result is:
>>>>
>>>>    1. p0, p1, p2 on A start from Offline -> Slave; nothing happens on
>>>>    B
>>>>    2. only after p0, p1, p2 have all transitioned to the Master state
>>>>    do p3, p4, p5 on A start from Offline -> Slave and p0, p1, p2 on B
>>>>    start from Offline -> Slave
>>>>
>>>> As the Offline -> Slave step can take a long time, this behavior
>>>> results in a very long time to bring up these two nodes (long down time
>>>> results in long catch-up time as well), even though ideally we should
>>>> not have both nodes down at the same time.
>>>>
>>>> Looking at the controller code, the stage- and pipeline-based
>>>> implementation is well designed, very easy to understand and reason
>>>> about.
>>>>
>>>> The logic of MessageThrottleStage#throttle:
>>>>
>>>>    1. it goes through each message selected by MessageSelectionStage;
>>>>    2. for each message, it goes through all matched constraints and
>>>>    decreases the quota of each constraint:
>>>>       - if any constraint's quota drops below 0, the message is marked
>>>>       as throttled.
>>>> I think there is something wrong here: a message consumes the quota of
>>>> its matched constraints even when it is not going to be sent out
>>>> (throttled). That explains our case:
>>>>
>>>>    - all the messages have been generated at the beginning: (p0, A,
>>>>    Offline->Slave), ..., (p7, A, Offline->Slave), (p0, B,
>>>>    Offline->Slave), ..., (p7, B, Offline->Slave)
>>>>    - in MessageThrottleStage#throttle:
>>>>       - (p0, A, Offline->Slave), (p1, A, Offline->Slave), (p2, A,
>>>>       Offline->Slave) are good; constraint 1 on A reaches 0, and
>>>>       constraint 2 on p0, p1, p2 reaches 0 as well
>>>>       - (p3, A, Offline->Slave), ..., (p7, A, Offline->Slave) are
>>>>       throttled by constraint 1 on A, but also consume the quota of
>>>>       constraint 2 on those partitions
>>>>       - (p0, B, Offline->Slave), ..., (p7, B, Offline->Slave) are
>>>>       throttled by constraint 2
>>>>       - thus only (p0, A, Offline->Slave), (p1, A, Offline->Slave),
>>>>       (p2, A, Offline->Slave) are sent out by the controller.
>>>>
>>>> Does that make sense, or is there anything else you can think of that
>>>> would cause this unexpected behavior? And is there any workaround for
>>>> it? One thing that comes to mind is to update constraint 2 so that the
>>>> one-transition-per-partition limit applies only to certain state
>>>> transitions.
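The fix direction implied by this diagnosis can be sketched as follows (a hedged sketch, not a committed Helix patch): check all matched constraints first, and only charge quota when the message is actually allowed through.

```python
# Sketch of the proposed fix (illustrative, not actual Helix code): the
# throttle decision comes first, and only messages that pass ALL matched
# constraints consume quota.

def throttle_fixed(messages, node_quota, partition_quota):
    sent, throttled = [], []
    for part, node, _transition in messages:
        # does every matched constraint still have quota left?
        if node_quota[node] > 0 and partition_quota[part] > 0:
            node_quota[node] -= 1       # charge only messages that are sent
            partition_quota[part] -= 1
            sent.append((part, node))
        else:
            throttled.append((part, node))  # throttled messages charge nothing
    return sent, throttled

# Same startup scenario as before: 16 Offline->Slave messages, A's first.
messages = [(f"p{i}", n, "Offline->Slave") for n in ("A", "B") for i in range(8)]
sent, _ = throttle_fixed(messages, {"A": 3, "B": 3},
                         {f"p{i}": 1 for i in range(8)})
# Now both nodes make progress in the same pass, matching the expected
# behavior described above: p0-p2 on A and p3-p5 on B.
```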
>>>>
>>>> Thanks very much.
>>>>
>>>> Thanks
>>>> Hang Qi
>>>>
>>>
>>>
>>
>>
>> --
>> Qi hang
>>
>


-- 
Qi hang
