helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Message throttling of controller behavior unexpectedly when there are multiple constraints
Date Mon, 18 May 2015 04:51:13 GMT
Yes, we can definitely help in reviewing the patch.

thanks,
Kishore G

On Sun, May 17, 2015 at 9:46 PM, Hang Qi <hangq.1985@gmail.com> wrote:

> Hi Kishore,
>
> Thanks. Should I go ahead and create a JIRA issue, add a test case,
> and propose a patch for the fix?
>
> Thanks
> Hang Qi
>
> On Sun, May 17, 2015 at 6:19 AM, kishore g <g.kishore@gmail.com> wrote:
>
>> Got it, that should be fixed. Would be great to get a patch to fix it.
>> Good find.
>>
>> Thanks
>> Kishore G
>> On May 16, 2015 11:50 PM, "Hang Qi" <hangq.1985@gmail.com> wrote:
>>
>>> Hi Kishore,
>>>
>>> Thanks for your reply.
>>>
>>> I am not saying I want Offline->Slave to have higher priority than
>>> Slave->Master. I agree with you: one master is more important than two
>>> slaves, and that only applies to one partition. What I am saying is
>>> that while p0, p1, p2 perform the Offline->Slave transition on node A,
>>> I also want p3, p4, p5 to perform the Offline->Slave transition on
>>> node B at the same time, rather than having node B wait until p0, p1,
>>> p2 become Master on node A before any partition transition starts on
>>> node B. That wait is wasteful.
>>>
>>> The reason to have one transition per partition at a time is summarized
>>> in the following thread:
>>>
>>> http://mail-archives.apache.org/mod_mbox/helix-user/201503.mbox/%3CCAJ2%3DoXxBWF1VoCm%3DjjyhuFCWHuxw3wYPotGz8VRkEnzVhrmgwQ%40mail.gmail.com%3E
>>>
>>> Thanks
>>> Hang Qi
>>>
>>> On Sat, May 16, 2015 at 8:23 PM, kishore g <g.kishore@gmail.com> wrote:
>>>
>>>> Thanks Hang for the detailed explanation.
>>>>
>>>> Before the MessageSelectionStage, there is a stage that orders the
>>>> messages according to the state transition priority list. I think
>>>> Slave->Master always has higher priority than Offline->Slave, which
>>>> makes sense because, in general, having a master is probably more
>>>> important than having two slaves.
>>>>
>>>> Can you provide the state transition priority list from your state model
>>>> definition? If you think that it's important to get node B to the Slave
>>>> state before promoting node A from Slave to Master, you can change the
>>>> priority order. Note: this can be changed dynamically and does not
>>>> require restarting the servers.
>>>>
>>>> Another question: what is the reason for constraint #2, i.e. only
>>>> one transition per partition at a time?
>>>>
>>>> thanks,
>>>> Kishore G
>>>>
>>>>
>>>>
>>>> On Sat, May 16, 2015 at 4:48 PM, Hang Qi <hangq.1985@gmail.com> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> We found some very strange behavior in the controller's message
>>>>> throttling when there are multiple constraints. Here is our setup (we
>>>>> are using helix-0.6.4, with only one resource):
>>>>>
>>>>>    - constraint 1: a per-node constraint; we allow only 3 state
>>>>>    transitions to happen concurrently on one node.
>>>>>    - constraint 2: a per-partition constraint; we define the state
>>>>>    transition priorities in the state model, and allow only one state
>>>>>    transition to happen on any single partition at a time.
>>>>>
>>>>> We are using the MasterSlave state model. Suppose we have two nodes,
>>>>> A and B, each hosting 8 partitions (p0-p7), and initially both A and B
>>>>> are shut down. Now we start them at the same time (say A is slightly
>>>>> earlier than B).
>>>>>
>>>>> The expected behavior might be
>>>>>
>>>>>    1. p0, p1, p2 on A start Offline -> Slave; p3, p4, p5 on B
>>>>>    start Offline -> Slave
>>>>>
>>>>> But the real result is:
>>>>>
>>>>>    1. p0, p1, p2 on A start Offline -> Slave; nothing happens
>>>>>    on B
>>>>>    2. only after p0, p1, p2 have all transitioned to Master state do
>>>>>    p3, p4, p5 on A start Offline -> Slave and p0, p1, p2 on B start
>>>>>    Offline -> Slave
>>>>>
>>>>> As the Offline -> Slave step might take a long time, this behavior
>>>>> results in a very long time to bring up these two nodes (a long
>>>>> downtime results in a long catch-up time as well), though ideally we
>>>>> should not let both nodes go down at the same time.
>>>>>
>>>>> I looked at the controller code; the stage- and pipeline-based
>>>>> implementation is well designed and very easy to understand and to
>>>>> reason about.
>>>>>
>>>>> The logic of MessageThrottleStage#throttle is:
>>>>>
>>>>>    1. it goes through each message selected by
>>>>>    MessageSelectionStage,
>>>>>    2. for each message, it goes through all matched constraints and
>>>>>    decreases the quota of each constraint
>>>>>       1. if any constraint's quota is less than 0, this message will be
>>>>>       marked as throttled.
>>>>>
>>>>> I think there is something wrong here: a message takes the quota of
>>>>> its constraints even if it is not going to be sent out (throttled).
>>>>> That explains our case:
>>>>>
>>>>>    - all the messages have been generated at the beginning: (p0, A,
>>>>>    Offline->Slave), ... (p7, A, Offline->Slave), (p0, B,
>>>>>    Offline->Slave), ..., (p7, B, Offline->Slave)
>>>>>    - in MessageThrottleStage#throttle
>>>>>       - (p0, A, Offline->Slave), (p1, A, Offline->Slave), (p2, A,
>>>>>       Offline->Slave) are good; the quota of constraint 1 on A reaches
>>>>>       0, and the quota of constraint 2 on p0, p1, p2 reaches 0 as well
>>>>>       - (p3, A, Offline->Slave), ... (p7, A, Offline->Slave) are
>>>>>       throttled by constraint 1 on A, but still take the quota of
>>>>>       constraint 2 on those partitions.
>>>>>       - (p0, B, Offline->Slave), ... (p7, B, Offline->Slave) are
>>>>>       throttled by constraint 2
>>>>>       - thus only (p0, A, Offline->Slave), (p1, A, Offline->Slave),
>>>>>       (p2, A, Offline->Slave) have been sent out by the controller.
>>>>>
>>>>> Does that make sense, or is there anything else you can think of that
>>>>> could result in this unexpected behavior? And is there any workaround
>>>>> for it? One thing that comes to mind is to change constraint 2 so that
>>>>> the one-transition-per-partition limit applies only to certain state
>>>>> transitions.
>>>>>
>>>>> Thanks very much.
>>>>>
>>>>> Thanks
>>>>> Hang Qi
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Qi hang
>>>
>>
>
>
> --
> Qi hang
>
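
The quota accounting described in Hang Qi's original report can be
reproduced with a small stand-alone model. This is only a sketch under
stated assumptions, not Helix code: the function name `throttle`, the
`charge_throttled` flag, and the two limit constants are hypothetical, and
each message is reduced to a (partition, node) pair. With
`charge_throttled=True` every matched constraint is charged before the
message is judged, as the report describes; with `charge_throttled=False`
quota is charged only for messages that are actually sent, which is one
possible shape of a fix and not necessarily the patch that was proposed.

```python
from collections import defaultdict

NODE_LIMIT = 3       # constraint 1: concurrent transitions allowed per node
PARTITION_LIMIT = 1  # constraint 2: concurrent transitions allowed per partition


def throttle(messages, charge_throttled):
    """Return the (partition, node) messages that pass both constraints.

    Hypothetical model of the logic described in the report, not Helix's
    actual implementation.
    """
    node_quota = defaultdict(lambda: NODE_LIMIT)
    partition_quota = defaultdict(lambda: PARTITION_LIMIT)
    sent = []
    for partition, node in messages:
        if charge_throttled:
            # Reported behaviour: decrement every matched quota first, then
            # throttle the message if any quota dropped below zero.
            node_quota[node] -= 1
            partition_quota[partition] -= 1
            ok = node_quota[node] >= 0 and partition_quota[partition] >= 0
        else:
            # Alternative: check first, and charge quota only when sending.
            ok = node_quota[node] > 0 and partition_quota[partition] > 0
            if ok:
                node_quota[node] -= 1
                partition_quota[partition] -= 1
        if ok:
            sent.append((partition, node))
    return sent


# Offline->Slave messages for p0..p7 on node A, followed by p0..p7 on node B.
messages = [(f"p{i}", node) for node in ("A", "B") for i in range(8)]

reported = throttle(messages, charge_throttled=True)
proposed = throttle(messages, charge_throttled=False)

print("reported:", reported)
print("proposed:", proposed)
```

In this model the first variant sends only p0, p1, p2 on A, because the
throttled messages for p3..p7 on A still use up the per-partition quota.
The second variant also lets p3, p4, p5 start on B, which matches the
expected behavior given in the report.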
