stratos-dev mailing list archives

From Rajkumar Rajaratnam <rajkum...@wso2.com>
Subject Re: Member termination took 30 minutes
Date Thu, 12 Feb 2015 08:34:29 GMT
Hi,

I made terminationPendingMemberExpiryTime configurable via
autoscaler.xml, like the other expiry timeouts.

Thanks.
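
For readers of the archive, here is a minimal sketch of what "configurable with a default" could look like. It is not the actual Stratos change: it reads from a java.util.Properties object rather than autoscaler.xml, the class and method names are hypothetical, and it assumes the value is given in milliseconds under a key named after the parameter. The 30-minute fallback matches the default mentioned in the quoted discussion.

```java
import java.util.Properties;

// Hypothetical helper, not Stratos code: resolves the expiry time from
// configuration and falls back to the 30-minute default from the thread.
class ExpiryTimeouts {
    // Assumption: the timeout is expressed in milliseconds.
    static final long DEFAULT_TERMINATION_PENDING_EXPIRY_MS = 30L * 60L * 1000L;

    // Assumption: the configuration key is the parameter name itself.
    static final String KEY = "terminationPendingMemberExpiryTime";

    static long terminationPendingExpiryMillis(Properties config) {
        String raw = config.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_TERMINATION_PENDING_EXPIRY_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            // Reject zero or negative values rather than expiring members at once.
            return value > 0 ? value : DEFAULT_TERMINATION_PENDING_EXPIRY_MS;
        } catch (NumberFormatException e) {
            // An unparsable value falls back to the default.
            return DEFAULT_TERMINATION_PENDING_EXPIRY_MS;
        }
    }
}
```

Falling back to the default on a missing or malformed value keeps the upper limit in place even when the configuration is wrong.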

On Thu, Feb 12, 2015 at 12:58 PM, Sajith Kariyawasam <sajith@wso2.com>
wrote:

> Thanks for the explanation, Raj! It's clearer now.
>
> On Thu, Feb 12, 2015 at 6:56 AM, Rajkumar Rajaratnam <rajkumarr@wso2.com>
> wrote:
>
>> Hi Sajith,
>>
>> Please find my comments inline.
>>
>> On Thu, Feb 12, 2015 at 1:06 AM, Sajith Kariyawasam <sajith@wso2.com>
>> wrote:
>>
>>> Hi Devs,
>>>
>>> While testing group scaling, I noticed that when scaling down it takes 30
>>> minutes from the moment the scaling rule decides to terminate an instance.
>>>
>>> An active member selected by the rule first moves to a
>>> "termination pending member map", and after a certain period
>>> (terminationPendingMemberExpiryTime) that member
>>> moves to an "obsolete member map". The obsolete check rule then
>>> terminates that member via the cloud controller.
>>>
>>> It seems that the terminationPendingMemberExpiryTime property, whose
>>> default value is 30 minutes, is why termination takes that long.
>>>
>>> Sorry for asking; I might have missed some past discussions regarding
>>> this. Could someone explain the purpose of moving the member to an
>>> intermediary "termination pending member map", rather than moving it
>>> directly to the "obsolete member map"?
>>>
>>
>> The reason is to avoid event loss and to allow graceful termination. Let
>> me explain the logic.
>>
>>    - When scaling down, AS moves the member from the "active member
>>    list" to the "termination pending member map".
>>    - A Drools rule, "Cleanup Instances which are pending termination",
>>    runs periodically, takes all the members in the "termination pending
>>    member map", and publishes the instance cleanup event.
>>    - When CA receives the instance cleanup event, it publishes the
>>    instance ready to shutdown event.
>>    - When CC receives the instance ready to shutdown event, it publishes
>>    the member ready to shutdown event.
>>    - When AS receives the member ready to shutdown event, it moves the
>>    member from the "termination pending member map" to the "obsolete
>>    member map".
>>    - Hence, until AS receives the member ready to shutdown event, it
>>    keeps publishing the instance cleanup event in every cluster monitor
>>    interval (each time the Drools rule runs).
>>    - If AS does not receive the member ready to shutdown event for a
>>    member in the "termination pending member map" within 30 minutes (the
>>    upper limit), the member is moved to the obsolete list without
>>    waiting for that event.
>>
>> The reason for this complete cycle is graceful termination. If we put the
>> member directly into the "obsolete member map", it will not be terminated
>> gracefully.
>>
>> The reason we move the member from the "active member list" to the
>> "termination pending member map" is to avoid event loss. We have had
>> situations where an event was lost in the above cycle. These events are
>> published only once, so if we lose one event in this cycle, that member
>> will never be terminated. That is why we put the member in the map. In
>> every cluster monitor interval, we take all the members in the
>> "termination pending member map" and resend the instance cleanup event.
>> This overcomes event loss.
>>
>> 30 minutes is the upper limit: the maximum time a member can reside in the
>> "termination pending member map". You hit the edge case where AS did
>> not receive the member ready to shutdown event, so AS took 30 minutes to
>> move the member to the obsolete list.
>>
>>>
>>> Also, is the terminationPendingMemberExpiryTime parameter configurable
>>> (it seems not), and is there any reason it is set to 30 minutes?
>>>
>>
>> This is not configurable yet. But other member list/map expiry times are
>> configurable AFAIR.
>>
>>>
>>> Further, we should make the sleep times of PendingMemberWatcher,
>>> ObsoletedMemberWatcher and TerminationPendingMemberWatcher configurable.
>>> WDYT?
>>>
>>
>> Yes we have to.
>>
>>
>>>
>>> We need to document those configurable parameters as well. @Mari, please
>>> note.
>>>
>>>
>>> Thanks,
>>> Sajith
>>>
>>>
>>>
>>
>>
>> --
>> Rajkumar Rajaratnam
>> Committer & PMC Member, Apache Stratos
>> Software Engineer, WSO2
>>
>> Mobile : +94777568639
>> Blog : rajkumarr.com
>>
>
>
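
The termination-pending cycle described in the quoted explanation above can be sketched roughly as follows. This is an illustration under stated assumptions, not the autoscaler's actual code: the class and method names are hypothetical, time is passed in explicitly as milliseconds, and event publishing is reduced to returning the member IDs that would get another instance cleanup event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Stratos code.
class TerminationPendingTracker {
    private final long expiryMillis; // upper limit a member may stay pending
    // memberId -> time the member entered the termination pending map
    private final Map<String, Long> terminationPending = new HashMap<>();
    private final List<String> obsolete = new ArrayList<>();

    TerminationPendingTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // Scale-down decision: the member leaves the active list and waits here.
    void markTerminationPending(String memberId, long nowMillis) {
        terminationPending.put(memberId, nowMillis);
    }

    // "Member ready to shutdown" arrived: graceful move to the obsolete list.
    void onMemberReadyToShutdown(String memberId) {
        if (terminationPending.remove(memberId) != null) {
            obsolete.add(memberId);
        }
    }

    // One monitor interval: members past the upper limit become obsolete
    // without waiting further; the rest are returned so the caller can
    // publish the instance cleanup event for them again.
    List<String> runMonitorCycle(long nowMillis) {
        List<String> needCleanupEvent = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = terminationPending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            String memberId = entry.getKey();
            long pendingSince = entry.getValue();
            if (nowMillis - pendingSince >= expiryMillis) {
                it.remove();
                obsolete.add(memberId);
            } else {
                needCleanupEvent.add(memberId);
            }
        }
        return needCleanupEvent;
    }

    List<String> obsoleteMembers() {
        return new ArrayList<>(obsolete);
    }
}
```

Because the cleanup event is re-sent on every cycle, a single lost event only delays termination by one interval; the expiry time matters only when the ready-to-shutdown event never arrives.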


-- 
Rajkumar Rajaratnam
Committer & PMC Member, Apache Stratos
Software Engineer, WSO2

Mobile : +94777568639
Blog : rajkumarr.com
