reef-dev mailing list archives

From "Julia (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (REEF-1549) Resolve the issue in WaitingForRegistration
Date Thu, 08 Sep 2016 22:07:20 GMT

    [ https://issues.apache.org/jira/browse/REEF-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475165#comment-15475165 ]

Julia edited comment on REEF-1549 at 9/8/16 10:06 PM:
------------------------------------------------------

Adding a context layer would introduce a system state change and therefore more changes in
the fault-tolerant code.
To resolve the WaitingForRegistration issue, I would like to propose this approach:
* Move WaitingForRegistration into the Call() method as its first line
* Pass a cancellation token to WaitingForRegistration
* When the driver is in the shutdown state, it sends a close event to all running tasks, as before
* When a task receives the cancellation during WaitingForRegistration, it returns right
away at the next iteration of the retry loop.

This way,
* The driver gets the IRunningTask quickly, so it can use this reference to send
events to the task
* We won't mix up communication errors with Injection Exception.
* The behavior is controlled by the cancellation token, so it takes effect promptly, with no delay.
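
To make the proposed flow concrete, here is a minimal sketch in Python, not the REEF code itself. It assumes a threading.Event as a stand-in for the cancellation token. The names waiting_for_registration, call, is_registered, RegistrationCancelled, and the retry and sleep values are all hypothetical and chosen only for illustration.

```python
import threading


class RegistrationCancelled(Exception):
    """Raised when the wait is abandoned because cancellation was requested."""


def waiting_for_registration(is_registered, cancel_event,
                             retries=10, sleep_seconds=0.5):
    """Poll is_registered() until it succeeds, retries run out, or we are cancelled.

    cancel_event is a threading.Event standing in for the cancellation token.
    """
    for _ in range(retries):
        # Check the token at the top of every retry iteration.
        if cancel_event.is_set():
            raise RegistrationCancelled("cancelled while waiting for registration")
        if is_registered():
            return
        # Event.wait returns as soon as the event is set, so a cancellation
        # is noticed without sleeping out the full interval.
        cancel_event.wait(sleep_seconds)
    if cancel_event.is_set():
        raise RegistrationCancelled("cancelled while waiting for registration")
    raise TimeoutError("registration did not complete within the retry budget")


def call(is_registered, cancel_event):
    """Hypothetical task entry point: the wait is the first step of the call."""
    try:
        waiting_for_registration(is_registered, cancel_event)
    except RegistrationCancelled:
        return "cancelled"
    # ... the real task work would go here ...
    return "completed"
```

In this sketch, the close-event handler on the task side would simply set cancel_event. Because the wait runs inside call rather than in the constructor, a communication timeout surfaces as a TimeoutError from the running task instead of as a failure during task construction.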





was (Author: juliaw):
Adding a context layer would introduce a system state change and therefore more changes in
the fault-tolerant code.
To resolve the WaitingForRegistration issue, I would like to propose this approach:
* move WaitingForRegistration in Call() method as the first line
* Pass a cancellation token to WaitingForRegistration
* When the driver is in the shutdown state, it sends a close event to all running tasks, as before
* When a task receives the cancellation during WaitingForRegistration, it returns right
away at the next iteration of the retry loop.

This way,
* The driver gets the IRunningTask quickly, so it can use this reference to send
events to the task
* We won't mix up communication errors with Injection Exception.
* The behavior is controlled by the cancellation token, so it takes effect promptly, with no delay.




> Resolve the issue in WaitingForRegistration
> -------------------------------------------
>
>                 Key: REEF-1549
>                 URL: https://issues.apache.org/jira/browse/REEF-1549
>             Project: REEF
>          Issue Type: Improvement
>    Affects Versions: 0.16
>            Reporter: Julia
>              Labels: FT
>
> Currently, if an evaluator fails while we are still in the task submission phase, the
> newly created tasks wait in WaitingForRegistration during group communication
> initialization until the timeout expires.
> One way to handle this is to cancel the task that is still being constructed. The problem
> is that the driver has not yet received the IRunningTask at this point, so there is no way
> to send an event to the task with the current system.
> Another way is to add a context layer for group communication initialization. Let
> Driver/GroupCommuDriver control whether all such contexts are created, based on the context
> events, and then submit tasks on those contexts. This would keep the control for group
> communications in a centralized place. It would also make task initialization much quicker
> and reduce the chance of failures in the task constructor before the task is running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
