reef-dev mailing list archives

From "Mariia Mykhailova (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1388) Fix RunningTask to be sent for short-lived .NET tasks
Date Wed, 11 May 2016 20:19:12 GMT

    [ https://issues.apache.org/jira/browse/REEF-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280746#comment-15280746 ]

Mariia Mykhailova commented on REEF-1388:
-----------------------------------------

We have several options for fixing this:
* (easy) Modify {{TaskStatus.SetRunning}} to send a RUNNING message immediately after switching
the task to the running state. This is exactly what would happen if the periodic heartbeat
managed to catch this moment, so the behavior change is minimal.
* (moderate) Modify the Java code to trigger the RunningTask event on the INIT message instead
of the first RUNNING message. This is the behavior implied by the comments in {{TaskStatus.SetRunning}}.
* (hard, a lot of behavior change) Change the task itself to send a message to the driver once
it is running. Since we consider tasks to be entirely user code, I don't think this is a good idea.

I'd go with the first option: sending a RUNNING message immediately after switching the task
to the running state. Any objections?
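
For illustration only, the first option can be sketched as follows. This is a toy model in Python, not the actual REEF.NET code: the class below merely borrows the name TaskStatus, and the method names, the string states, and the send_to_driver callback are all assumptions made for the sketch.

```python
from typing import Callable


class TaskStatus:
    """Toy stand-in for the .NET TaskStatus class (hypothetical, not the REEF API)."""

    def __init__(self, send_to_driver: Callable[[str], None]) -> None:
        # send_to_driver is an assumed callback that delivers one status message.
        self._send = send_to_driver
        self._state = "NOT_STARTED"

    def set_init(self) -> None:
        self._state = "INIT"
        self._send("INIT")

    def set_running(self) -> None:
        # Only the INIT -> RUNNING transition sends a message, so calling
        # set_running twice does not produce a duplicate RUNNING status.
        if self._state == "INIT":
            self._state = "RUNNING"
            # Option 1: push RUNNING right away instead of waiting for the
            # periodic heartbeat to observe the running state.
            self._send("RUNNING")

    def set_done(self) -> None:
        self._state = "DONE"
        self._send("DONE")


# Usage: even a task that finishes instantly reports RUNNING before DONE.
sent = []
status = TaskStatus(sent.append)
status.set_init()
status.set_running()
status.set_done()
```

In this sketch the driver sees INIT, RUNNING, DONE regardless of how short the task is, which is the same sequence a well-timed heartbeat would have produced.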

> Fix RunningTask to be sent for short-lived .NET tasks
> -----------------------------------------------------
>
>                 Key: REEF-1388
>                 URL: https://issues.apache.org/jira/browse/REEF-1388
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF.NET
>            Reporter: Mariia Mykhailova
>            Assignee: Mariia Mykhailova
>              Labels: FT
>
> Currently our task start handling code works as follows:
> 1. Send INIT message to the driver.
> 2. Start the task.
> 3. Send status updates as a periodic heartbeat with a 4-second period; the first RUNNING
> status received by the Java code triggers the RunningTask event.
> If the task completes fast enough, the periodic heartbeat might not catch the task while it
> is executing, and thus the driver will never receive the RunningTask event. All our tests
> that rely on RunningTask have tasks that either sleep for 5+ seconds or wait until a
> RunningTask handler sends a message to the task, so they never uncover this issue. This seems
> to be a bad design. We need to fix this (and probably also reduce the amount of sleep in some
> tests, in the spirit of REEF-1203).
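
The race in the quoted description can be illustrated with a small deterministic model. This is a rough Python sketch under stated assumptions, not REEF code: the function names are hypothetical, the first heartbeat is assumed to fire one full period after the task starts, and a heartbeat that finds the task finished is assumed to report a final DONE status.

```python
from typing import List

HEARTBEAT_PERIOD_S = 4.0  # heartbeat period stated in the issue description


def statuses_seen_by_driver(task_duration_s: float,
                            period_s: float = HEARTBEAT_PERIOD_S) -> List[str]:
    """Return the statuses the driver receives in this toy model."""
    seen = ["INIT"]   # step 1: INIT is sent before the task starts
    t = period_s      # assumption: first heartbeat fires one period after start
    while t < task_duration_s:
        seen.append("RUNNING")  # heartbeat caught the task mid-execution
        t += period_s
    seen.append("DONE")         # first heartbeat after the task has finished
    return seen


def running_task_fired(statuses: List[str]) -> bool:
    """The driver raises RunningTask on the first RUNNING status it sees."""
    return "RUNNING" in statuses


short_task = statuses_seen_by_driver(1.0)   # finishes before any heartbeat
long_task = statuses_seen_by_driver(5.0)    # like the tests that sleep 5+ seconds
```

In this model a 1-second task yields only INIT and DONE, so RunningTask never fires, while a 5-second task is caught by the heartbeat at the 4-second mark.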



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
