reef-dev mailing list archives

From "Andrew Chung (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (REEF-976) Fix broken C# Tests caused by race condition of local RM
Date Thu, 31 Mar 2016 23:03:25 GMT

     [ https://issues.apache.org/jira/browse/REEF-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Chung updated REEF-976:
------------------------------
    Labels: FT  (was: )

> Fix broken C# Tests caused by race condition of local RM
> --------------------------------------------------------
>
>                 Key: REEF-976
>                 URL: https://issues.apache.org/jira/browse/REEF-976
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF-Tests, REEF.NET
>            Reporter: Andrew Chung
>            Assignee: Andrew Chung
>              Labels: FT
>
> There is a race condition in REEF-Local-Runtime, which can occur as follows:
> # The Evaluator sends the {{DONE}} message to the Driver and exits its process.
> # The RM detects that the Evaluator process has ended and sends a {{DONE}} message to the Driver.
> # The Driver receives the {{DONE}} message from the RM before it reads the {{DONE}} message from the Evaluator in its network queue.
> # The Driver calls {{FailedEvaluatorHandler}}, even though the Evaluator shut down properly.
> This can be fixed by requiring an {{ACK}} from the Driver before the Evaluator is allowed to exit its process.
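
The ordering problem described above can be illustrated with a small, deterministic simulation. This is only a sketch and not REEF code: the `Driver` class, its `on_evaluator_done` and `on_rm_done` methods, and the two `shutdown_*` helpers are hypothetical names invented for this example, and the "network queue" is modelled as a plain in-memory deque. It assumes the Driver treats an RM notification as a failure whenever it has not yet read the Evaluator's own DONE message.

```python
from collections import deque


class Driver:
    """Hypothetical stand-in for the Driver; not the REEF API."""

    def __init__(self):
        self.completed = set()
        self.failed = set()

    def on_evaluator_done(self, evaluator_id):
        # The Driver has read the Evaluator's DONE message from its queue.
        self.completed.add(evaluator_id)
        return "ACK"

    def on_rm_done(self, evaluator_id):
        # The RM reports that the Evaluator process has ended. If the
        # Evaluator's own DONE has not been read yet, the exit looks
        # like a failure to the Driver.
        if evaluator_id not in self.completed:
            self.failed.add(evaluator_id)


def shutdown_without_ack(driver, evaluator_id):
    # The Evaluator queues DONE and exits at once, so the RM notification
    # can reach the Driver before the queued message is read.
    network_queue = deque([evaluator_id])
    driver.on_rm_done(evaluator_id)
    while network_queue:
        driver.on_evaluator_done(network_queue.popleft())


def shutdown_with_ack(driver, evaluator_id):
    # The Evaluator queues DONE and stays alive until the Driver has
    # acknowledged it. Only then does the process exit and the RM notice.
    network_queue = deque([evaluator_id])
    ack = None
    while network_queue:
        ack = driver.on_evaluator_done(network_queue.popleft())
    if ack != "ACK":
        raise RuntimeError("Evaluator must not exit before receiving ACK")
    driver.on_rm_done(evaluator_id)


racy = Driver()
shutdown_without_ack(racy, "evaluator-1")

fixed = Driver()
shutdown_with_ack(fixed, "evaluator-1")
```

In the first run the Evaluator ends up in `racy.failed` even though it shut down cleanly, which corresponds to the spurious {{FailedEvaluatorHandler}} call. In the second run `fixed.failed` stays empty, because the RM notification cannot precede the Driver's read of the Evaluator's {{DONE}} message.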



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
