reef-dev mailing list archives

From "Julia (JIRA)" <>
Subject [jira] [Comment Edited] (REEF-976) Fix broken C# Tests caused by race condition of local RM
Date Thu, 31 Mar 2016 23:56:25 GMT


Julia edited comment on REEF-976 at 3/31/16 11:56 PM:

REEF-1306 shows the error logs from the BroadcastReduce test. Sometimes, after a task is done,
it still throws a failed evaluator exception if some network object is not disposed of properly.

was (Author: juliaw):
Those two might be related.

> Fix broken C# Tests caused by race condition of local RM
> --------------------------------------------------------
>                 Key: REEF-976
>                 URL:
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF-Tests, REEF.NET
>            Reporter: Andrew Chung
>            Assignee: Andrew Chung
>              Labels: FT
> There is a race condition in REEF-Local-Runtime, and it can happen as follows:
> # The Evaluator sends the {{DONE}} message and exits its process.
> # The RM discovers that the Evaluator has exited and sends a {{DONE}} message to the Driver.
> # The Driver gets the {{DONE}} message from the RM before reading the {{DONE}} message from the Evaluator in its network queue.
> # The Driver calls {{FailedEvaluatorHandler}}, even though the Evaluator shut down properly.
> This can be fixed by requiring an {{ACK}} from the Driver prior to letting the Evaluator exit its process.
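
The ordering problem described in the issue, and the effect of the proposed {{ACK}}, can be illustrated with a small deterministic simulation. This is only a sketch in Python, not REEF code: the `Driver` class, its method names, and the two `run_*` helpers are hypothetical stand-ins for the Driver, the local RM notification, and the Evaluator shutdown path, and the message delivery is modeled with an in-memory queue rather than a real network.

```python
from collections import deque


class Driver:
    """Hypothetical stand-in for the Driver's bookkeeping (not the REEF API)."""

    def __init__(self):
        self.completed = set()  # evaluators whose DONE message was processed
        self.failed = []        # evaluators routed to the failed-evaluator path

    def on_evaluator_done(self, evaluator_id):
        # Driver reads the Evaluator's DONE message and acknowledges it.
        self.completed.add(evaluator_id)
        return "ACK"

    def on_rm_exit(self, evaluator_id):
        # RM reports that the Evaluator process ended. If the Driver has not
        # yet seen the Evaluator's own DONE, it treats the exit as a failure.
        if evaluator_id not in self.completed:
            self.failed.append(evaluator_id)


def run_without_ack(driver, evaluator_id):
    # The Evaluator queues DONE on the network and exits immediately.
    network_queue = deque([evaluator_id])
    # The RM notices the exit and its notification reaches the Driver first.
    driver.on_rm_exit(evaluator_id)
    # Only afterwards does the Driver drain the Evaluator's DONE message.
    driver.on_evaluator_done(network_queue.popleft())


def run_with_ack(driver, evaluator_id):
    # The Evaluator sends DONE and stays alive until the Driver replies.
    reply = driver.on_evaluator_done(evaluator_id)
    assert reply == "ACK"
    # The process exits only now, so the RM notification cannot come first.
    driver.on_rm_exit(evaluator_id)


racy = Driver()
run_without_ack(racy, "eval-1")
fixed = Driver()
run_with_ack(fixed, "eval-1")
print(racy.failed, fixed.failed)
```

In the first run the Driver records `eval-1` as failed even though it finished cleanly; in the second run the failed list stays empty, because the Evaluator cannot exit (and so the RM cannot report it) until the Driver has already processed its {{DONE}} message.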

This message was sent by Atlassian JIRA
