reef-dev mailing list archives

From "Dhruv Mahajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1244) Group Communication does not close down properly at the end of REEF job
Date Thu, 10 Mar 2016 02:19:40 GMT

    [ https://issues.apache.org/jira/browse/REEF-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188518#comment-15188518 ]

Dhruv Mahajan commented on REEF-1244:
-------------------------------------

[~markus.weimer] I have a question about this. How can we differentiate between streams
that are actually failing and streams that fail only because we are closing them at the other
end? In both cases one end will be blocked in Read() or ReadAsync() when the error surfaces.
Do we want to differentiate at all?
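
One possible way to tell the two cases apart is for the closing side to announce the shutdown on the stream before closing it. The sketch below illustrates that idea in Python rather than the C# this issue targets. The `GOODBYE` marker and the `read_until_closed` function are hypothetical names invented for the illustration and are not part of REEF's API. It also assumes the payload never ends with the marker bytes.

```python
import socket

# Hypothetical shutdown marker; any agreed-upon framing would do.
GOODBYE = b"\x00BYE"


def read_until_closed(sock):
    """Read until the peer closes; report whether the close was announced.

    Returns ("graceful", payload) if the peer sent GOODBYE before closing,
    and ("failed", payload) if the stream ended or errored without it.
    """
    buf = b""
    while True:
        try:
            chunk = sock.recv(4096)
        except OSError:
            # The read itself failed: treat as a real failure.
            return "failed", buf
        if not chunk:
            # End of stream: the peer closed its end.
            break
        buf += chunk
    if buf.endswith(GOODBYE):
        return "graceful", buf[: -len(GOODBYE)]
    return "failed", buf
```

With this scheme the reader does not need to know whether its own Dispose() has run yet: an end of stream preceded by the marker is an expected close, and anything else is reported as a failure.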

> Group Communication does not close down properly at the end of REEF job
> -----------------------------------------------------------------------
>
>                 Key: REEF-1244
>                 URL: https://issues.apache.org/jira/browse/REEF-1244
>             Project: REEF
>          Issue Type: Bug
>          Components: GroupCommunications
>    Affects Versions: 0.13
>         Environment: C#
>            Reporter: Dhruv Mahajan
>            Assignee: Dhruv Mahajan
>             Fix For: 0.13
>
>
> Currently, when we want to shut down an evaluator, the Dispose() function of group communications
> is called. However, this creates a race condition. For example, suppose evaluator e1 calls
> Dispose() and closes its stream with evaluator e2. If e2 is inside the stream's ReadAsync()
> at that moment, the read fails, because Dispose() has not yet been called on e2. Moreover,
> the Dispose() function in e2 will then try to close the already closed stream again.
> Some of these scenarios are handled by catching exceptions and ignoring them, but others
> are not caught and throw errors, which causes the driver and the REEF job to fail.
> The aim of this JIRA is to identify all of these closing scenarios and handle them appropriately.
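
The double-close part of the description can be illustrated with an idempotent dispose guard. This is a minimal sketch in Python, not the C# code in REEF. The `Channel` class and its `dispose` method are hypothetical names, and the wrapped stream is assumed only to expose a `close()` method.

```python
import threading


class Channel:
    """Hypothetical wrapper around one stream to a peer evaluator."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()
        self._disposed = False

    @property
    def disposed(self):
        return self._disposed

    def dispose(self):
        """Close the stream at most once.

        Returns True for the call that performed the close and False for
        every later (or concurrent) call, which becomes a no-op.
        """
        with self._lock:
            if self._disposed:
                return False
            self._disposed = True
        try:
            self._stream.close()
        except OSError:
            # The peer may already have torn the stream down; during
            # shutdown that is expected, so it is ignored here.
            pass
        return True
```

A reader that catches an error can also check `disposed` to decide whether the error happened during a local shutdown and can be ignored.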



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
