reef-dev mailing list archives

From "Dhruv Mahajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1407) Catching exceptions in group communication in failure case
Date Sat, 04 Jun 2016 03:43:59 GMT

    [ https://issues.apache.org/jira/browse/REEF-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315297#comment-15315297
] 

Dhruv Mahajan commented on REEF-1407:
-------------------------------------

So here are my thoughts and plan after looking at the code:

The only place where streams are used in a different thread is in the read loops of {{*TransportClient}}
and {{*TransportServer}}, which are invoked in separate threads. Apart from that, if an error
happens we should be able to catch it.

Wherever these read loops are, there is always an associated IObserver that passes the incoming
messages upstream (to the network service and group communication). My plan is to use {{OnError}}
of these observers. Here again we have two options:

a) Throw the error right in that IObserver. This is simple, and the exception is thrown right
away, even if no group communication operator is being called at that moment.

b) Propagate the error all the way up to the blocking queues via special network service and group
communication messages. In this case the error is thrown by the part or operator directly concerned
with the problematic connection. The advantage here is that, if this connection was no longer
needed, the exception is not raised and the process can continue. Moreover, this mechanism
can also be used for closing the connections.
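
To make option b) concrete, here is a minimal sketch of the idea, not REEF code. It is written in Python only for brevity, and every name in it (`QueueObserver`, `ErrorMessage`, `ConnectionFailed`, `read_loop`) is hypothetical. It assumes the read loop runs in its own thread and that the operator blocks on a queue: the observer's error callback enqueues a marker instead of throwing, and the exception is raised only when the consumer dequeues that marker.

```python
import queue
import threading


class ConnectionFailed(Exception):
    """Raised in the consumer when the reader reported a broken connection."""


class ErrorMessage:
    """Hypothetical marker message that carries a read-loop error upstream."""

    def __init__(self, error):
        self.error = error


class QueueObserver:
    """Stand-in for the observer attached to a read loop (names are assumptions)."""

    def __init__(self):
        self.messages = queue.Queue()

    def on_next(self, message):
        self.messages.put(message)

    def on_error(self, error):
        # Option b): do not throw on the reader thread; enqueue a marker instead.
        self.messages.put(ErrorMessage(error))

    def receive(self, timeout=5.0):
        # Blocking call used by an operator; it wakes on data or on an error marker.
        item = self.messages.get(timeout=timeout)
        if isinstance(item, ErrorMessage):
            raise ConnectionFailed(str(item.error)) from item.error
        return item


def read_loop(chunks, observer):
    """Simulated read loop; a None chunk stands for a failed remote task."""
    try:
        for chunk in chunks:
            if chunk is None:
                raise IOError("remote task closed the stream")
            observer.on_next(chunk)
    except IOError as error:
        observer.on_error(error)


observer = QueueObserver()
reader = threading.Thread(target=read_loop, args=(["a", "b", None], observer))
reader.start()
reader.join()

# Data queued before the failure is still delivered in order.
received = [observer.receive(), observer.receive()]

# The failure surfaces only when the consumer reaches the marker.
try:
    observer.receive()
    failure = None
except ConnectionFailed as error:
    failure = error
```

If nothing ever calls `receive()` on this connection again, the marker simply stays in the queue and no exception is raised, which is the advantage described in b). Option a) would instead raise inside `on_error`, on the reader thread.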

[~markus.weimer] [~juliaw] [~afchung90] Please comment. Otherwise, on Monday I will go with b).


> Catching exceptions in group communication in failure case
> ----------------------------------------------------------
>
>                 Key: REEF-1407
>                 URL: https://issues.apache.org/jira/browse/REEF-1407
>             Project: REEF
>          Issue Type: Bug
>            Reporter: Julia
>            Assignee: Dhruv Mahajan
>              Labels: FT
>
> Currently when a task fails, other tasks in the group are stuck in reading data by a
> blocking call. We should be able to try and throw an exception and propagate the exception
> to Task so that the task can handle it in a proper way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
