Date: Sat, 4 Jun 2016 03:43:59 +0000 (UTC)
From: "Dhruv Mahajan (JIRA)" <jira@apache.org>
To: dev@reef.apache.org
Message-ID: <JIRA.12972919.1464223790000.32081.1465011839303@Atlassian.JIRA>
In-Reply-To: <JIRA.12972919.1464223790000@Atlassian.JIRA>
References: <JIRA.12972919.1464223790000@Atlassian.JIRA> <JIRA.12972919.1464223790463@arcas>
Subject: [jira] [Commented] (REEF-1407) Catching exceptions in group
 communication in failure case


    [ https://issues.apache.org/jira/browse/REEF-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315297#comment-15315297 ] 

Dhruv Mahajan commented on REEF-1407:
-------------------------------------

So here are my thoughts and plan after looking at the code:

The only place where streams are used on a different thread is the read loops of {{*TransportClient}} and {{*TransportServer}}, which run on their own threads. Everywhere else, if an error happens, we should be able to catch it in the calling code.
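Because each read loop runs on its own thread, an exception raised inside it never reaches the thread that later blocks waiting for data; the loop has to catch it and hand it off explicitly. Below is a minimal Python sketch of that hand-off, not REEF code: `Observer`, `read_loop` and `failing_stream` are hypothetical stand-ins, with `on_next`/`on_error` modeled on the .NET `IObserver<T>` methods `OnNext`/`OnError`.

```python
import threading
from typing import Iterable, List


class Observer:
    """Hypothetical observer, modeled on .NET IObserver<T> (OnNext/OnError)."""

    def __init__(self) -> None:
        self.messages: List[bytes] = []
        self.errors: List[BaseException] = []

    def on_next(self, message: bytes) -> None:
        self.messages.append(message)

    def on_error(self, error: BaseException) -> None:
        self.errors.append(error)


def read_loop(stream: Iterable[bytes], observer: Observer) -> None:
    """Hypothetical read loop. It runs on its own thread, so it must catch
    stream errors itself; no other thread would ever see them."""
    try:
        for message in stream:
            observer.on_next(message)
    except Exception as error:  # e.g. the remote task died mid-read
        observer.on_error(error)


def failing_stream():
    """Simulated connection that delivers one message, then breaks."""
    yield b"hello"
    raise ConnectionResetError("remote task failed")


observer = Observer()
reader = threading.Thread(target=read_loop, args=(failing_stream(), observer))
reader.start()
reader.join()
print(observer.messages, [type(e).__name__ for e in observer.errors])
```

Without the `try`/`except` in `read_loop`, the `ConnectionResetError` would only be reported on the reader thread, and anything waiting on the data would never learn that the connection failed.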

Wherever these read loops are, there is always an associated {{IObserver}} that passes the incoming messages upstream (to Network Service and Group Communication). My plan is to use the {{OnError}} method of these observers. Here again we have two options:

a) Throw the error right in that {{IObserver}}. This is simple, and the exception will be thrown right away, even if no Group Communication operator is currently being called.

b) Propagate the error all the way up to the blocking queues via special Network Service and Group Communication messages. In this case, the error will be thrown by the component or operator directly concerned with the problematic connection. The advantage is that if this connection is no longer needed, the exception is never raised and the process can continue. Moreover, the same mechanism can also be used for closing connections.
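Option b) amounts to a poison-pill pattern: the observer enqueues the error as a special message instead of raising it, and only the operator that dequeues from the failed connection's queue raises. A minimal Python sketch under that assumption; `ErrorMessage` and `ConnectionQueue` are hypothetical names, not REEF classes.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ErrorMessage:
    """Hypothetical special message that carries a transport error upstream."""
    error: BaseException


class ConnectionQueue:
    """Hypothetical per-connection blocking queue feeding one operator."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()

    # Observer side: called from the transport's read-loop thread.
    def on_next(self, message: object) -> None:
        self._queue.put(message)

    def on_error(self, error: BaseException) -> None:
        # Option b): do not raise here; enqueue the error as a special message.
        self._queue.put(ErrorMessage(error))

    # Operator side: blocks until data (or an error) arrives for this connection.
    def receive(self) -> object:
        item = self._queue.get()
        if isinstance(item, ErrorMessage):
            raise item.error
        return item


broken, healthy = ConnectionQueue(), ConnectionQueue()

# Simulated read loops: one connection fails, the other delivers data.
threading.Thread(target=broken.on_error, args=(ConnectionResetError("peer died"),)).start()
threading.Thread(target=healthy.on_next, args=(b"partial sum",)).start()

print(healthy.receive())  # unaffected by the other connection's failure
try:
    broken.receive()  # only the operator reading this connection sees the error
except ConnectionResetError as error:
    print("caught:", error)
```

If no operator ever calls `receive()` on the broken connection, the error simply stays in its queue and the process continues. Under option a), the error would instead be raised immediately in `on_error`, whether or not any operator is reading from that connection.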

[~markus.weimer] [~juliaw] [~afchung90] Please comment. Otherwise, on Monday I will go with b).


> Catching exceptions in group communication in failure case
> ----------------------------------------------------------
>
>                 Key: REEF-1407
>                 URL: https://issues.apache.org/jira/browse/REEF-1407
>             Project: REEF
>          Issue Type: Bug
>            Reporter: Julia
>            Assignee: Dhruv Mahajan
>              Labels: FT
>
> Currently, when a task fails, the other tasks in the group remain stuck in a blocking read call. We should try to throw an exception and propagate it to the Task so that the task can handle the failure properly.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)