Date: Sat, 4 Jun 2016 03:43:59 +0000 (UTC)
From: "Dhruv Mahajan (JIRA)"
To: dev@reef.apache.org
Subject: [jira] [Commented] (REEF-1407) Catching exceptions in group communication in failure case

[ https://issues.apache.org/jira/browse/REEF-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315297#comment-15315297 ]

Dhruv Mahajan commented on REEF-1407:
-------------------------------------

So here are my thoughts and plan after looking at the code:

The only place where streams are used
in a different thread is in the read loops of {{*TransportClient}} and {{*TransportServer}}, which run in separate threads. Apart from that, if an error happens we should be able to catch it. Wherever these read loops are, there is always an associated IObserver that passes the incoming messages upstream (to the network service and group communication). My plan is to use {{OnError}} of these observers. Here we have two options:

a) Throw the error right in that IObserver. This is simple, and the exception will be thrown right away even if no group communication operator is being called at that moment.

b) Propagate the error all the way up to the blocking queues via special Network Service and Group Comm. messages. In this case the error will be thrown by the part or operator directly concerned with the problematic connection. The advantage here is that if this connection is no longer needed, the exception will not be raised and the process can continue. Moreover, this mechanism can also be used for closing the connections.

[~markus.weimer] [~juliaw] [~afchung90] Please comment. Otherwise, I will go with b) on Monday.

> Catching exceptions in group communication in failure case
> ----------------------------------------------------------
>
>                 Key: REEF-1407
>                 URL: https://issues.apache.org/jira/browse/REEF-1407
>             Project: REEF
>          Issue Type: Bug
>            Reporter: Julia
>            Assignee: Dhruv Mahajan
>              Labels: FT
>
> Currently when a task fails, other tasks in the group are stuck in reading data by a blocking call. We should be able to try and throw an exception and propagate the exception to Task so that the task can handle it in a proper way.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
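
For illustration, option b) from the comment above could look roughly like the sketch below. It is written in Python for brevity and does not use REEF's actual classes: {{ErrorMessage}}, {{QueueObserver}}, {{ConnectionFailed}}, {{read_loop}} and {{failing_source}} are all hypothetical names. The idea shown is only that the reader thread never throws; it enqueues a special message, and the exception surfaces in whichever caller is blocked on that queue.

```python
import queue
import threading


class ConnectionFailed(Exception):
    """Raised in the consumer when a reader thread reported a failure."""


class ErrorMessage:
    """Hypothetical special message that carries a reader-thread error."""

    def __init__(self, error):
        self.error = error


class QueueObserver:
    """Observer-like object that forwards everything to a blocking queue."""

    def __init__(self):
        self._queue = queue.Queue()

    def on_next(self, message):
        self._queue.put(message)

    def on_error(self, error):
        # Option b): do not raise here, because this runs on the reader
        # thread. Enqueue a marker so a blocked consumer wakes up instead.
        self._queue.put(ErrorMessage(error))

    def receive(self, timeout=None):
        # Blocking call made by the operator on the task thread.
        message = self._queue.get(timeout=timeout)
        if isinstance(message, ErrorMessage):
            raise ConnectionFailed(str(message.error)) from message.error
        return message


def read_loop(source, observer):
    """Stand-in for a transport read loop running on its own thread."""
    try:
        for message in source:
            observer.on_next(message)
    except Exception as error:
        observer.on_error(error)


def failing_source():
    """Hypothetical connection that delivers one message and then breaks."""
    yield "chunk-1"
    raise IOError("peer closed the connection")


def run_demo():
    observer = QueueObserver()
    reader = threading.Thread(target=read_loop, args=(failing_source(), observer))
    reader.start()
    reader.join()
    first = observer.receive(timeout=1)  # normal message still arrives
    try:
        observer.receive(timeout=1)      # error marker raises in the consumer
    except ConnectionFailed as failure:
        return first, str(failure)
    return first, None
```

In this sketch a queue that nobody reads from again never raises, which matches the advantage claimed for b). With option a) the exception would instead be raised inside {{on_error}} on the reader thread.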