Date: Tue, 24 May 2016 00:56:13 +0000 (UTC)
From: "Julia (JIRA)"
To: dev@reef.apache.org
Subject: [jira] [Commented] (REEF-1345) Throw proper exceptions in IMRU Task

[ https://issues.apache.org/jira/browse/REEF-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297458#comment-15297458 ]

Julia commented on REEF-1345:
-----------------------------

We have a blocking call to read data in group communication. The issue was discussed in REEF-1392.
We need to decide whether to throw an exception in this case. There are two options:

1. Leave it as it is today. In this case, the task would hang if one of the tasks in the communication group fails. We must then depend on the close event handler to force the task to close in the fault-tolerant case, so that we can resubmit the task on the same Evaluator.
2. Pass a timeout when calling the message queue's API to Take data. We can make this timeout fairly long, but at least the call won't hang forever. If we still cannot get data, we throw a TaskGroupCommunication exception.

I prefer the second option, since it would at least avoid a resource leak if the Evaluator is not killed.

[~markus.weimer] [~dkm2110], let me know what you think.

> Throw proper exceptions in IMRU Task
> ------------------------------------
>
>                 Key: REEF-1345
>                 URL: https://issues.apache.org/jira/browse/REEF-1345
>             Project: REEF
>          Issue Type: Task
>            Reporter: Julia
>              Labels: FT
>
> For IMRU fault tolerance, we need to identify where to throw proper exceptions, with error messages, in places where exceptions may happen. This includes:
> TaskFailByCommunication - thrown for any error caused by group communication; the typical case is when a task is not able to get messages from its children.
> TaskFailedByAppError - catch possible application errors and throw the corresponding exceptions in those cases.
> TaskFailedBySystem - any possible system error that could crash the task, such as memory, hard disk, file access, network, etc.
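
Option 2 in the comment above (a timed Take from the message queue that throws instead of hanging) could look roughly like the sketch below. Java is used only for illustration. `TimedReceiver`, `deliver`, `receive`, and the unchecked `TaskGroupCommunicationException` are hypothetical names invented for this example, not REEF APIs, and the queue is a plain `java.util.concurrent.BlockingQueue` standing in for the group communication message queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical exception type standing in for the group communication
// exception mentioned in the comment; not a REEF class.
class TaskGroupCommunicationException extends RuntimeException {
    TaskGroupCommunicationException(String message) {
        super(message);
    }
}

// Hypothetical receiver that wraps a message queue and bounds the wait.
class TimedReceiver<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final long timeoutMillis;

    TimedReceiver(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called by the sending side (for example, a child task) to enqueue a message.
    void deliver(T message) {
        queue.add(message);
    }

    // Waits up to timeoutMillis for a message. Throws instead of blocking
    // forever, so a failed peer cannot leave this task hanging.
    T receive() {
        final T message;
        try {
            // poll(timeout, unit) returns null if the timeout elapses first.
            message = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new TaskGroupCommunicationException("Interrupted while waiting for a message");
        }
        if (message == null) {
            throw new TaskGroupCommunicationException(
                "No message received within " + timeoutMillis + " ms");
        }
        return message;
    }
}
```

With this shape, the timeout can be made long enough to tolerate slow peers while still guaranteeing that the task eventually fails with a communication error rather than holding its resources indefinitely.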