reef-dev mailing list archives

From "Julia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1345) Throw proper exceptions in IMRU Task
Date Tue, 24 May 2016 17:41:13 GMT

    [ https://issues.apache.org/jira/browse/REEF-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298584#comment-15298584 ]

Julia commented on REEF-1345:
-----------------------------

The original purpose of this Jira was to define IMRU task exceptions and also to update the IMRU
task code to catch and throw the proper exceptions. We can separate the two and let this Jira
define the exceptions only.

For the blocking call in the GC layer: I discussed it with Andrew this morning, and we know exactly
where the call is in the GC code. We can put the call (indirectly) inside the while (!break) loop
in the task and pass a timeout to the blocking call when trying to take data. If no data arrives
before the timeout, GC will throw an exception and control returns to the loop. If break has not
been set, we retry. This way we lose nothing and make the code more robust, and it also lets us
get a proper group communication exception.
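A minimal Java sketch of the timeout-and-retry idea, assuming the GC receive can be given a timeout. GroupCommReceiver, GroupCommTimeoutException, and runTask are hypothetical names, not REEF APIs; a BlockingQueue stands in for the group communication channel, and the loop also exits when a hypothetical external stop flag is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutRetryLoop {

    /** Hypothetical: what the GC layer would throw when no data arrives in time. */
    static class GroupCommTimeoutException extends Exception {
        GroupCommTimeoutException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the GC blocking call; a queue models incoming data. */
    static class GroupCommReceiver {
        private final BlockingQueue<Integer> incoming = new LinkedBlockingQueue<>();

        void deliver(int value) {
            incoming.add(value);
        }

        /** Waits at most timeoutMs for one value instead of blocking forever. */
        int receive(long timeoutMs) throws GroupCommTimeoutException, InterruptedException {
            Integer value = incoming.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (value == null) {
                throw new GroupCommTimeoutException("no data within " + timeoutMs + " ms");
            }
            return value;
        }
    }

    /**
     * Task loop: keeps receiving until it has `expected` values or `stop` is set.
     * A timeout hands control back to the loop, which re-checks the break
     * condition and retries. Returns the number of timeouts observed.
     */
    static int runTask(GroupCommReceiver receiver, AtomicBoolean stop,
                       int expected, long timeoutMs, List<Integer> received) {
        int timeouts = 0;
        while (!stop.get() && received.size() < expected) {
            try {
                received.add(receiver.receive(timeoutMs));
            } catch (GroupCommTimeoutException e) {
                timeouts++; // nothing was consumed, so retrying loses no data
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and exit
                break;
            }
        }
        return timeouts;
    }

    public static void main(String[] args) {
        GroupCommReceiver receiver = new GroupCommReceiver();
        receiver.deliver(1);
        receiver.deliver(2);
        List<Integer> received = new ArrayList<>();
        int timeouts = runTask(receiver, new AtomicBoolean(false), 2, 50, received);
        System.out.println(received + " after " + timeouts + " timeouts");
    }
}
```

Because a timed-out receive consumes nothing from the queue, retrying loses no data. The stop flag is only checked between attempts, so the timeout also bounds how long a close request can go unnoticed.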

Currently we are not able to catch exceptions from the WAKE layer because they are thrown on a
separate thread. This is a separate work item.
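The threading problem can be illustrated with plain Java threads. This is a sketch of general JVM behavior, not of how WAKE actually dispatches its events: an exception thrown on a worker thread never reaches a try/catch in the thread that started it, whereas work submitted to an ExecutorService hands its failure back through Future.get() as an ExecutionException. The class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class CrossThreadExceptions {

    /** Returns true only if the caller's catch block sees the worker's exception. */
    static boolean callerCatchesWorkerException(AtomicReference<Throwable> handlerSaw) {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated failure on worker thread");
        });
        // The failure is delivered to the worker's handler, not to the caller.
        worker.setUncaughtExceptionHandler((thread, error) -> handlerSaw.set(error));
        boolean caught = false;
        try {
            worker.start();
            worker.join();          // returns normally even though the worker failed
        } catch (IllegalStateException e) {
            caught = true;          // not reached: exceptions do not cross threads
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return caught;
    }

    /** Runs the same failing work on an executor; Future.get() rethrows it, wrapped. */
    static Throwable failureSurfacedViaFuture() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Integer> work = () -> {
            throw new IllegalStateException("simulated failure on worker thread");
        };
        try {
            pool.submit(work).get(); // blocks until the work finishes or fails
            return null;
        } catch (ExecutionException e) {
            return e.getCause();     // the worker's original exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicReference<Throwable> handlerSaw = new AtomicReference<>();
        System.out.println("caller caught it: " + callerCatchesWorkerException(handlerSaw));
        System.out.println("handler saw: " + handlerSaw.get());
        System.out.println("Future.get() surfaced: " + failureSurfacedViaFuture());
    }
}
```

One possible direction for that work item is to have the transport thread forward its failures (through a Future, a handler, or a queue) to the task thread, which can then rethrow them as proper task exceptions.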



> Throw proper exceptions in IMRU Task
> ------------------------------------
>
>                 Key: REEF-1345
>                 URL: https://issues.apache.org/jira/browse/REEF-1345
>             Project: REEF
>          Issue Type: Task
>            Reporter: Julia
>              Labels: FT
>
> For IMRU fault tolerance, we need to identify where to throw proper exceptions with error
> messages in places where exceptions may happen. This includes:
> TaskFailByCommunication - if there is any error caused by group communication, this exception
> should be thrown. A typical case is when a task is not able to get messages from its children.
> TaskFailedByAppError - catch possible application errors and throw the corresponding exceptions
> in those cases.
> TaskFailedBySystem - any possible system error that could crash the task, such as memory,
> hard disk, file access, or network errors.
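A minimal sketch of how the three categories could be modeled in Java. The base class ImruTaskException, the classify and runTaskStep helpers, and the mapping used here (TimeoutException treated as a group communication failure; IOException, UncheckedIOException, and OutOfMemoryError treated as system errors; anything else as an application error) are assumptions for illustration, not the final REEF design.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

public class ImruTaskExceptions {

    /** Hypothetical common base so the driver can tell the categories apart. */
    static class ImruTaskException extends RuntimeException {
        ImruTaskException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Group communication failed, e.g. no messages arrived from a child task. */
    static class TaskFailByCommunication extends ImruTaskException {
        TaskFailByCommunication(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Failure raised by application (user) code. */
    static class TaskFailedByAppError extends ImruTaskException {
        TaskFailedByAppError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Memory, disk, file access, network, or other system-level failure. */
    static class TaskFailedBySystem extends ImruTaskException {
        TaskFailedBySystem(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Maps an arbitrary failure to one of the three categories (assumed mapping). */
    static ImruTaskException classify(Throwable error) {
        if (error instanceof ImruTaskException) {
            return (ImruTaskException) error; // already categorized
        }
        if (error instanceof TimeoutException) {
            return new TaskFailByCommunication("group communication timed out", error);
        }
        if (error instanceof IOException || error instanceof UncheckedIOException
                || error instanceof OutOfMemoryError) {
            return new TaskFailedBySystem("system error: " + error, error);
        }
        return new TaskFailedByAppError("application error: " + error, error);
    }

    /** Runs one task step and rethrows any failure as a categorized exception. */
    static <T> T runTaskStep(Callable<T> step) {
        try {
            return step.call();
        } catch (Throwable t) {
            throw classify(t);
        }
    }

    public static void main(String[] args) {
        try {
            runTaskStep(() -> {
                throw new IOException("disk full");
            });
        } catch (ImruTaskException e) {
            System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
        }
    }
}
```

Wrapping each task step this way would let the driver choose a recovery action per category. Under genuine memory exhaustion, allocating the wrapper exception may itself fail, so the system category may need special handling in practice.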



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
