reef-dev mailing list archives

From "Dhruv Mahajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1480) Increase the retry count for task registration to high value
Date Tue, 05 Jul 2016 19:34:11 GMT

    [ https://issues.apache.org/jira/browse/REEF-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363086#comment-15363086 ]

Dhruv Mahajan commented on REEF-1480:
-------------------------------------

Do you mean a universal timeout rather than a per-evaluator one? I need to think about it.
Wouldn't the logic then have to be controlled by the driver, since an evaluator knows only
its local topology?

> Increase the retry count for task registration to high value
> ------------------------------------------------------------
>
>                 Key: REEF-1480
>                 URL: https://issues.apache.org/jira/browse/REEF-1480
>             Project: REEF
>          Issue Type: Improvement
>          Components: REEF.NET
>         Environment: C#
>            Reporter: Dhruv Mahajan
>
> Currently, the default retry count that Group Communication uses while waiting for task
> registration is set so that an error is thrown after about 4 minutes. For IMRU tasks, the
> error is thrown if downloading the data takes a long time. In general, this can be an issue
> for any other application as well, since the retry count is too low-level a parameter to
> expose through application interfaces such as {{IMRUJobDefinition}}. Like Hadoop MapReduce,
> we could take a configuration file and read these parameters from it. For now, we would
> like to set the default to a very high value.
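
To illustrate how a retry count turns into a wall-clock limit, here is a minimal sketch in
Python. It is not REEF code: the function names, the sleep interval, and the numbers (48
retries at 5 seconds each) are assumptions chosen only so that the product comes out at
about 4 minutes, as described in the issue.

```python
import time


def max_wait_seconds(retry_count, sleep_seconds):
    """Upper bound on the time spent waiting before an error is raised."""
    return retry_count * sleep_seconds


def wait_for_registration(is_registered, retry_count, sleep_seconds, sleep=time.sleep):
    """Poll is_registered() up to retry_count times, sleeping between attempts.

    Returns the number of failed attempts before success, or raises
    TimeoutError once the retries are exhausted. All names are hypothetical.
    """
    for attempt in range(retry_count):
        if is_registered():
            return attempt
        sleep(sleep_seconds)
    raise TimeoutError(
        "not registered after %d s" % max_wait_seconds(retry_count, sleep_seconds)
    )


# Hypothetical values: 48 retries * 5 s = 240 s, i.e. about 4 minutes.
print(max_wait_seconds(48, 5))
```

With a fixed sleep interval the limit grows linearly with the retry count, so raising the
default count is the simplest way to tolerate slow data downloads until the value can be
read from a configuration file.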



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
