reef-dev mailing list archives

From "Dhruv Mahajan (JIRA)" <>
Subject [jira] [Commented] (REEF-1480) Increase the retry count for task registration to high value
Date Tue, 05 Jul 2016 17:19:11 GMT


Dhruv Mahajan commented on REEF-1480:

If we expect the user to always do that, then maybe ... Also, data download is only one aspect ...
As we increase the number of nodes, this will always happen, irrespective of data download. For
example, the problem starts to appear at around 500-1000 nodes, since setup takes time at that
scale. As I said earlier, we can make it configurable by adding another field to an
abstraction like IMRU, but to me this is too low-level to be included there. So I am willing
to go with one of two solutions: a) the one currently proposed in the PR, or b) add a field to
IMRUJobDefinition for this. [~markus.weimer], if you think b) is better than a), shall we go with that?
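
To illustrate the trade-off being discussed, here is a minimal sketch of a registration wait loop whose timeout is simply the retry count times the sleep interval. It is not the REEF.NET code: the names `wait_for_registration`, `retry_count`, `sleep_seconds` and `max_wait_seconds` are hypothetical, and the numbers in the usage note are illustrative, not the actual defaults.

```python
import time
from typing import Callable


def max_wait_seconds(retry_count: int, sleep_seconds: float) -> float:
    """Upper bound on the time spent sleeping before the error is raised."""
    return retry_count * sleep_seconds


def wait_for_registration(
    is_registered: Callable[[], bool],
    retry_count: int,
    sleep_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll is_registered() up to retry_count times.

    Returns the number of failed attempts before registration was seen.
    Raises TimeoutError once retry_count attempts have all failed.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    for attempt in range(retry_count):
        if is_registered():
            return attempt
        sleep(sleep_seconds)
    raise TimeoutError(
        f"not registered after {retry_count} attempts "
        f"(~{max_wait_seconds(retry_count, sleep_seconds)} s)"
    )
```

With, say, 240 retries at a 1 second interval the error fires after roughly 4 minutes. Raising the default retry count only moves that bound; a field in the job definition or a configuration file would let each job choose it.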

> Increase the retry count for task registration to high value
> ------------------------------------------------------------
>                 Key: REEF-1480
>                 URL:
>             Project: REEF
>          Issue Type: Improvement
>          Components: REEF.NET
>         Environment: C#
>            Reporter: Dhruv Mahajan
> Currently, the default retry count in Group Communication for waiting on registration is
set so that an error is thrown after around 4 minutes. For IMRU tasks, if data downloading takes
a long time, the error gets thrown. In general this can be an issue for any other application
as well, since it is too low-level a parameter to expose via application interfaces, for example
{{IMRUJobDefinition}}. Like Hadoop MapReduce, we can take a configuration file and read
these parameters from there. For now, we would like to set the default to a very high

This message was sent by Atlassian JIRA
