reef-dev mailing list archives

From "Dhruv Mahajan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (REEF-1480) Increase the retry count for task registration to high value
Date Tue, 05 Jul 2016 17:57:11 GMT

    [ https://issues.apache.org/jira/browse/REEF-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362874#comment-15362874 ]

Dhruv Mahajan edited comment on REEF-1480 at 7/5/16 5:57 PM:
-------------------------------------------------------------

{quote}
What do you mean?
{quote}

I mean whether we expect the user to always make sure, in the application, that the download
is executed in parallel. For an application like IMRU, the user has to make sure that the
{{IInputPartition}} is written in such a way.

{quote}
I don't think a configuration option is the right solution here. 
{quote}

Agreed on this.

{quote}
What time scales with the number of containers? Is it a matter of wiring up the topology?
If so, couldn't this be done separately from the Task registration with the name server?
{quote}

With an increase in the number of evaluators, the difference between when the master evaluator
is ready and when the others are ready increases linearly. I think it is mainly the topology
wire-up cost, although I am not 100% sure. Currently, task registration is the way we make sure
everything is set up with respect to Group communication. That is, it is assumed that all the
wire-up has happened before Task registration and that we can go ahead. Otherwise, we would
need to come up with some other mechanism for this synchronization step.
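
To make the synchronization step above concrete, here is a minimal sketch of a bounded
wait-for-registration loop. It is written in Python for brevity and is not the REEF.NET
code; the function name, its parameters, and the lookup callback are all hypothetical and
only illustrate how a retry count and a sleep interval together bound the wait.

```python
import time
from typing import Callable


def wait_for_registration(is_registered: Callable[[str], bool],
                          task_id: str,
                          retry_count: int,
                          sleep_seconds: float,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until task_id is registered and return the number of lookups made.

    All names here are illustrative assumptions, not REEF APIs.
    Raises TimeoutError once retry_count lookups have all failed.
    """
    for attempt in range(1, retry_count + 1):
        if is_registered(task_id):
            return attempt
        # Not registered yet: back off before the next lookup.
        sleep(sleep_seconds)
    raise TimeoutError(
        f"{task_id} not registered after {retry_count} lookups "
        f"(~{retry_count * sleep_seconds:.0f}s)"
    )
```

With such a loop, a slow peer (for example one still downloading data) causes a failure as
soon as the total wait exceeds roughly retry_count * sleep_seconds, which is why the default
matters.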


> Increase the retry count for task registration to high value
> ------------------------------------------------------------
>
>                 Key: REEF-1480
>                 URL: https://issues.apache.org/jira/browse/REEF-1480
>             Project: REEF
>          Issue Type: Improvement
>          Components: REEF.NET
>         Environment: C#
>            Reporter: Dhruv Mahajan
>
> Currently, the default retry count that Group communication uses while waiting for registration
> is set so that an error is thrown after around 4 minutes. For IMRU tasks, this error gets thrown
> if downloading the data takes a long time. In general, this can be an issue for any other
> application as well, since the retry count is too low-level a parameter to expose via application
> interfaces such as {{IMRUJobDefinition}}. Like Hadoop MapReduce, we could take a configuration
> file and read these parameters from there. For now, we would like to set the default to a very
> high value.
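
As a rough illustration of the two options in the description (a higher default now, a
configuration file later), the sketch below reads retry parameters from a JSON document and
falls back to defaults. It is Python rather than C#, and everything in it is assumed for the
example: the key names, the JSON format, and the default values, which were chosen only so
that their product is about 4 minutes and are not the actual REEF defaults.

```python
import json
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrySettings:
    retry_count: int
    sleep_seconds: float

    @property
    def max_wait_seconds(self) -> float:
        # Upper bound on the wait before an error is thrown.
        return self.retry_count * self.sleep_seconds


# Hypothetical defaults: 480 retries * 0.5 s = 240 s, i.e. about 4 minutes.
DEFAULTS = RetrySettings(retry_count=480, sleep_seconds=0.5)


def load_retry_settings(config_text: str,
                        defaults: RetrySettings = DEFAULTS) -> RetrySettings:
    """Parse optional overrides from JSON text; missing keys keep the defaults."""
    overrides = json.loads(config_text) if config_text.strip() else {}
    return RetrySettings(
        retry_count=int(overrides.get("retry_count", defaults.retry_count)),
        sleep_seconds=float(overrides.get("sleep_seconds", defaults.sleep_seconds)),
    )


def retries_for_wait(target_wait_seconds: float, sleep_seconds: float) -> int:
    """Smallest retry count whose total wait covers target_wait_seconds."""
    return math.ceil(target_wait_seconds / sleep_seconds)
```

Under these assumptions, tolerating an hour-long download at the same sleep interval would
need retries_for_wait(3600, 0.5) retries, supplied either as a new default or through the
configuration file.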



