reef-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1480) Increase the retry count for task registration to high value
Date Tue, 19 Jul 2016 18:41:21 GMT

    [ https://issues.apache.org/jira/browse/REEF-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384653#comment-15384653
] 

Shravan Matthur Narayanamurthy commented on REEF-1480:
------------------------------------------------------

Regarding the data download being executed in parallel: this should really be done. In the
{{ContextStartHandler}} I see that the IInputPartition is simply being instantiated and stored
(technically not required, at least for the use case being targeted: Tang has already
instantiated the singleton, so the context handler need not hold an extra reference). This
would be the perfect place to do the data download and parsing (if the data needs to be loaded
into memory), in a separate thread as Markus pointed out, so as not to block the {{ContextStartHandler}}
constructor from returning.
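
A minimal sketch of that idea, written in plain Java with only JDK classes for illustration. The class AsyncLoadingStartHandler and its methods are hypothetical names, not REEF or REEF.NET APIs, and the supplier stands in for whatever download-and-parse step the partition performs:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical stand-in for a context start handler; not a REEF API. */
final class AsyncLoadingStartHandler {
    // Holds the eventual result of the background download-and-parse step.
    private final CompletableFuture<List<String>> data;

    /** Schedules the load on a background thread and returns immediately. */
    AsyncLoadingStartHandler(Supplier<List<String>> downloadAndParse) {
        this.data = CompletableFuture.supplyAsync(downloadAndParse);
    }

    /** True once the background load has completed. */
    boolean isLoaded() {
        return data.isDone();
    }

    /** Called later by the task; blocks only if loading has not finished yet. */
    List<String> awaitData() {
        return data.join();
    }
}
```

The constructor only schedules the work, so the handler returns right away; the cost of a slow download is paid by whoever first calls awaitData(), and only if the load is still running at that point.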

For the group-communication set-up and registration, why do we have a wait with a timeout?
We can actually do this through a context as well, right? When a task gets queued in the task
starter, we create another context that gets passed the task id. The start handler should
do the set-up and register the id with the NameService. When all the contexts are active,
we know that the set-up is done. We don't need to do a waitForTaskRegistration in the Tasks.
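
Roughly, the driver-side bookkeeping for this could look like the sketch below. It is plain Java for illustration only; RegistrationTracker and onContextActive are hypothetical names, not REEF APIs, and it assumes the driver knows the full set of task ids up front and receives one "context active" event per id:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical driver-side tracker; names are illustrative, not REEF APIs. */
final class RegistrationTracker {
    private final Set<String> expected;
    private final Set<String> active = new HashSet<>();

    RegistrationTracker(Set<String> expectedTaskIds) {
        this.expected = new HashSet<>(expectedTaskIds);
    }

    /**
     * Records that the context created for taskId is active, i.e. its start
     * handler has finished set-up and registration. Returns true only for the
     * call that completes the set, so tasks can be submitted exactly once.
     */
    synchronized boolean onContextActive(String taskId) {
        if (!expected.contains(taskId)) {
            throw new IllegalArgumentException("Unexpected task id: " + taskId);
        }
        boolean added = active.add(taskId); // false for a duplicate event
        return added && active.size() == expected.size();
    }
}
```

With this shape there is no timeout to tune: the tasks are submitted when the last context reports active, however long the set-up took.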

> Increase the retry count for task registration to high value
> ------------------------------------------------------------
>
>                 Key: REEF-1480
>                 URL: https://issues.apache.org/jira/browse/REEF-1480
>             Project: REEF
>          Issue Type: Improvement
>          Components: REEF.NET
>         Environment: C#
>            Reporter: Dhruv Mahajan
>
> Currently, the default retry count in group communication to wait for registration is
> set so that an error is thrown after around 4 minutes. For IMRU tasks, if data downloading
> takes a long time, this error gets thrown. In general this can be an issue for any other
> application as well, since it is too low-level a parameter to expose via application
> interfaces, for example {{IMRUJobDefinition}}. Like Hadoop MapReduce, we can take a
> configuration file and read these parameters from there. For now, we would like to set
> the default to a very high value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
