hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-275) Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not.
Date Fri, 28 Dec 2012 22:48:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540621#comment-13540621 ]

Bikas Saha commented on YARN-275:
---------------------------------

I briefly looked at the patch. The general approach seems promising. I have some comments
on how we can structure these changes.
We could break this work into 2 parts:
1) Protocol changes in the heartbeat to transfer control of the heartbeat frequency from NM
to RM. After this, in every heartbeat response the RM will tell the NM when to send the next
heartbeat. That value can be hardcoded (like it is currently) but preferably we can have an
RM config that defines what the minimum heartbeat interval should be and use that. For this
part, I don't think we need both backoff and heartbeatinterval in the heartbeat response. We
can have only heartbeatinterval, which is always respected by the NM.
2) Add some logic/heuristic to the RM so that it can dynamically change the heartbeat interval
based on its current processing load/rate. This way the interval can be made longer when the
RM is not keeping up with heartbeats.
If you think this breakdown of the work makes sense then we can create 2 sub-tasks under this
jira for the 2 parts.
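
To make the two parts concrete, here is a rough sketch in Java. It is only an illustration
of the idea, not code from the patch: the class name, the method names and the linear scaling
rule are all assumptions of mine. Part 1 corresponds to fixedIntervalMs(), where the RM always
returns the configured minimum. Part 2 corresponds to nextIntervalMs(), where the RM stretches
the interval when its backlog of unprocessed heartbeats grows.

```java
// Hypothetical sketch; none of these names come from the YARN code base.
final class HeartbeatIntervalPolicy {
    private final long minIntervalMs; // assumed to come from an RM config (part 1)
    private final long maxIntervalMs; // assumed cap so a node is never told to wait too long

    HeartbeatIntervalPolicy(long minIntervalMs, long maxIntervalMs) {
        if (minIntervalMs <= 0 || maxIntervalMs < minIntervalMs) {
            throw new IllegalArgumentException("need 0 < min <= max");
        }
        this.minIntervalMs = minIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
    }

    /** Part 1: the RM always answers with the configured minimum interval. */
    long fixedIntervalMs() {
        return minIntervalMs;
    }

    /**
     * Part 2: one possible heuristic. While the backlog is at or below a
     * "comfortable" size the minimum interval is used; above that the interval
     * grows linearly with the backlog, capped at the maximum.
     */
    long nextIntervalMs(int pendingHeartbeats, int comfortableBacklog) {
        if (comfortableBacklog <= 0) {
            throw new IllegalArgumentException("comfortableBacklog must be positive");
        }
        if (pendingHeartbeats <= comfortableBacklog) {
            return minIntervalMs;
        }
        double factor = (double) pendingHeartbeats / comfortableBacklog;
        long scaled = (long) Math.ceil(minIntervalMs * factor);
        return Math.min(scaled, maxIntervalMs);
    }
}
```

With a minimum of 1000 ms and a cap of 10000 ms, a backlog twice the comfortable size would
double the interval to 2000 ms. The NM side stays simple in both parts because it only sleeps
for whatever value the last response carried.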

I have some additional ideas on part 1 also.
When a heartbeat comes at time T to the RM then it can choose to:
A) accept the request at time T and ask NM to heartbeat after time T+K with new information.
This adds to the current RM load. This is what the current code does, so no change
is required to do this.
B) reject the request at time T and ask NM to heartbeat after time T+K with current+new information.
This does not increase load on RM but makes NM more complex because it needs to hold onto
the last heartbeat data and merge new data into it.
What do you think about these alternatives?
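
To show the extra NM-side bookkeeping that option B implies, here is a minimal sketch in
Java. It is an assumption-heavy illustration: the class name, the use of plain strings for
container ids and states, and the accepted/rejected flag are all hypothetical and not taken
from the heartbeat protocol. The point is only that a rejected heartbeat leaves its data in
place so the next heartbeat carries current+new information.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of NM-side state for option B; not YARN code.
final class PendingHeartbeat {
    // Latest known state per container that the RM has not yet accepted.
    private final Map<String, String> containerStates = new LinkedHashMap<>();

    /** Record new information; a newer state for the same container replaces the older one. */
    void record(String containerId, String state) {
        containerStates.put(containerId, state);
    }

    /** Copy of everything that should go into the next heartbeat. */
    Map<String, String> snapshot() {
        return new LinkedHashMap<>(containerStates);
    }

    /**
     * Handle the RM's answer for a heartbeat that carried 'sent'.
     * Rejected: keep everything, so the next heartbeat merges old and new data.
     * Accepted: drop only entries that were sent and have not changed since.
     */
    void onResponse(boolean accepted, Map<String, String> sent) {
        if (!accepted) {
            return;
        }
        for (Map.Entry<String, String> e : sent.entrySet()) {
            containerStates.remove(e.getKey(), e.getValue());
        }
    }
}
```

Option A needs none of this state, which is the trade-off described above: B keeps RM load
flat at the cost of the NM tracking what has and has not been accepted.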
                
> Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-275
>                 URL: https://issues.apache.org/jira/browse/YARN-275
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Xuan Gong
>         Attachments: YARN-270.1.patch
>
>
> We need NMs to back off. The event handler mechanism is very scalable but not infinitely so :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
