[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573244#comment-13573244 ]

Siddharth Seth commented on YARN-365:
-------------------------------------

This isn't very different from configuring all nodes to have a higher heartbeat interval. With a high heartbeat interval, the NM would send a batch of updates over to the RM, and this heartbeat would trigger a scheduling pass.

This change de-links RM scheduling passes from NM heartbeats. The NM can continue to provide node updates with a smaller interval, and the RM handles these, along with a scheduling pass, as and when it chooses to. In this particular case, the scheduler queue ends up with a single scheduling event per node - but will attempt a scheduling run only on the next heartbeat from that node. At a later point, the scheduling could be changed to be triggered by the arrival of a new application - or to just run in a tight loop.

If the scheduler cannot keep up, it ends up scheduling as fast as it can - without node heartbeats affecting the queue size. Also, completed container information from heartbeats is processed earlier (instead of waiting for the event in the queue to be processed) - making each scheduler pass more efficient.

bq. I can see cases where the all at once is actually worse as it will spend more time on a single heartbeat and potentially not get to other things in the queue like apps added as fast.

The event should not be delayed more than the time required to complete one scheduling pass across all nodes. I don't think this will be much better in the case of a growing scheduler queue.

bq. The only way I can see this being beneficial is if we can aggregate the heartbeats and have the scheduler process less.

Do you mean somehow aggregating heartbeats across nodes? This approach does aggregate heartbeats for a single node.
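
To make the per-node aggregation concrete, here is a minimal sketch of the idea, not the code in the attached patches. The class and method names (NodeHeartbeatAggregator, onHeartbeat, pullUpdates, nextNodeToSchedule) and the use of plain strings for node ids and container updates are all assumptions made for illustration. Each heartbeat appends its updates to a per-node list, and a scheduler event is queued for a node only if none is already outstanding for it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: names and types are illustrative, not from the YARN-365 patches.
class NodeHeartbeatAggregator {
    // Updates received per node since the scheduler last handled that node.
    private final Map<String, List<String>> pendingUpdates = new HashMap<>();
    // Scheduler event queue: holds at most one outstanding entry per node.
    private final Queue<String> schedulerQueue = new ArrayDeque<>();

    // Called for every NM heartbeat. Returns true if a new scheduler event was queued.
    synchronized boolean onHeartbeat(String nodeId, List<String> containerUpdates) {
        List<String> pending = pendingUpdates.get(nodeId);
        boolean eventNeeded = (pending == null);
        if (eventNeeded) {
            // No outstanding event for this node: queue exactly one.
            pending = new ArrayList<>();
            pendingUpdates.put(nodeId, pending);
            schedulerQueue.add(nodeId);
        }
        // Always record the updates, whether or not an event was queued.
        pending.addAll(containerUpdates);
        return eventNeeded;
    }

    // Called by the scheduler to pick the next node to run a pass on (null if none).
    synchronized String nextNodeToSchedule() {
        return schedulerQueue.poll();
    }

    // Called by the scheduler while handling a node: drains everything batched so far.
    // After this, the next heartbeat from the node queues a fresh event.
    synchronized List<String> pullUpdates(String nodeId) {
        List<String> pending = pendingUpdates.remove(nodeId);
        return pending == null ? new ArrayList<>() : pending;
    }

    synchronized int schedulerQueueSize() {
        return schedulerQueue.size();
    }
}
```

With this shape, the queue length is bounded by the number of nodes regardless of how often NMs heartbeat, and a slow scheduler simply sees larger batches per node when it gets to them.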

> Each NM heartbeat should not generate an event for the Scheduler
> ----------------------------------------------------------------
>
>                 Key: YARN-365
>                 URL: https://issues.apache.org/jira/browse/YARN-365
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>    Affects Versions: 0.23.5
>            Reporter: Siddharth Seth
>            Assignee: Xuan Gong
>         Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, YARN-365.2.patch, YARN-365.3.patch
>
>
> Follow up from YARN-275
> https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira