hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1449) Protocol changes and implementations in NM side to support change container resource
Date Wed, 11 Dec 2013 09:57:09 GMT

     [ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Wangda Tan updated YARN-1449:

    Attachment: yarn-1449.4.patch

[~sandyr], I've updated my code according to your comments (thanks again!). But there're some
places I haven't changed.

+    // first add all decreased containers to failed, we will support it in the
+    // future
What is it that we will support in the future?

Originally, I planned not support container decrease, so I put all decrease requests directly
to failed list. This patch included decrease requests support and such decreased containers
will be added to next node heartbeat sent to RM.

+public class ContainerChangeMonitoringEvent extends ContainersMonitorEvent {
+  private final long vmemLimit;
+  private final long pmemLimit;
No need to calculate / include the vmem here. The containers monitor should know about it.

I think we needed them. Originally this is created by ContainerImpl.LaunchTransition, and
ContainersMonitoring will not care about this, this is passed in to CsM by ContainerStartMonitoringEvent.
I just follow the same way of what ContainerStartMonitoringEvent does.

Unnecessary change
-  private static class ProcessTreeInfo {
+  static class ProcessTreeInfo {

This is used for testing, the easiest way to test ContainersMonitoring is change pmem/vmem
by setting fields in ResourceCalculatorProcessTree. We can get ProcessTreeInfo in trackingContainers
of ContainersMonitor, but we cannot use this class because its private. I don't if I should
propose another jira for this, but I think this can make our easier to write tests for ContainersMonitoring.

+  public void testResourceChange() throws Exception {
It looks like this test relies on sleeps to wait for events to be handled. This both makes
the test run longer and can cause flaky failures. I think there is a way to actually wait
for the events to be completed. I don't know it off the top of my head but can look it up
for you if that would be helpful.

I've thinked about this but not come up with a better way to do it. This will be failed only
in some extremely bad cases (like very low resource available). I've set YarnConfiguration.NM_CONTAINER_MON_INTERVAL_MS
to 20ms in test, and 200ms seems enough for ContainersMonitor react to bad containers. Please
let me know if you have any other ideas!

And any thoughts about this patch?

> Protocol changes and implementations in NM side to support change container resource
> ------------------------------------------------------------------------------------
>                 Key: YARN-1449
>                 URL: https://issues.apache.org/jira/browse/YARN-1449
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch
> As described in YARN-1197, we need add API/implementation changes,
> 1) Add a "changeContainersResources" method in ContainerManagementProtocol
> 2) Can get succeed/failed increased/decreased containers in response of "changeContainersResources"
> 3) Add a "new decreased containers" field in NodeStatus which can help NM notify RM such
> 4) Added changeContainersResources implementation in ContainerManagerImpl
> 5) Added changes in ContainersMonitorImpl to support change resource limit of containers

This message was sent by Atlassian JIRA

View raw message