Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Wed, 11 Dec 2013 09:57:09 +0000 (UTC)
From: "Wangda Tan (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12681191.1385434272574.16411.1386755829600@arcas>
In-Reply-To: <JIRA.12681191.1385434272574@arcas>
References: <JIRA.12681191.1385434272574@arcas>
Subject: [jira] [Updated] (YARN-1449) Protocol changes and implementations
 in NM side to support change container resource
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-1449:
-----------------------------

    Attachment: yarn-1449.4.patch

[~sandyr], I've updated my code according to your comments (thanks again!). But there're some places I haven't changed.

{quote}
{code}
+    // first add all decreased containers to failed, we will support it in the
+    // future
{code}
What is it that we will support in the future?
{quote}

Originally, I planned not support container decrease, so I put all decrease requests directly to failed list. This patch included decrease requests support and such decreased containers will be added to next node heartbeat sent to RM.

{quote}
{code}
+public class ContainerChangeMonitoringEvent extends ContainersMonitorEvent {
+  private final long vmemLimit;
+  private final long pmemLimit;
{code}
No need to calculate / include the vmem here. The containers monitor should know about it.
{quote}

I think we needed them. Originally this is created by ContainerImpl.LaunchTransition, and ContainersMonitoring will not care about this, this is passed in to CsM by ContainerStartMonitoringEvent. I just follow the same way of what ContainerStartMonitoringEvent does.

{quote}
Unnecessary change
{code}
-  private static class ProcessTreeInfo {
+  static class ProcessTreeInfo {
{code}
{quote}

This is used for testing, the easiest way to test ContainersMonitoring is change pmem/vmem by setting fields in ResourceCalculatorProcessTree. We can get ProcessTreeInfo in trackingContainers of ContainersMonitor, but we cannot use this class because its private. I don't if I should propose another jira for this, but I think this can make our easier to write tests for ContainersMonitoring.

{quote}
{code}
+  public void testResourceChange() throws Exception {
{code}
It looks like this test relies on sleeps to wait for events to be handled. This both makes the test run longer and can cause flaky failures. I think there is a way to actually wait for the events to be completed. I don't know it off the top of my head but can look it up for you if that would be helpful.
{quote}

I've thinked about this but not come up with a better way to do it. This will be failed only in some extremely bad cases (like very low resource available). I've set YarnConfiguration.NM_CONTAINER_MON_INTERVAL_MS to 20ms in test, and 200ms seems enough for ContainersMonitor react to bad containers. Please let me know if you have any other ideas!

And any thoughts about this patch?

> Protocol changes and implementations in NM side to support change container resource
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-1449
>                 URL: https://issues.apache.org/jira/browse/YARN-1449
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch
>
>
> As described in YARN-1197, we need add API/implementation changes,
> 1) Add a "changeContainersResources" method in ContainerManagementProtocol
> 2) Can get succeed/failed increased/decreased containers in response of "changeContainersResources"
> 3) Add a "new decreased containers" field in NodeStatus which can help NM notify RM such changes
> 4) Added changeContainersResources implementation in ContainerManagerImpl
> 5) Added changes in ContainersMonitorImpl to support change resource limit of containers


--
This message was sent by Atlassian JIRA
(v6.1.4#6159)