hadoop-yarn-issues mailing list archives

From "Rohith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3733) On RM restart AM getting more than maximum possible memory when many tasks in queue
Date Thu, 28 May 2015 10:10:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562622#comment-14562622 ]

Rohith commented on YARN-3733:
------------------------------

Verified the RM logs from [~bibinchundatt] offline. The sequence of events that occurred is:
# 30 applications are submitted to RM1 concurrently. *pendingApplications=18 and activeApplications=12*.
The active applications move to RUNNING state.
# RM1 switches to standby and RM2 transitions to Active state. The currently active RM is RM2.
# The 30 previously submitted applications start recovering. As part of the recovery process, all
30 applications are submitted to the scheduler and all of them become active, i.e. *activeApplications=30
and pendingApplications=0*, which is not expected.
# The NMs register with the RM, and the running AMs register with the RM.
# Since 30 applications are activated, the scheduler tries to launch the ApplicationMaster of
every activated application, and these occupy the full cluster capacity.

Basically, the issue is that the AM limit check in LeafQueue#activateApplications does not work as
expected for {{DominantResourceCalculator}}. To confirm this, I wrote a simple program that runs the
same check with both the Default and the Dominant resource calculator, using the memory values from
the log below. The output of the program is:
For DefaultResourceCalculator, the result is false, which limits the applications being activated
when the AM resource limit is exceeded.
For DominantResourceCalculator, the result is true, which allows all the applications to be activated
even if the AM resource limit is exceeded.
{noformat}
2015-05-28 14:00:52,704 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
application AMResource <memory:4096, vCores:1> maxAMResourcePerQueuePercent 0.5 amLimit
<memory:0, vCores:0> lastClusterResource <memory:0, vCores:0> amIfStarted <memory:4096,
vCores:1>
{noformat}

{code}
package com.test.hadoop;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class TestResourceCalculator {

  public static void main(String[] args) {
    // Default resource calculator (considers memory only)
    ResourceCalculator defaultResourceCalculator =
        new DefaultResourceCalculator();

    // Dominant resource calculator (considers memory and vCores)
    ResourceCalculator dominantResourceCalculator =
        new DominantResourceCalculator();

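    // Values taken from the LeafQueue DEBUG log: no NM has registered yet,
    // so both the cluster resource and the AM limit are still zero.
    Resource lastClusterResource = Resource.newInstance(0, 0);
    Resource amIfStarted = Resource.newInstance(4096, 1);
    Resource amLimit = Resource.newInstance(0, 0);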

    // expected result is false, and the actual result is also false
    System.out.println("DefaultResourceCalculator : "
        + Resources.lessThanOrEqual(defaultResourceCalculator,
            lastClusterResource, amIfStarted, amLimit));

    // expected result is false, but the actual result is true for DominantResourceCalculator
    System.out.println("DominantResourceCalculator : "
        + Resources.lessThanOrEqual(dominantResourceCalculator,
            lastClusterResource, amIfStarted, amLimit));

  }
}

{code}
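
One possible explanation for the difference, sketched below without any Hadoop dependency: if the dominant calculator compares resources by their share of the cluster, then a zero cluster resource makes every share a division by zero. In Java float arithmetic, 4096/0 is Infinity and 0/0 is NaN, and every ordered comparison against NaN is false, so a three-way compare falls through to "equal". The class and method names here ({{ZeroClusterShareSketch}}, {{dominantShare}}, {{compareByShare}}) are hypothetical and only model the idea; this is not the Hadoop source, and the memory-only comparison used for the default case is likewise an assumption.

```java
public class ZeroClusterShareSketch {

  // Hypothetical helper: the larger of the memory share and the vCore share.
  // Float division by zero does not throw; it yields Infinity or NaN.
  static float dominantShare(int memory, int vcores,
      int clusterMemory, int clusterVcores) {
    float memoryShare = (float) memory / clusterMemory;
    float vcoreShare = (float) vcores / clusterVcores;
    return Math.max(memoryShare, vcoreShare);
  }

  // Hypothetical three-way compare on shares.
  static int compareByShare(float lhs, float rhs) {
    if (lhs < rhs) {
      return -1;
    }
    if (lhs > rhs) {
      return 1;
    }
    // Reached for equal values, but also whenever either side is NaN,
    // because both comparisons above are false for NaN.
    return 0;
  }

  public static void main(String[] args) {
    // Same values as in the DEBUG log: cluster <0, 0>, AM <4096, 1>, limit <0, 0>.
    float amIfStarted = dominantShare(4096, 1, 0, 0); // Infinity
    float amLimit = dominantShare(0, 0, 0, 0);        // NaN

    // Assumed memory-only comparison: 4096 <= 0 is false, so the limit holds.
    boolean memoryOnly = Integer.compare(4096, 0) <= 0;

    // Share-based comparison: compare(Infinity, NaN) is 0, so "<=" is true.
    boolean shareBased = compareByShare(amIfStarted, amLimit) <= 0;

    System.out.println("amIfStarted share : " + amIfStarted);
    System.out.println("amLimit share     : " + amLimit);
    System.out.println("memory-only  lessThanOrEqual : " + memoryOnly);
    System.out.println("share-based  lessThanOrEqual : " + shareBased);
  }
}
```

If this model matches the real behaviour, a guard for a zero cluster resource before dividing would be one way to make the share-based check reject the AM while no NM has registered.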

>  On RM restart AM getting more than maximum possible memory when many  tasks in queue
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-3733
>                 URL: https://issues.apache.org/jira/browse/YARN-3733
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>         Environment: Suse 11 Sp3 , 2 NM , 2 RM
> one NM - 3 GB 6 v core
>            Reporter: Bibin A Chundatt
>            Assignee: Rohith
>            Priority: Critical
>
> Steps to reproduce
> =================
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB after changing scheduler minimum size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks
> 5. Switch RM
> Actual
> =====
> For 12 jobs the AM gets allocated and all 12 start running
> No other YARN child is initiated, *all 12 jobs stay in RUNNING state forever*
> Expected
> =======
> Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
