hadoop-yarn-issues mailing list archives

From "Devaraj K (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3733) On RM restart AM getting more than maximum possible memory when many tasks in queue
Date Fri, 29 May 2015 05:16:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564218#comment-14564218 ]

Devaraj K commented on YARN-3733:
---------------------------------

Thanks [~bibinchundatt], [~rohithsharma] and [~sunilg] for reporting and fixing this; I appreciate
your efforts.

Some comments on the patch.

1. 
{code:java}
+    if (Float.isNaN(l) && Float.isNaN(r)) {
+      return 0;
+    } else if (Float.isNaN(l)) {
+      return -1;
+    } else if (Float.isNaN(r)) {
+      return 1;
+    }
+
+    // TODO what if both l and r infinity? Should infinity compared? how?
+
{code}
Here l and r are derived from lhs, rhs and clusterResource, none of which is infinite.
Can we check lhs/rhs for emptiness and compare them directly, before we end up with infinite values?


2. The newly added code is duplicated in two places. Can you eliminate the duplication?

3. In the test class, can you add a message to every assertEquals() call using this API?
{code:xml}
Assert.assertEquals(String message, expected, actual)
{code}
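
To illustrate comment 1, here is a minimal sketch of how a float share turns into NaN or Infinity and how a guard could avoid that. It assumes the non-finite values come from dividing by an empty cluster resource; ShareCompareSketch, share() and compareGuarded() are hypothetical names, not code from the patch or from the YARN resource calculators.

```java
final class ShareCompareSketch {

    // Hypothetical helper: share of one resource dimension within the cluster total.
    static float share(long value, long clusterTotal) {
        return (float) value / (float) clusterTotal;
    }

    // Compare two requests by share, handling the empty-cluster case first
    // so that NaN and Infinity never reach the comparison.
    static int compareGuarded(long lhs, long rhs, long clusterTotal) {
        if (clusterTotal == 0) {
            // Assumption: with no cluster capacity, comparing raw values is acceptable.
            return Long.compare(lhs, rhs);
        }
        return Float.compare(share(lhs, clusterTotal), share(rhs, clusterTotal));
    }

    public static void main(String[] args) {
        System.out.println(share(0, 0));    // 0/0 in float arithmetic is NaN
        System.out.println(share(512, 0));  // positive/0 in float arithmetic is Infinity
        System.out.println(compareGuarded(512, 1024, 0) < 0);     // raw comparison
        System.out.println(compareGuarded(512, 1024, 6144) < 0);  // share comparison
    }
}
```

With a guard of this kind the NaN branches and the open TODO about infinity might not be needed at all, although that depends on how the real compare method is structured.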


>  On RM restart AM getting more than maximum possible memory when many  tasks in queue
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-3733
>                 URL: https://issues.apache.org/jira/browse/YARN-3733
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>         Environment: Suse 11 Sp3 , 2 NM , 2 RM
> one NM - 3 GB 6 v core
>            Reporter: Bibin A Chundatt
>            Assignee: Rohith
>            Priority: Blocker
>         Attachments: YARN-3733.patch
>
>
> Steps to reproduce
> =================
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB after changing scheduler minimum size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks
> 5. Switch RM
> Actual
> =====
> AMs get allocated for 12 jobs and all 12 start running
> No other YARN child is initiated, *all 12 jobs stay in Running state forever*
> Expected
> =======
> Only 6 should be running at a time since max AM allocated is .5 (3072 MB)
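
The numbers in the description can be checked with a small calculation. This is only a sketch: it assumes each AM container is 512 MB (the description gives the map, reduce and scheduler minimum sizes but not the AM size), and AmLimitSketch and maxConcurrentAms() are hypothetical names, not scheduler code.

```java
final class AmLimitSketch {

    // Number of AM containers that fit into the AM share of the cluster memory.
    static long maxConcurrentAms(long clusterMb, double amLimitFraction, long amContainerMb) {
        long amBudgetMb = (long) (clusterMb * amLimitFraction);
        return amBudgetMb / amContainerMb;
    }

    public static void main(String[] args) {
        long clusterMb = 2 * 3072L; // 2 NMs with 3072 MB each

        // AM limit .5: 3072 MB budget / 512 MB per AM
        System.out.println(maxConcurrentAms(clusterMb, 0.5, 512)); // 6

        // No effective limit: 6144 MB / 512 MB per AM
        System.out.println(maxConcurrentAms(clusterMb, 1.0, 512)); // 12
    }
}
```

Under that assumption the expected 6 matches the .5 limit, and the observed 12 is what the whole cluster would hold, which leaves no memory for any task container.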



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
