hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1127) reservation exchange and excess reservation is not working for capacity scheduler
Date Fri, 30 Aug 2013 20:38:52 GMT

    [ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755110#comment-13755110 ]

Bikas Saha commented on YARN-1127:
----------------------------------

Then please clarify this in the description or in a comment; otherwise it looks like an exact
duplicate. As I understand it, the purpose of this JIRA is to fix the following situation:
1) NM1 has 2048 MB of capacity in total but only 512 MB is free. A reservation of 1024 MB is
placed on it.
2) NM2 now reports 1024 MB free. At this point, the above reservation should be removed
from NM1 and the container should be assigned to NM2.
Step 2 is not happening, and this JIRA intends to fix it.
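
To make the expected behaviour concrete, here is a minimal toy model of the two steps above. It is
only a sketch and not the CapacityScheduler code: Node, ToyScheduler and on_heartbeat are
hypothetical names, the request is assumed to have no locality constraint, and NM2's total
capacity (2048 MB) is an assumption since only its free space is stated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a NodeManager as seen by the scheduler."""
    name: str
    total_mb: int
    free_mb: int
    reserved_mb: int = 0  # size of the pending reservation on this node, 0 if none


class ToyScheduler:
    """Tracks a single unsatisfied request with no locality constraint."""

    def __init__(self, request_mb: int) -> None:
        self.request_mb = request_mb
        self.reserved_on: Optional[Node] = None
        self.allocated_on: Optional[Node] = None

    def on_heartbeat(self, node: Node) -> None:
        if self.allocated_on is not None:
            return  # the request is already satisfied
        if node.free_mb >= self.request_mb:
            # Reservation exchange: release any reservation held on a node
            # before allocating on the node that has room right now.
            if self.reserved_on is not None:
                self.reserved_on.reserved_mb = 0
                self.reserved_on = None
            node.free_mb -= self.request_mb
            self.allocated_on = node
        elif self.reserved_on is None and node.total_mb >= self.request_mb:
            # Not enough free space yet, but the node could fit the request
            # once running containers finish: place a reservation and wait.
            node.reserved_mb = self.request_mb
            self.reserved_on = node


nm1 = Node("NM1", total_mb=2048, free_mb=512)
nm2 = Node("NM2", total_mb=2048, free_mb=1024)  # total capacity is assumed
sched = ToyScheduler(request_mb=1024)

sched.on_heartbeat(nm1)  # step 1: 1024 MB is reserved on NM1
sched.on_heartbeat(nm2)  # step 2: reservation leaves NM1, container goes to NM2
```

In this model the bug corresponds to the first branch of on_heartbeat never firing for NM2 while
the reservation on NM1 is outstanding, so the request stays pinned to NM1.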
                
> reservation exchange and excess reservation is not working for capacity scheduler
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-1127
>                 URL: https://issues.apache.org/jira/browse/YARN-1127
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>            Priority: Blocker
>
> I have 2 node managers:
> * one with 1024 MB memory (nm1)
> * a second with 2048 MB memory (nm2)
> I am submitting a simple MapReduce application with 1 mapper and 1 reducer with 1024 MB
> each. The steps to reproduce this are:
> * Stop nm2 (the one with 2048 MB memory). I am doing this to make sure that this node's
> heartbeat doesn't reach the RM first.
> * Now submit the application. As soon as the scheduler receives the first node's (nm1)
> heartbeat, it will try to reserve memory for the AM container (2048 MB). However, nm1 has
> only 1024 MB of memory.
> * Now start nm2 with 2048 MB memory.
> It hangs forever. There are two potential issues here:
> * Say 2048 MB is reserved on nm1 but nm2 comes back with 2048 MB of available memory. In
> this case, if the original request was made without any locality constraint, the scheduler
> should unreserve the memory on nm1 and allocate the requested 2048 MB container on nm2.
> * We support the following notion. Say we have 5 nodes with 4 AMs, where every node manager
> has 8 GB and each AM uses 2 GB, and each AM is requesting 8 GB. To avoid deadlock, an AM
> will make an extra reservation. By doing this we would never hit the deadlock situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
