hadoop-yarn-issues mailing list archives

From "Jun Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5290) ResourceManager can place more containers on a node than the node size allows
Date Thu, 23 Jun 2016 11:40:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346297#comment-15346297 ]

Jun Gong commented on YARN-5290:

Thanks [~jlowe] for reporting the issue!

We came across this issue some time ago. I tried the approach from YARN-4148: the RM does not
release an app's resources until its containers have actually finished and the NM has released them.
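To make the first approach concrete, here is a minimal sketch of scheduler-side bookkeeping that keeps a killed container's memory reserved until the node confirms it has finished. The class DeferredReleaseNode and all of its methods are hypothetical names made up for illustration; this is not the YARN scheduler code or the YARN-4148 patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: not an actual YARN class.
class DeferredReleaseNode {
    private final int totalMb;
    private int allocatedMb = 0;
    // containerId -> memory still counted against this node
    private final Map<String, Integer> containers = new HashMap<>();
    // containers that were asked to die but are not yet confirmed finished
    private final Set<String> killPending = new HashSet<>();

    DeferredReleaseNode(int totalMb) {
        this.totalMb = totalMb;
    }

    // Allocate only if the node still has room by the scheduler's own count.
    boolean allocate(String id, int mb) {
        if (containers.containsKey(id) || mb > availableMb()) {
            return false;
        }
        containers.put(id, mb);
        allocatedMb += mb;
        return true;
    }

    // A kill request only marks the container; its memory stays reserved.
    void requestKill(String id) {
        if (containers.containsKey(id)) {
            killPending.add(id);
        }
    }

    // Called when the node reports that the container has really finished.
    void confirmFinished(String id) {
        Integer mb = containers.remove(id);
        if (mb != null) {
            allocatedMb -= mb;
            killPending.remove(id);
        }
    }

    boolean isKillPending(String id) {
        return killPending.contains(id);
    }

    int availableMb() {
        return totalMb - allocatedMb;
    }
}
```

With this bookkeeping, a new container cannot be placed in the space of a killed one until confirmFinished is called, at the cost of the space staying idle for up to one heartbeat interval.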

Another thought (copied from YARN-4148): the NM tracks its total and available resources.
When launching a container, the NM checks its available resources and waits until there are enough
for the container. However, this can introduce a time gap from the AM's perspective: the AM thinks
it has launched the container, while the container may still be waiting for its resources.
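The NM-side idea could be sketched roughly as follows. NodeAdmission, its State enum, and the FIFO waiting queue are assumptions made for illustration, not existing NodeManager code; the point is only to show a launch request being accepted while the container itself waits for memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: not an actual NodeManager class.
class NodeAdmission {
    enum State { RUNNING, WAITING }

    private int availableMb;
    private final Map<String, Integer> running = new HashMap<>();
    private final Deque<String> waitingIds = new ArrayDeque<>();
    private final Map<String, Integer> waitingMb = new HashMap<>();

    NodeAdmission(int totalMb) {
        this.availableMb = totalMb;
    }

    // The launch request is always accepted, but the container only starts
    // when enough memory is free and nothing is queued ahead of it.
    State launch(String id, int mb) {
        if (waitingIds.isEmpty() && mb <= availableMb) {
            start(id, mb);
            return State.RUNNING;
        }
        waitingIds.addLast(id);
        waitingMb.put(id, mb);
        return State.WAITING;
    }

    // When a container really exits, free its memory and admit waiters in FIFO order.
    void finish(String id) {
        Integer mb = running.remove(id);
        if (mb == null) {
            return;
        }
        availableMb += mb;
        while (!waitingIds.isEmpty()
                && waitingMb.get(waitingIds.peekFirst()) <= availableMb) {
            String next = waitingIds.pollFirst();
            start(next, waitingMb.remove(next));
        }
    }

    private void start(String id, int mb) {
        running.put(id, mb);
        availableMb -= mb;
    }

    boolean isRunning(String id) {
        return running.containsKey(id);
    }

    int availableMb() {
        return availableMb;
    }
}
```

The WAITING result is the time gap described above: the AM has issued the launch, yet the container does not run until finish is called for the old one.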

> ResourceManager can place more containers on a node than the node size allows
> -----------------------------------------------------------------------------
>                 Key: YARN-5290
>                 URL: https://issues.apache.org/jira/browse/YARN-5290
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Jason Lowe
> When the ResourceManager or an ApplicationMaster kills a container, the RM scheduler immediately
treats the container as dead and frees its resources in the scheduler bookkeeping.
However, that container can still be running on the node until the node heartbeats back into
the RM and is told to kill the container. If the RM allocates the space associated with the
released container and gives it to an AM quickly enough, the AM can launch a new container
while the old container is still running on the NM. That leads to a scenario where the node is
technically using more resources than it advertised to the RM.
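
The sequence in the description can be walked through with simple arithmetic. The sketch below is a toy model with hypothetical names (OvercommitRace, peakUsageMb); it tracks only two counters, what the scheduler believes is in use and what is really running on the node, and involves no real YARN components.

```java
// Hypothetical toy model of the race: not YARN code.
class OvercommitRace {
    // Returns the peak memory actually in use on the node during the sequence.
    static int peakUsageMb(int nodeMb, int containerMb) {
        int schedulerUsedMb = 0; // what the RM scheduler believes is in use
        int nodeUsedMb = 0;      // what is really running on the node

        // 1. The old container is allocated and launched.
        schedulerUsedMb += containerMb;
        nodeUsedMb += containerMb;
        int peak = nodeUsedMb;

        // 2. The container is killed: the scheduler frees the space at once,
        //    but the node has not heartbeated yet, so the process still runs.
        schedulerUsedMb -= containerMb;

        // 3. Before the heartbeat, the scheduler sees free space and hands it out;
        //    the AM launches the new container next to the old one.
        if (nodeMb - schedulerUsedMb >= containerMb) {
            schedulerUsedMb += containerMb;
            nodeUsedMb += containerMb;
            peak = Math.max(peak, nodeUsedMb);
        }

        // 4. The heartbeat arrives and the node finally kills the old container.
        nodeUsedMb -= containerMb;
        return peak;
    }
}
```

For a node that advertises 4096 MB and a container of 4096 MB, the model peaks at 8192 MB in use, twice what the node advertised.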
