hadoop-yarn-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize
Date Sat, 05 Oct 2013 03:08:42 GMT

    [ https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786903#comment-13786903 ]

Alejandro Abdelnur commented on YARN-1274:

[~vinodkv], I was planning to take a stab at it next week; if you are in a rush, go for it.

As some background:

* My current workaround from the AM side: create a dummy LocalResource against file:///etc/hosts
as application private. This triggers localization just once per app on the node, and because
the file is local I don't incur any unnecessary extra latency from HDFS.

* Possible solution 1: always trigger resource localization to force the LCE localization
and ensure creation of usercache/USER even if there are no application/private resources
to localize.

* Possible solution 2: the LCE launcher should call mkdirat on usercache/USER and do the chmod
before launching the container process. If the dir already exists because of localization,
this is a NOP. The mkdirat happens before the setuid that launches the container process.

I prefer option 2 because it avoids triggering the localization thread and avoids
adding extra latency to containers without localization.
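
For illustration, option 2 could look roughly like the C sketch below. This is not the actual container-executor code: the helper name ensure_user_dir, its parameters, and the idea of passing an already-open descriptor for the usercache directory are all assumptions made for the example. The chown step is also an assumption, added because the issue requires the directory to be owned by the container user. It creates the directory if it is missing, leaves it untouched if it already has the expected mode and owner, and is meant to run while the launcher is still privileged, before the setuid.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the real container-executor code.
 * Ensures that <base_fd>/<user> exists as a directory with the given
 * permission bits and owner. Intended to be called before setuid().
 * Returns 0 on success, -1 on failure with errno set.
 */
int ensure_user_dir(int base_fd, const char *user,
                    uid_t uid, gid_t gid, mode_t mode)
{
    struct stat st;

    /* Create the directory; an existing entry is not an error here. */
    if (mkdirat(base_fd, user, mode) != 0 && errno != EEXIST) {
        return -1;
    }

    /* Inspect the entry without following symlinks; it must be a directory. */
    if (fstatat(base_fd, user, &st, AT_SYMLINK_NOFOLLOW) != 0) {
        return -1;
    }
    if (!S_ISDIR(st.st_mode)) {
        errno = ENOTDIR;
        return -1;
    }

    /* mkdirat is subject to the umask, so fix the mode only if it differs. */
    if ((st.st_mode & 07777) != mode &&
        fchmodat(base_fd, user, mode, 0) != 0) {
        return -1;
    }

    /* Fix ownership only if it differs (assumed step, see above). */
    if ((st.st_uid != uid || st.st_gid != gid) &&
        fchownat(base_fd, user, uid, gid, 0) != 0) {
        return -1;
    }

    return 0;
}
```

When localization has already created the directory with the right mode and owner, the call only performs the mkdirat and fstatat and changes nothing, which matches the NOP behaviour described above. A real implementation would also need to consider the window between the fstatat and the fchmodat/fchownat calls.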

> LCE fails to run containers that don't have resources to localize
> -----------------------------------------------------------------
>                 Key: YARN-1274
>                 URL: https://issues.apache.org/jira/browse/YARN-1274
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.1-beta
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Blocker
> LCE container launch assumes the usercache/USER directory exists and is owned by the
> user running the container process.
> But the directory is created only by the LCE localization command, which runs only if
> there are resources to localize. If there are no resources to localize, LCE localization
> never executes and the launch fails with exit code 255. The NM logs show something like:
> {code}
> 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 1
> 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is llama
> 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create directory llama in /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_000004 - Permission denied
> {code}
