hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing
Date Tue, 13 Mar 2018 20:22:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397592#comment-16397592 ]

Jason Lowe commented on YARN-7999:

Thanks for the logs!  I don't see how this could be a race.  The container executor is not
multithreaded, and it doesn't start running the docker command for a container before it has
finished creating the directories for that container.

From the logs it looks like we never got around to running a docker command at all.  Rather,
the mount security checks within the container executor are failing.  The "Creating local
dirs..." log implies that the local directories (including log directories per my previous
comment) are being created, and that happens just before it tries to construct the docker run
command, which is where the mount permissions are checked.

I don't see an error like "Could not determine real path of mount" or "Could not stat path"
in the launch logs, so I'm guessing the log directory is actually being created.  You could
try setting yarn.nodemanager.delete.debug-delay-sec to a large enough value to make it easier
to verify that the log directory is actually there.  Given that it doesn't complain about
being unable to stat the mount path before rejecting the mount, I suspect it is there.  That
leads me to believe it thinks the path is not allowed rather than not there, which implies
it is either missing from the whitelisted paths in the container executor config or maybe
something is wrong with YARN-7626, which recently went into trunk.
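
To illustrate the reasoning above, here is a minimal sketch in Python (not the container
executor's actual C code) of a two-stage mount check: first the path has to resolve on disk,
then the resolved path has to fall under an allowed prefix.  The function name check_mount and
the allowed_prefixes parameter are hypothetical and exist only for this example; the two
message strings are taken from the logs in this issue.

```python
import os


def check_mount(src, allowed_prefixes):
    """Hypothetical two-stage check of a requested mount source.

    Returns (ok, message). This is an illustrative assumption about the
    order of the checks, not the real container-executor implementation.
    """
    # Stage 1: the path must exist so that it can be resolved.
    if not os.path.exists(src):
        return False, f"Could not determine real path of mount '{src}'"
    real = os.path.realpath(src)

    # Stage 2: the resolved path must equal, or sit below, an allowed prefix.
    for prefix in allowed_prefixes:
        allowed = os.path.realpath(prefix)
        if real == allowed or real.startswith(allowed.rstrip(os.sep) + os.sep):
            return True, "ok"
    return False, f"Invalid docker mount '{src}'"
```

Under this assumption, a failure that reports only "Invalid docker mount" with no preceding
"Could not determine real path" message points at the allowed-path list rather than at a
missing directory.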

> Docker launch fails when user private filecache directory is missing
> --------------------------------------------------------------------
>                 Key: YARN-7999
>                 URL: https://issues.apache.org/jira/browse/YARN-7999
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Eric Yang
>            Assignee: Jason Lowe
>            Priority: Major
>         Attachments: YARN-7999.001.patch, YARN-7999.002.patch, q3.log
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_000020]: [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_000020
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
> Error constructing docker command, docker error code=12, error message='Invalid docker
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_000020//container_1520032931921_0001_01_000020.pid in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down now...
> {code}
> The filecache cannot be mounted because it doesn't exist.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
