hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-7999) Docker launch fails when user private filecache directory is missing
Date Mon, 12 Mar 2018 23:13:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396223#comment-16396223
] 

Eric Yang edited comment on YARN-7999 at 3/12/18 11:12 PM:
-----------------------------------------------------------

[~jlowe] I am getting this error:

{code}
Exception message: Invalid docker rw mount '/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1520895272530_0001/container_1520895272530_0001_01_000005:/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1520895272530_0001/container_1520895272530_0001_01_000005', realpath=/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1520895272530_0001/container_1520895272530_0001_01_000005
Error constructing docker command, docker error code=14, error message='Invalid docker read-write mount'

Shell output: main : command provided 4
main : run as user is hbase
main : requested yarn user is hbase
Creating script paths...
Creating local dirs...


[2018-03-12 22:57:31.027]Diagnostic message from attempt 0 : [2018-03-12 22:57:31.027]
[2018-03-12 22:57:31.027]Container exited with a non-zero exit code 29. 
{code}

The container logging directory does not exist when docker tries to bind mount it.
I also found something interesting: if docker is not working properly on one of the
cluster nodes, the container is first attempted on the faulty node, and the logging
directory is initialized there.  When the same attempt is then started on another node,
the logging directory is not initialized on that node, which leads to the failure.
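
To check this on a given node, a small helper can list which bind-mount sources are missing on the local host. This is only a diagnostic sketch, not the NodeManager or container-executor code: the function names `split_mount` and `missing_mount_sources` are hypothetical, and it assumes mount specs of the form `source:destination` as printed in the error above, with no colon inside either path.

```python
import os
import tempfile


def split_mount(spec):
    """Split a 'source:destination' mount spec into its two paths.

    Assumption: neither path contains a colon.
    """
    source, sep, destination = spec.partition(":")
    if not sep or not source or not destination:
        raise ValueError(f"malformed mount spec: {spec!r}")
    return source, destination


def missing_mount_sources(specs):
    """Return the mount sources that are not existing directories on this host."""
    missing = []
    for spec in specs:
        source, _ = split_mount(spec)
        # Resolve symlinks first, then require a directory at the resolved path.
        if not os.path.isdir(os.path.realpath(source)):
            missing.append(source)
    return missing


if __name__ == "__main__":
    # Demo with a temporary tree: one directory exists, the other does not.
    with tempfile.TemporaryDirectory() as root:
        present = os.path.join(root, "filecache")
        absent = os.path.join(root, "userlogs", "container_x")
        os.makedirs(present)
        specs = [f"{present}:{present}", f"{absent}:{absent}"]
        print(missing_mount_sources(specs))
```

Running it on each node with the mounts from the failing launch should show whether the logging directory was created only on the node that handled the first attempt.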



> Docker launch fails when user private filecache directory is missing
> --------------------------------------------------------------------
>
>                 Key: YARN-7999
>                 URL: https://issues.apache.org/jira/browse/YARN-7999
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Eric Yang
>            Assignee: Jason Lowe
>            Priority: Major
>         Attachments: YARN-7999.001.patch, YARN-7999.002.patch
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_000020]: [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_000020
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache', realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_000020//container_1520032931921_0001_01_000020.pid in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down now...
> {code}
> The filecache cannot be mounted because it doesn't exist.


