hadoop-yarn-issues mailing list archives

From "Eric Badger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
Date Wed, 03 Jan 2018 21:40:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310327#comment-16310327
] 

Eric Badger commented on YARN-7677:
-----------------------------------

bq. Some Hadoop features will not work, i.e. short circuits read, if host and docker containers
are not matching.
This is true. I would like to work towards a solution where we use something similar to {{dfs.domain.socket.path}},
since it already defines the short-circuit socket. However, I'm not sure how to do that without
copying the config, since this is a dfs property that will be used by the datanode (i.e. not
the container-executor).

bq. If we handle security properly with white list mount (YARN-5534), container-executor validation
(YARN-7590), and check sudo privileges before launching privileged container (YARN-7221).
Any particular reason that we shouldn't allow read-only bind-mount HADOOP_CONF_DIR?
Nope, I don't think there is any problem with bind-mounting {{HADOOP_CONF_DIR}}. However,
I don't think it should be a requirement. For example, you should be able to use an older
version of hadoop as the client (task), while the server (NM) uses a newer version. If we
pass in {{HADOOP_CONF_DIR}} then this is not possible. If we are constantly bind-mounting
in hadoop to all of the containers, then we lose some of the wonder of docker, which is that
the container stays constant and consistent over time. Some may choose to bind-mount hadoop,
but it should be a choice, not a requirement.

bq. White list is used by container-executor, which resides in host, and not docker container.
How is the by pass happens?
This happens because of a call in {{ContainerLaunch.java}} that automatically adds {{HADOOP_CONF_DIR}}
to the environment. This environment is parsed in {{launch_container.sh}}, which is the script
that the docker container is started with.

{noformat:title=ContainerLaunch.java}
1388    putEnvIfAbsent(environment, Environment.HADOOP_CONF_DIR.name());
{noformat}
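
To illustrate why this call bypasses the user's choice, here is a minimal sketch of an "add if absent" helper. It is not the Hadoop source: the class name {{EnvSketch}}, the explicit {{nmEnv}} parameter (standing in for the NodeManager's process environment), and both paths are assumptions made so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

public class EnvSketch {
    // Hypothetical stand-in for the helper named above: copy a variable from
    // the NodeManager's environment unless the task environment already has it.
    static void putEnvIfAbsent(Map<String, String> taskEnv,
                               Map<String, String> nmEnv,
                               String variable) {
        String hostValue = nmEnv.get(variable);
        if (!taskEnv.containsKey(variable) && hostValue != null) {
            taskEnv.put(variable, hostValue);
        }
    }

    public static void main(String[] args) {
        Map<String, String> nmEnv = new HashMap<>();
        nmEnv.put("HADOOP_CONF_DIR", "/etc/hadoop/conf"); // illustrative host path

        // The user never set HADOOP_CONF_DIR, yet the task environment gets the
        // host value, regardless of any whitelist.
        Map<String, String> taskEnv = new HashMap<>();
        putEnvIfAbsent(taskEnv, nmEnv, "HADOOP_CONF_DIR");
        System.out.println(taskEnv.get("HADOOP_CONF_DIR"));

        // A value the user set explicitly is left untouched.
        Map<String, String> userEnv = new HashMap<>();
        userEnv.put("HADOOP_CONF_DIR", "/opt/image/conf"); // illustrative image path
        putEnvIfAbsent(userEnv, nmEnv, "HADOOP_CONF_DIR");
        System.out.println(userEnv.get("HADOOP_CONF_DIR"));
    }
}
```

In the first case the exported host value is what the launch script hands to the container, so a {{HADOOP_CONF_DIR}} preset inside the Docker image would be overridden.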

> HADOOP_CONF_DIR should not be automatically put in task environment
> -------------------------------------------------------------------
>
>                 Key: YARN-7677
>                 URL: https://issues.apache.org/jira/browse/YARN-7677
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether it's set
> by the user or not. It completely bypasses the whitelist and so there is no way for a task
> to not have {{HADOOP_CONF_DIR}} set. This causes problems in the Docker use case where Docker
> containers will set up their own environment and have their own {{HADOOP_CONF_DIR}} preset
> in the image itself.


