hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
Date Wed, 10 Oct 2018 17:43:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645324#comment-16645324 ]

Eric Yang commented on YARN-8569:
---------------------------------

{quote}If you will do #1, IIUC, the #2 has to be done, otherwise as a normal user we cannot
read from nmPrivate dir. {quote}

Lines 2632-2638 of container-executor do exactly that: they open the file as the node manager user, and the code already drops privileges while writing the file.  I don't disagree with what you said, but it is already part of the patch.
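
For reference, here is a minimal sketch in Python of the open-then-drop-privileges pattern described above (the function name, uids, and path handling are illustrative only; the real implementation is setuid C code in container-executor):

{code}
import os

def write_file_unprivileged(path, data, user_uid, user_gid):
    # Open the file with the caller's (privileged) credentials,
    # e.g. as the node manager user in the container-executor case.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o640)
    saved_uid, saved_gid = os.geteuid(), os.getegid()
    try:
        # Drop to the target user's credentials before writing, so the
        # write itself does not run with elevated privileges.
        os.setegid(user_gid)
        os.seteuid(user_uid)
        os.write(fd, data)
    finally:
        # Restore the original credentials (uid first, then gid).
        os.seteuid(saved_uid)
        os.setegid(saved_gid)
        os.close(fd)
{code}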

{quote}We already limit to read file under .../nmPrivate/app../sys/fs/<file> correct?
How it is possible to read token file from that directory?{quote}

App.json is written to nmPrivate/[appid]/app.json; there is no nmPrivate/[appid]/sysfs sub-directory.  The sysfs directory only exists in the distributed cache location.  Good security practice is to give generated paths a hard-coded depth and a fixed filename to prevent path exploits.  If we allow the API to specify an arbitrary filename, a user can easily pass "sysfs/../container_tokens" from the REST API to defeat the security measures that were put in place.  Readlink can flatten the path, but there is no real indicator that the source can be copied.  For now, I can add a source path check to make sure it is jailed inside the nmPrivate/[appid]/sysfs path.  However, the first version of the API will not accept arbitrary filenames, to minimize security concerns.  It would be easy to extend the REST API method to support arbitrary filenames once more thought has been given; I leave that responsibility to whoever wants to bear the problem.

> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8569 YARN sysfs interface to provide cluster information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch
>
>
> Some programs require container hostnames to be known for the application to run.  For example, distributed TensorFlow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN services launch_command.  In addition, the dynamic parameters do not work with the YARN flex command.  This is the classic pain point for application developers attempting to automate system environment settings as parameters to the end-user application.
> It would be great if the YARN Docker integration could provide a simple option to expose the hostnames of the YARN service via a mounted file.  The file content gets updated when a flex command is performed.  This allows application developers to consume system environment settings via a standard interface; it is like /proc/devices for Linux, but for Hadoop.  This may involve updating a file in the distributed cache and allowing the file to be mounted via container-executor.
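> As an illustration only (the mount point and JSON layout below are assumptions, not a settled interface), an application inside the container could consume such a file like this:
> {code}
> import json
>
> # Hypothetical mount point and schema for the cluster information file.
> with open("/hadoop/yarn/sysfs/app.json") as f:
>     app = json.load(f)
>
> # Build the host lists that programs such as distributed TensorFlow expect.
> ps_hosts = ",".join(h + ":2222" for h in app["components"]["ps"]["hostnames"])
> worker_hosts = ",".join(h + ":2222" for h in app["components"]["worker"]["hostnames"])
> {code}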


