hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-8569) Create an interface to provide cluster information to application
Date Fri, 12 Oct 2018 00:44:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647030#comment-16647030 ]

Eric Yang edited comment on YARN-8569 at 10/12/18 12:43 AM:
------------------------------------------------------------

[~leftnoteasy] Patch 12 includes the set_user fix and adds the sysfs path name nmPrivate/[appId]/sysfs.
In the current implementation, the sync makes n REST API calls to the node managers, where n is
less than the total number of bare-metal hosts hosting containers.  This reduces the network
traffic required to keep the information in sync.  The sysfs prefix introduced under the nmPrivate
directory paves the way to add more than just app.json to the sysfs directory and prevents
path traversal attacks.
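
Not part of the patch itself, just a minimal sketch of the kind of confinement check a fixed
nmPrivate/[appId]/sysfs prefix makes possible.  The resolve_sysfs_path helper below is
hypothetical and only illustrates rejecting names that would escape the per-application
sysfs directory:

{code}
import os

def resolve_sysfs_path(nm_private_dir, app_id, requested_name):
    """Resolve a requested file name under nmPrivate/<appId>/sysfs.

    Rejects names such as '../../etc/passwd' that would escape the
    per-application sysfs directory (path traversal).
    """
    sysfs_root = os.path.realpath(os.path.join(nm_private_dir, app_id, "sysfs"))
    candidate = os.path.realpath(os.path.join(sysfs_root, requested_name))
    if not candidate.startswith(sysfs_root + os.sep):
        raise ValueError("path traversal attempt: %s" % requested_name)
    return candidate
{code}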

When information is populated into multiple files, there is a higher chance of a race condition
where state has been changed in some files but not in others.  A multi-file population mechanism
will require more thought to keep the information transactional.  The first version does not
add container-id or arbitrary-filename support, in order to reduce transaction commits and
ensure that the information propagation is idempotent from the container's point of view.
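
For illustration only, here is a sketch of a single-file update that stays consistent from a
reader's point of view: the JSON is written to a temporary file and renamed over app.json, so
a container either sees the previous content or the new content, never partial state.  The
sysfs_dir argument and the write_app_json_atomically name are hypothetical, not the patch's code:

{code}
import json
import os
import tempfile

def write_app_json_atomically(sysfs_dir, cluster_info):
    """Publish cluster information as app.json without exposing partial state."""
    # Write to a temp file in the same directory, then rename over app.json;
    # os.replace() is atomic on POSIX, so readers never see a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=sysfs_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(cluster_info, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, os.path.join(sysfs_dir, "app.json"))
    except Exception:
        os.unlink(tmp_path)
        raise
{code}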



> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8569 YARN sysfs interface to provide cluster information to
application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, YARN-8569.003.patch, YARN-8569.004.patch,
YARN-8569.005.patch, YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, YARN-8569.009.patch,
YARN-8569.010.patch, YARN-8569.011.patch, YARN-8569.012.patch
>
>
> Some programs require container hostnames to be known for the application to run.  For example,
> distributed TensorFlow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN services launch_command.
> In addition, the dynamic parameters do not work with the YARN flex command.  This is the classic
> pain point for application developers attempting to automate system environment settings as
> parameters to the end-user application.
> It would be great if the YARN Docker integration could provide a simple option to expose the
> hostnames of the YARN service via a mounted file.  The file content gets updated when a flex
> command is performed.  This allows application developers to consume system environment settings
> via a standard interface.  It is like /proc/devices for Linux, but for Hadoop.  This may involve
> updating a file in the distributed cache and allowing the file to be mounted via container-executor.
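
If such a mounted file were available inside the container, the trainer invocation above could
derive ps_hosts and worker_hosts from it instead of hard-coding hostnames.  The sketch below is
only illustrative: the mount point /hadoop/yarn/sysfs/app.json and the JSON layout (a list of
containers with component_name and hostname fields) are assumptions, not the committed interface.

{code}
import json

# Hypothetical mount point and JSON layout; the real path and schema are
# defined by the YARN sysfs implementation, not by this sketch.
CLUSTER_INFO = "/hadoop/yarn/sysfs/app.json"

def hosts_for(component, port=2222):
    """Return 'host:port,host:port,...' for one component of the service."""
    with open(CLUSTER_INFO) as f:
        spec = json.load(f)
    hosts = sorted(c["hostname"] for c in spec.get("containers", [])
                   if c.get("component_name") == component)
    return ",".join("%s:%d" % (h, port) for h in hosts)

if __name__ == "__main__":
    # Build the TensorFlow flags from the synced file instead of hard-coding
    # ps0.example.com, worker0.example.com, etc.
    print("--ps_hosts=%s --worker_hosts=%s"
          % (hosts_for("ps"), hosts_for("worker")))
{code}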


