hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples
Date Fri, 01 Jun 2018 16:33:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498199#comment-16498199 ]

Eric Yang commented on YARN-8220:
---------------------------------

[~sunilg] Thank you for the patch. A couple of suggestions:

1. Avoid using a bash-style launch command. Although it works, using ENTRYPOINT and CMD in the
Dockerfile greatly improves security and readability. For example:

{code}
WORKDIR /test/models/tutorials/image/cifar10_estimator
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]
# Only the last CMD in a Dockerfile takes effect, so all default arguments
# go into one exec-form array; Docker appends them to ENTRYPOINT.
CMD ["--data-dir=hdfs:///tmp/cifar-10-data", \
     "--job-dir=hdfs:///tmp/cifar-10-jobdir", \
     "--train-steps=10000", \
     "--eval-batch-size=16", \
     "--train-batch-size=16", \
     "--sync", \
     "--num-gpus=2"]
{code}

This simplifies the Yarnfile and prevents the script from running in the wrong directory if the
working directory does not exist.
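A rough Python model of how Docker combines these instructions may help explain the layout. It is only a sketch: effective_argv is a hypothetical helper, it handles only exec-form (JSON array) ENTRYPOINT and CMD in a single-stage Dockerfile, it does not support line continuations, and it ignores anything inherited from the base image. The final argv is ENTRYPOINT followed by CMD, a later CMD replaces an earlier one rather than adding to it, and arguments passed at run time replace CMD entirely.

{code:python}
import json
from typing import List, Optional


def effective_argv(dockerfile: str, run_args: Optional[List[str]] = None) -> List[str]:
    """Approximate the argv Docker execs for a single-stage Dockerfile.

    Hypothetical helper. Assumes exec-form ENTRYPOINT/CMD only, no line
    continuations, and nothing inherited from the base image.
    """
    entrypoint: List[str] = []
    cmd: List[str] = []
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, rest = line.partition(" ")
        if keyword.upper() == "ENTRYPOINT":
            entrypoint = json.loads(rest)
        elif keyword.upper() == "CMD":
            # A later CMD replaces the earlier one; CMDs never accumulate.
            cmd = json.loads(rest)
    # Run-time arguments replace CMD; otherwise CMD supplies the default
    # arguments appended after ENTRYPOINT.
    return entrypoint + (list(run_args) if run_args else cmd)


ONE_CMD_PER_FLAG = """
WORKDIR /test/models/tutorials/image/cifar10_estimator
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]
CMD ["--data-dir=hdfs:///tmp/cifar-10-data"]
CMD ["--num-gpus=2"]
"""

SINGLE_CMD = """
WORKDIR /test/models/tutorials/image/cifar10_estimator
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]
CMD ["--data-dir=hdfs:///tmp/cifar-10-data", "--num-gpus=2"]
"""

print(effective_argv(ONE_CMD_PER_FLAG))  # ['/usr/bin/python', 'cifar10_main.py', '--num-gpus=2']
print(effective_argv(SINGLE_CMD))  # ['/usr/bin/python', 'cifar10_main.py', '--data-dir=hdfs:///tmp/cifar-10-data', '--num-gpus=2']
{code}

With one CMD per flag, only the last flag survives; a single CMD array keeps every default, and any of them can still be replaced by passing arguments at run time.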

2. It might be good to showcase some Yarnfile features:

{code}
{
..
  "configuration": {
    "env": {
      "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS":"/etc/hadoop/conf:/etc/hadoop/conf:ro",
      "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
    }
  }
..
}
{code}

This showcases how to mount configuration files from the host disk (read-only here) and how to
use Docker ENTRYPOINT support.
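The mount value follows a source:destination:mode pattern, with several mounts separated by commas. The Python sketch below splits such a value and rebuilds the configuration/env fragment shown above. parse_docker_mounts and yarnfile_configuration are hypothetical helpers, not part of any YARN API, and the sketch assumes the mode is mandatory and accepts only plain ro or rw, which may be stricter than what a given YARN version allows.

{code:python}
import json
from typing import Dict, List, NamedTuple


class DockerMount(NamedTuple):
    source: str       # path on the NodeManager host
    destination: str  # path inside the container
    mode: str         # "ro" (read-only) or "rw" (read-write)


def parse_docker_mounts(value: str) -> List[DockerMount]:
    """Split a comma-separated list of source:destination:mode entries (hypothetical helper)."""
    mounts: List[DockerMount] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected source:destination:mode, got {entry!r}")
        source, destination, mode = parts
        # Assumption: only the plain ro/rw modes are accepted here.
        if mode not in ("ro", "rw"):
            raise ValueError(f"mode must be 'ro' or 'rw', got {mode!r}")
        mounts.append(DockerMount(source, destination, mode))
    return mounts


def yarnfile_configuration(mounts: List[DockerMount], disable_run_override: bool = True) -> Dict:
    """Build a Yarnfile "configuration" section like the one above (hypothetical helper)."""
    env = {
        "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": ",".join(
            f"{m.source}:{m.destination}:{m.mode}" for m in mounts
        )
    }
    if disable_run_override:
        # Environment values in the Yarnfile are strings, hence "true".
        env["YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE"] = "true"
    return {"configuration": {"env": env}}


mounts = parse_docker_mounts("/etc/hadoop/conf:/etc/hadoop/conf:ro")
print(json.dumps(yarnfile_configuration(mounts), indent=2))
{code}

Validating the mount string before submitting the service catches a malformed entry (for example a missing mode) on the client side instead of at container launch.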

3. Downloading source code from individual GitHub contributors' repositories is risky and prone
to breakage. If the source is small enough and is donated to Apache, it would be better to host
it locally.

> Running Tensorflow on YARN with GPU and Docker - Examples
> ---------------------------------------------------------
>
>                 Key: YARN-8220
>                 URL: https://issues.apache.org/jira/browse/YARN-8220
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>            Reporter: Sunil Govindan
>            Assignee: Sunil Govindan
>            Priority: Critical
>         Attachments: YARN-8220.001.patch
>
>
> Tensorflow can run on YARN and leverage YARN's distributed features.
> This spec file will help run Tensorflow on YARN with GPU/Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
