hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
Date Mon, 09 Apr 2018 21:38:00 GMT
Wangda Tan created YARN-8135:
--------------------------------

             Summary: Hadoop {Submarine} Project: Simple and scalable deployment of deep learning
training / serving jobs on Hadoop
                 Key: YARN-8135
                 URL: https://issues.apache.org/jira/browse/YARN-8135
             Project: Hadoop YARN
          Issue Type: New Feature
            Reporter: Wangda Tan
            Assignee: Wangda Tan
         Attachments: image-2018-04-09-14-35-16-778.png

Description:

*Goals:*
 - Allow infra engineers and data scientists to run *unmodified* TensorFlow jobs on YARN.
 - Give jobs easy access to data and models in HDFS and other storage systems.
 - Launch services that serve TensorFlow/MXNet models.
 - Support running distributed TensorFlow jobs with simple configs.
 - Support running user-specified Docker images.
 - Support specifying GPUs and other resources.
 - Support launching TensorBoard when the user requests it.
 - Support customized DNS names for roles (like tensorboard.$user.$domain:6006).
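
To illustrate the DNS naming goal, here is a minimal sketch of how a per-role endpoint following the tensorboard.$user.$domain:6006 pattern could be assembled. The function name `role_endpoint` and the sample user and domain are hypothetical, and this is not how YARN itself registers DNS names.

```python
def role_endpoint(role: str, user: str, domain: str, port: int) -> str:
    """Build a host:port string following the role.user.domain pattern."""
    # DNS names are case-insensitive, so lowercase each part for a canonical form.
    host = ".".join(part.strip().lower() for part in (role, user, domain))
    return f"{host}:{port}"


# Sample values are made up for illustration.
print(role_endpoint("tensorboard", "alice", "example.com", 6006))
# prints: tensorboard.alice.example.com:6006
```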

*Why this name?*
 - Because a submarine is the only vehicle that can take humans to deep places. B-)

Comparison with other projects:

!image-2018-04-09-14-35-16-778.png!

*Notes:*

* GPU isolation in the XLearning project is achieved with a patched YARN, which differs from
the community’s GPU isolation solution.

** XLearning needs a few modifications to read ClusterSpec from the environment.
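
As a rough illustration of what "reading ClusterSpec from env" can look like on the job side, the sketch below parses a JSON environment variable into a cluster layout and the current task's role. It assumes the JSON shape TensorFlow uses for `TF_CONFIG`; the issue does not say which variable XLearning reads, so the variable name, the helper `read_cluster_spec`, and the host names are assumptions, and this is not XLearning's code.

```python
import json
import os


def read_cluster_spec(env_var="TF_CONFIG"):
    """Parse the cluster layout and this task's role from a JSON environment variable."""
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set")
    config = json.loads(raw)
    cluster = config.get("cluster", {})  # e.g. {"ps": [...], "worker": [...]}
    task = config.get("task", {})        # e.g. {"type": "worker", "index": 0}
    return cluster, task.get("type"), int(task.get("index", 0))


# Illustrative values only; hosts and ports are made up.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "ps": ["ps0.example.com:2222"],
        "worker": ["w0.example.com:2222", "w1.example.com:2222"],
    },
    "task": {"type": "worker", "index": 1},
})
cluster, job_name, task_index = read_cluster_spec()
print(job_name, task_index, len(cluster["worker"]))
```

The returned dictionary has the shape that `tf.train.ClusterSpec` accepts, so a training script could pass it along without hard-coding host lists.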

*References:*

- TensorFlowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
- TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
- Spark Deep Learning (Databricks): https://github.com/databricks/spark-deep-learning
- XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
- Kubeflow (Google): https://github.com/kubeflow/kubeflow



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

