hadoop-common-commits mailing list archives

From sun...@apache.org
Subject [hadoop] branch submarine-0.2 updated: SUBMARINE-82. Fix english grammar mistakes in documentation. Contributed by Szilard Nemeth.
Date Tue, 04 Jun 2019 06:47:26 GMT
This is an automated email from the ASF dual-hosted git repository.

sunilg pushed a commit to branch submarine-0.2
in repository https://gitbox.apache.org/repos/asf/hadoop.git


The following commit(s) were added to refs/heads/submarine-0.2 by this push:
     new c177cc9  SUBMARINE-82. Fix english grammar mistakes in documentation. Contributed by Szilard Nemeth.
c177cc9 is described below

commit c177cc97508743f7e112876a97280554c01813a4
Author: Zhankun Tang <ztang@apache.org>
AuthorDate: Tue Jun 4 14:44:37 2019 +0800

    SUBMARINE-82. Fix english grammar mistakes in documentation. Contributed by Szilard Nemeth.
    
    (cherry picked from commit 799115967d6e1a4074d0186b06b4eb97251a19df)
---
 .../src/site/markdown/Examples.md                  |  2 +-
 .../src/site/markdown/HowToInstall.md              | 24 +++----
 .../src/site/markdown/Index.md                     | 14 ++--
 .../src/site/markdown/InstallationGuide.md         | 79 +++++++++++++---------
 .../src/site/markdown/QuickStart.md                | 29 ++++----
 .../markdown/RunningDistributedCifar10TFJobs.md    | 16 ++---
 .../src/site/markdown/TestAndTroubleshooting.md    |  8 +--
 7 files changed, 95 insertions(+), 77 deletions(-)

diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Examples.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Examples.md
index b66b32d..fd61e83 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Examples.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Examples.md
@@ -14,7 +14,7 @@
 
 # Examples
 
-Here're some examples about Submarine usage.
+Here are some examples about how to use Submarine:
 
 [Running Distributed CIFAR 10 Tensorflow Job](RunningDistributedCifar10TFJobs.html)
 
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/HowToInstall.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/HowToInstall.md
index 65e56ea..af96d6d 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/HowToInstall.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/HowToInstall.md
@@ -14,23 +14,23 @@
 
 # How to Install Dependencies
 
-Submarine project uses YARN Service, Docker container, and GPU (when GPU hardware available and properly configured).
+Submarine project uses YARN Service, Docker container and GPU.
+GPU could only be used if a GPU hardware is available and properly configured.
 
-That means as an admin, you have to properly setup YARN Service related dependencies, including:
+As an administrator, you have to properly setup YARN Service related dependencies, including:
 - YARN Registry DNS
+- Docker related dependencies, including:
+  - Docker binary with expected versions
+  - Docker network that allows Docker containers to talk to each other across different nodes
 
-Docker related dependencies, including:
-- Docker binary with expected versions.
-- Docker network which allows Docker container can talk to each other across different nodes.
+If you would like to use GPU, you need to set up:
+- GPU Driver
+- Nvidia-docker
 
-And when GPU wanna to be used:
-- GPU Driver.
-- Nvidia-docker.
-
-For your convenience, we provided installation documents to help you to setup your environment. You can always choose to have them installed in your own way.
+For your convenience, we provided some installation documents to help you setup your environment. You can always choose to have them installed in your own way.
 
 Use Submarine installer to install dependencies: [EN](https://github.com/hadoopsubmarine/hadoop-submarine-ecosystem/tree/master/submarine-installer) [CN](https://github.com/hadoopsubmarine/hadoop-submarine-ecosystem/blob/master/submarine-installer/README-CN.md)
 
-Alternatively, you can follow manual install dependencies: [EN](InstallationGuide.html) [CN](InstallationGuideChineseVersion.html)
+Alternatively, you can follow this guide to manually install dependencies: [EN](InstallationGuide.html) [CN](InstallationGuideChineseVersion.html)
 
-Once you have installed dependencies, please follow following guide to [TestAndTroubleshooting](TestAndTroubleshooting.html).
 
\ No newline at end of file
+Once you have installed all the dependencies, please follow this guide: [TestAndTroubleshooting](TestAndTroubleshooting.html).
\ No newline at end of file
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Index.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Index.md
index d11fa45..e2c7979 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Index.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Index.md
@@ -21,20 +21,20 @@ Goals of Submarine:
 
 - Can launch services to serve Tensorflow/MXNet models.
 
-- Support run distributed Tensorflow jobs with simple configs.
+- Supports running distributed Tensorflow jobs with simple configs.
 
-- Support run standalone PyTorch jobs with simple configs.
+- Supports running standalone PyTorch jobs with simple configs.
 
-- Support run user-specified Docker images.
+- Supports running user-specified Docker images.
 
-- Support specify GPU and other resources.
+- Supports specifying GPU and other resources.
 
-- Support launch tensorboard for training jobs if user specified.
+- Supports launching Tensorboard for training jobs (optional, if specified).
 
-- Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
+- Supports customized DNS name for roles (like tensorboard.$user.$domain:6006)
 
 
-Click below contents if you want to understand more.
+If you want to deep-dive, please check these resources:
 
 - [QuickStart Guide](QuickStart.html)
 
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/InstallationGuide.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/InstallationGuide.md
index e73887e..73bb9bd 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/InstallationGuide.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/InstallationGuide.md
@@ -16,20 +16,25 @@
 
 ## Prerequisites
 
-(Please note that all following prerequisites are just an example for you to install. You can always choose to install your own version of kernel, different users, different drivers, etc.).
+Please note that the following prerequisites are just an example for you to install Submarine.
+
+You can always choose to install your own version of kernel, different users, different drivers, etc.
 
 ### Operating System
 
-The operating system and kernel versions we have tested are as shown in the following table, which is the recommneded minimum required versions.
+The operating system and kernel versions we have tested against are shown in the following table.
+The versions in the table are the recommended minimum required versions.
 
-| Enviroment | Verion |
+| Environment | Version |
 | ------ | ------ |
 | Operating System | centos-release-7-5.1804.el7.centos.x86_64 |
-| Kernal | 3.10.0-862.el7.x86_64 |
+| Kernel | 3.10.0-862.el7.x86_64 |
 
 ### User & Group
 
-As there are some specific users and groups recommended to be created to install hadoop/docker. Please create them if they are missing.
+There are specific users and groups recommended to be created to install Hadoop with Docker.
+
+Please create these users if they do not exist.
 
 ```
 adduser hdfs
@@ -80,7 +85,9 @@ lspci | grep -i nvidia
 
 ### Nvidia Driver Installation (Only for Nvidia GPU equipped nodes)
 
-To make a clean installation, if you have requirements to upgrade GPU drivers. If nvidia driver/cuda has been installed before, They should be uninstalled firstly.
+To make a clean installation, if you have requirements to upgrade GPU drivers.
+
+If nvidia driver / CUDA has been installed before, they should be uninstalled as a first step.
 
 ```
 # uninstall cuda:
@@ -90,7 +97,7 @@ sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl
 sudo /usr/bin/nvidia-uninstall
 ```
 
-To check GPU version, install nvidia-detect
+To check GPU version, install nvidia-detect:
 
 ```
 yum install nvidia-detect
@@ -107,7 +114,9 @@ Pay attention to `This device requires the current xyz.nm NVIDIA driver kmod-nvi
 Download the installer like [NVIDIA-Linux-x86_64-390.87.run](https://www.nvidia.com/object/linux-amd64-display-archive.html).
 
 
-Some preparatory work for nvidia driver installation. (This is follow normal Nvidia GPU driver installation, just put here for your convenience)
+Some preparatory work for Nvidia driver installation.
+
+The steps below are for Nvidia GPU driver installation, just pasted here for your convenience.
 
 ```
 # It may take a while to update
@@ -152,7 +161,7 @@ Would you like to run the nvidia-xconfig utility to automatically update your X
 ```
 
 
-Check nvidia driver installation
+Check Nvidia driver installation
 
 ```
 nvidia-smi
@@ -165,7 +174,7 @@ https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
 
 ### Docker Installation
 
-The following steps show how to install docker 18.06.1.ce. You can choose other approaches to install Docker.
+The following steps show you how to install docker 18.06.1.ce. You can choose other approaches to install Docker.
 
 ```
 # Remove old version docker
@@ -205,7 +214,9 @@ Reference:https://docs.docker.com/install/linux/docker-ce/centos/
 
 ### Docker Configuration
 
-Add a file, named daemon.json, under the path of /etc/docker/. Please replace the variables of image_registry_ip, etcd_host_ip, localhost_ip, yarn_dns_registry_host_ip, dns_host_ip with specific ips according to your environments.
+Add a file, named daemon.json, under the path of /etc/docker/.
+
+Please replace the variables of image_registry_ip, etcd_host_ip, localhost_ip, yarn_dns_registry_host_ip, dns_host_ip with specific IPs according to your environment.
 
 ```
 {
@@ -294,7 +305,7 @@ import tensorflow as tf
 tf.test.is_gpu_available()
 ```
 
-The way to uninstall nvidia-docker V2
+If you want to uninstall nvidia-docker V2:
 ```
 sudo yum remove -y nvidia-docker2-2.0.3-1.docker18.06.1.ce
 ```
@@ -304,12 +315,14 @@ https://github.com/NVIDIA/nvidia-docker
 
 ### Tensorflow Image
 
-There is no need to install CUDNN and CUDA on the servers, because CUDNN and CUDA can be added in the docker images. We can get basic docker images by referring to [Write Dockerfile](WriteDockerfileTF.html).
+There is no need to install CUDNN and CUDA on the servers, because CUDNN and CUDA can be added in the docker images.
+
+We can get or build basic docker images by referring to [Write Dockerfile](WriteDockerfileTF.html).
 
 ### Test tensorflow in a docker container
 
 After docker image is built, we can check
-Tensorflow environments before submitting a yarn job.
+Tensorflow environments before submitting a Submarine job.
 
 ```shell
 $ docker run -it ${docker_image_name} /bin/bash
@@ -336,8 +349,8 @@ If there are some errors, we could check the following configuration.
 
 ### Etcd Installation
 
-etcd is a distributed reliable key-value store for the most critical data of a distributed system, Registration and discovery of services used in containers.
-You can also choose alternatives like zookeeper, Consul.
+etcd is a distributed, reliable key-value store for the most critical data of a distributed system, Registration and discovery of services used in containers.
+You can also choose alternatives like ZooKeeper, Consul or others.
 
 To install Etcd on specified servers, we can run Submarine-installer/install.sh
 
@@ -366,8 +379,10 @@ b3d05464c356441a: name=etcdnode1 peerURLs=http://${etcd_host_ip3}:2380 clientURL
 
 ### Calico Installation
 
-Calico creates and manages a flat three-tier network, and each container is assigned a routable ip. We just add the steps here for your convenience.
-You can also choose alternatives like Flannel, OVS.
+Calico creates and manages a flat three-tier network, and each container is assigned a routable IP address.
+
+We are listing the steps here for your convenience.
+You can also choose alternatives like Flannel, OVS or others.
 
 To install Calico on specified servers, we can run Submarine-installer/install.sh
 
@@ -379,7 +394,7 @@ systemctl status calico-node.service
 #### Check Calico Network
 
 ```shell
-# Run the following command to show the all host status in the cluster except localhost.
+# Run the following command to show all host status in the cluster except localhost.
 $ calicoctl node status
 Calico process is running.
 
@@ -412,7 +427,7 @@ docker exec workload-A ping workload-B
 You can either get Hadoop release binary or compile from source code. Please follow the https://hadoop.apache.org/ guides.
 
 
-### Start yarn service
+### Start YARN service
 
 ```
 YARN_LOGFILE=resourcemanager.log ./sbin/yarn-daemon.sh start resourcemanager
@@ -421,7 +436,7 @@ YARN_LOGFILE=timeline.log ./sbin/yarn-daemon.sh start timelineserver
 YARN_LOGFILE=mr-historyserver.log ./sbin/mr-jobhistory-daemon.sh start historyserver
 ```
 
-### Start yarn registery dns service
+### Start YARN registry DNS service
 
 ```
 sudo YARN_LOGFILE=registrydns.log ./yarn-daemon.sh start registrydns
@@ -441,13 +456,13 @@ sudo YARN_LOGFILE=registrydns.log ./yarn-daemon.sh start registrydns
 
 #### Clean up apps with the same name
 
-Suppose we want to submit a tensorflow job named standalone-tf, destroy any application with the same name and clean up historical job directories.
+Suppose we want to submit a TensorFlow job named standalone-tf, destroy any application with the same name and clean up historical job directories.
 
 ```bash
 ./bin/yarn app -destroy standalone-tf
 ./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
 ```
-where ${dfs_name_service} is the hdfs name service you use
+where ${dfs_name_service} is the HDFS name service you use
 
 #### Run a standalone tensorflow job
 
@@ -471,7 +486,7 @@ where ${dfs_name_service} is the hdfs name service you use
 ./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
 ```
 
-#### Run a distributed tensorflow job
+#### Run a distributed TensorFlow job
 
 ```bash
 ./bin/yarn jar /home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar job run \
@@ -490,11 +505,11 @@ where ${dfs_name_service} is the hdfs name service you use
 ```
 
 
-## Tensorflow Job with GPU
+## TensorFlow Job with GPU
 
-### GPU configurations for both resourcemanager and nodemanager
+### GPU configurations for both ResourceManager and NodeManager
 
-Add the yarn resource configuration file, named resource-types.xml
+Add the YARN resource configuration file, named resource-types.xml
 
    ```
    <configuration>
@@ -505,9 +520,9 @@ Add the yarn resource configuration file, named resource-types.xml
    </configuration>
    ```
 
-#### GPU configurations for resourcemanager
+#### GPU configurations for ResourceManager
 
-The scheduler used by resourcemanager must be  capacity scheduler, and yarn.scheduler.capacity.resource-calculator in  capacity-scheduler.xml should be DominantResourceCalculator
+The scheduler used by ResourceManager must be the capacity scheduler, and yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml should be DominantResourceCalculator
 
    ```
    <configuration>
@@ -518,7 +533,7 @@ The scheduler used by resourcemanager must be  capacity scheduler, and yarn.sche
    </configuration>
    ```
 
-#### GPU configurations for nodemanager
+#### GPU configurations for NodeManager
 
 Add configurations in yarn-site.xml
 
@@ -536,7 +551,7 @@ Add configurations in yarn-site.xml
    </configuration>
    ```
 
-Add configurations in container-executor.cfg
+Add configurations to container-executor.cfg
 
    ```
    [docker]
@@ -560,7 +575,7 @@ Add configurations in container-executor.cfg
    yarn-hierarchy=/hadoop-yarn
    ```
 
-### Run a distributed tensorflow gpu job
+### Run a distributed TensorFlow GPU job
 
 ```bash
  ./yarn jar /home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar job run \
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/QuickStart.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/QuickStart.md
index e2df213..37991d7 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/QuickStart.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/QuickStart.md
@@ -22,9 +22,9 @@ Must:
 
 Optional:
 
-- Enable YARN DNS. (When yarn service runtime is required.)
-- Enable GPU on YARN support. (When GPU-based training is required.)
-- Docker images for Submarine jobs. (When docker container is required.)
+- Enable YARN DNS. (Only when YARN Service runtime is required)
+- Enable GPU on YARN support. (When GPU-based training is required)
+- Docker images for Submarine jobs. (When docker container is required)
 ```
   # Get prebuilt docker images (No liability)
   docker pull hadoopsubmarine/tf-1.13.1-gpu:0.0.1
@@ -121,7 +121,7 @@ usage: job run
 #### Notes:
 When using `localization` option to make a collection of dependency Python
 scripts available to entry python script in the container, you may also need to
-set `PYTHONPATH` environment variable as below to avoid module import error
+set the `PYTHONPATH` environment variable as below to avoid module import errors
 reported from `entry_script.py`.
 
 ```
@@ -137,7 +137,7 @@ reported from `entry_script.py`.
 
 ### Submarine Configuration
 
-For Submarine internal configuration, please create a `submarine.xml` which should be placed under `$HADOOP_CONF_DIR`.
+For Submarine internal configuration, please create a `submarine.xml` file which should be placed under `$HADOOP_CONF_DIR`.
 
 |Configuration Name | Description |
 |:---- |:---- |
@@ -157,7 +157,7 @@ yarn jar path-to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar job run \
   --docker_image <your-docker-image> \
   --input_path hdfs://default/dataset/cifar-10-data  \
   --checkpoint_path hdfs://default/tmp/cifar-10-jobdir \
-  --worker_resources memory=4G,vcores=2,gpu=2  \
+  --worker_resources memory=4G,vcores=2,gpu=2 \
   --worker_launch_cmd "python ... (Your training application cmd)" \
   --tensorboard # this will launch a companion tensorboard container for monitoring
 ```
@@ -168,11 +168,13 @@ yarn jar path-to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar job run \
 
 2) `DOCKER_HADOOP_HDFS_HOME` points to HADOOP_HDFS_HOME inside Docker image.
 
-3) `--worker_resources` can include gpu when you need GPU to train your task.
+3) `--worker_resources` can include GPU when you need GPU to train your task.
 
 4) When `--tensorboard` is specified, you can go to YARN new UI, go to services -> `<you specified service>` -> Click `...` to access Tensorboard.
 
-This will launch a Tensorboard to monitor *all your jobs*. By access YARN UI (the new UI). You can go to services page, go to the `tensorboard-service`, click quick links (`Tensorboard`) can lead you to the tensorboard.
+This will launch Tensorboard to monitor *all your jobs*.
+By access the YARN UI (new UI), you can go to the Services page, then go to the `tensorboard-service`, click quick links (`Tensorboard`)
+This will lead you to Tensorboard.
 
 See below screenshot:
 
@@ -229,7 +231,7 @@ java -cp /path-to/hadoop-conf:/path-to/hadoop-submarine-all-*.jar \
 
 #### Notes:
 
-1) Very similar to standalone TF application, but you need to specify #worker/#ps
+1) Very similar to standalone TF application, but you need to specify number of workers / PS processes.
 
 2) Different resources can be specified for worker and PS.
 
@@ -283,22 +285,23 @@ java -cp /path-to/hadoop-conf:/path-to/hadoop-submarine-all-*.jar \
   --num_workers 0 --tensorboard
 ```
 
-You can view multiple job training history like from the `Tensorboard` link:
+You can view multiple job training history from the `Tensorboard` link:
 
 ![alt text](./images/multiple-tensorboard-jobs.png "Tensorboard for multiple jobs")
 
 
 ### Get component logs from a training job
 
-There're two ways to get training job logs, one is from YARN UI (new or old):
+There are two ways to get the logs of a training job.
+First, from YARN UI (new or old):
 
 ![alt text](./images/job-logs-ui.png "Job logs UI")
 
-Or you can use `yarn logs -applicationId <applicationId>` to get logs from CLI
+Alternatively, you can use `yarn logs -applicationId <applicationId>` to get logs from CLI.
 
 ## Build from source code
 
-If you want to build the Submarine project by yourself, you can follow the steps:
+If you want to build the Submarine project by yourself, you should follow these steps:
 
 - Run 'mvn install -DskipTests' from Hadoop source top level once.
 
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/RunningDistributedCifar10TFJobs.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/RunningDistributedCifar10TFJobs.md
index c0cf088..3495c69 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/RunningDistributedCifar10TFJobs.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/RunningDistributedCifar10TFJobs.md
@@ -18,7 +18,7 @@
 
 ## Prepare data for training
 
-CIFAR-10 is a common benchmark in machine learning for image recognition. Below example is based on CIFAR-10 dataset.
+CIFAR-10 is a common benchmark in machine learning for image recognition. The example below is based on CIFAR-10 dataset.
 
 1) Checkout https://github.com/tensorflow/models/:
 ```
@@ -41,7 +41,7 @@ hadoop fs -put cifar-10-data/ /dataset/cifar-10-data
 
 **Warning:**
 
-Please note that YARN service doesn't allow multiple services with the same name, so please run following command
+Please note that YARN service does not allow multiple services with the same name, so please run following command
 ```
 yarn application -destroy <service-name>
 ```
@@ -59,8 +59,8 @@ Refer to [Write Dockerfile](WriteDockerfileTF.html) to build a Docker image or u
 yarn jar path/to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar \
    job run --name tf-job-001 --verbose --docker_image tf-1.13.1-gpu:0.0.1 \
    --input_path hdfs://default/dataset/cifar-10-data \
-   --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
-   --env DOCKER_HADOOP_HDFS_HOME=/hadoop-current
+   --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \
+   --env DOCKER_HADOOP_HDFS_HOME=/hadoop-current \
    --num_workers 1 --worker_resources memory=8G,vcores=2,gpu=1 \
    --worker_launch_cmd "cd /test/models/tutorials/image/cifar10_estimator && python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --train-steps=10000 --eval-batch-size=16 --train-batch-size=16 --num-gpus=2 --sync" \
    --tensorboard --tensorboard_docker_image tf-1.13.1-cpu:0.0.1
@@ -69,7 +69,7 @@ yarn jar path/to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar \
 Explanations:
 
 - When access of HDFS is required, the two environments are required to indicate: JAVA_HOME and HDFS_HOME to access libhdfs libraries *inside Docker image*. We will try to eliminate specifying this in the future.
-- Docker image for worker and tensorboard can be specified separately. For this case, Tensorboard doesn't need GPU, so we will use cpu Docker image for Tensorboard. (Same for parameter-server in the distributed example below).
+- Docker image for worker and tensorboard can be specified separately. For this case, Tensorboard does not need GPU, so we will use the CPU Docker image for Tensorboard. (Same for parameter-server in the distributed example below).
 
 ### Run distributed training
 
@@ -77,7 +77,7 @@ Explanations:
 yarn jar path/to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar \
    job run --name tf-job-001 --verbose --docker_image tf-1.13.1-gpu:0.0.1 \
    --input_path hdfs://default/dataset/cifar-10-data \
-   --env(s) (same as standalone)
+   --env(s) (same as standalone) \
    --num_workers 2 \
    --worker_resources memory=8G,vcores=2,gpu=1 \
    --worker_launch_cmd "cd /test/models/tutorials/image/cifar10_estimator && python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --train-steps=10000 --eval-batch-size=16 --train-batch-size=16 --num-gpus=2 --sync"  \
@@ -90,7 +90,7 @@ yarn jar path/to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar \
 Explanations:
 
 - `>1` num_workers indicates it is a distributed training.
-- Parameters / resources / Docker image of parameter server can be specified separately. For many cases, parameter server doesn't require GPU.
+- Parameters / resources / Docker image of parameter server can be specified separately. For many cases, parameter server does not require GPU.
 
 For the meaning of the individual parameters, see the [QuickStart](QuickStart.html) page!
 
@@ -150,7 +150,7 @@ INFO:tensorflow:Average examples/sec: 54.1082 (55.2134), step = 50
 INFO:tensorflow:Average examples/sec: 54.3141 (55.3676), step = 60
 ```
 
-Sample output of ps:
+Sample output of PS:
 ```
 ...
 , '_tf_random_seed': None, '_task_type': u'ps', '_environment': u'cloud', '_is_chief': False, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4be54dff90>, '_tf_config': gpu_options {
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/TestAndTroubleshooting.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/TestAndTroubleshooting.md
index 8fd43f3..3231aaf 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/TestAndTroubleshooting.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/TestAndTroubleshooting.md
@@ -37,7 +37,7 @@ Distributed-shell + GPU + cgroup
 
 ## Issues:
 
-### Issue 1: Fail to start nodemanager after system reboot
+### Issue 1: Fail to start NodeManager after system reboot
 
 ```
 2018-09-20 18:54:39,785 ERROR org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to bootstrap configured resource subsystems!
@@ -62,7 +62,7 @@ chown :yarn -R /sys/fs/cgroup/cpu,cpuacct
 chmod g+rwx -R /sys/fs/cgroup/cpu,cpuacct
 ```
 
-If GPUs are used,the access to cgroup devices folder is neede as well
+If GPUs are used, access to cgroup devices folder is required as well.
 
 ```
 chown :yarn -R /sys/fs/cgroup/devices
@@ -140,7 +140,7 @@ $ chmod +x find-busy-mnt.sh
 $ kill -9 5007
 ```
 
-### Issue 5:Yarn failed to start containers
+### Issue 5:YARN fails to start containers
 
-if the number of GPUs required by applications is larger than the number of GPUs in the cluster, there would be some containers can't be created.
+If the number of GPUs required by an application is greater than the number of GPUs in the cluster, some container will not be created.
 
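Note on Issue 5 in the last hunk: the condition it describes (an application requesting more GPUs than the cluster has) could be checked before a job is submitted. The following is a minimal sketch in Python and is not part of this commit. The function name `fits_cluster_gpus` and its three inputs are hypothetical; the real numbers would have to come from your own job parameters (for example `--num_workers` and the `gpu=` part of `--worker_resources`) and from your cluster inventory.

```python
def fits_cluster_gpus(num_workers: int, gpus_per_worker: int, cluster_gpus: int) -> bool:
    """Return True if the total GPU request does not exceed the cluster total.

    Hypothetical helper: it compares totals only and does not model
    per-node placement, queues, or GPUs already in use by other jobs.
    """
    if num_workers < 0 or gpus_per_worker < 0 or cluster_gpus < 0:
        raise ValueError("counts must be non-negative")
    return num_workers * gpus_per_worker <= cluster_gpus


if __name__ == "__main__":
    # Example: 2 workers with 1 GPU each on a cluster assumed to have 4 GPUs.
    print(fits_cluster_gpus(2, 1, cluster_gpus=4))
```

A passing check does not guarantee that every container starts, since a worker that needs several GPUs still has to fit on a single node; the sketch only catches the case the documentation describes.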


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-commits-help@hadoop.apache.org

