singa-commits mailing list archives

From dinh...@apache.org
Subject svn commit: r1709857 - in /incubator/singa/site/trunk/content: markdown/docs/distributed-training.md markdown/docs/docker.md markdown/docs/installation.md markdown/docs/installation_source.md markdown/docs/mesos.md site.xml
Date Wed, 21 Oct 2015 14:52:08 GMT
Author: dinhtta
Date: Wed Oct 21 14:52:08 2015
New Revision: 1709857

URL: http://svn.apache.org/viewvc?rev=1709857&view=rev
Log:
Added documentation for Docker and Mesos

Added:
    incubator/singa/site/trunk/content/markdown/docs/docker.md
    incubator/singa/site/trunk/content/markdown/docs/installation_source.md
    incubator/singa/site/trunk/content/markdown/docs/mesos.md
Modified:
    incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
    incubator/singa/site/trunk/content/markdown/docs/installation.md
    incubator/singa/site/trunk/content/site.xml

Modified: incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/distributed-training.md?rev=1709857&r1=1709856&r2=1709857&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/distributed-training.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/distributed-training.md Wed Oct 21 14:52:08
2015
@@ -2,13 +2,15 @@
 
 ---
 
-SINGA is designed for distributed training of large deep learning models with
-huge amount of training data.
+SINGA is designed for distributed training of large deep learning models with huge amounts of training data. It is integrated with Mesos, so that distributed training can be started as a Mesos framework. Currently, the Mesos cluster can be set up from SINGA containers, i.e. we provide Docker images that bundle Mesos and SINGA together. Refer to the guide below for instructions on how to start and use the cluster.
 
-Here we introduce distrbuted SINGA in following aspects:
+* [Distributed training on Mesos](mesos.html)
+
+We also provide high-level descriptions of the design behind SINGA's distributed architecture.

 
 * [System Architecture](architecture.html)
 
 * [Training Frameworks](frameworks.html)
 
 * [System Communication](communication.html)
+

Added: incubator/singa/site/trunk/content/markdown/docs/docker.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/docker.md?rev=1709857&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/docker.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/docker.md Wed Oct 21 14:52:08 2015
@@ -0,0 +1,192 @@
+# Building a SINGA Docker container
+ 
+This guide explains how to set up a development environment for SINGA using Docker. It requires only Docker to be installed. The resulting image contains the complete working environment for SINGA, and can then be used to set up a cluster environment over one or more physical nodes.
+
+1. [Build SINGA base](#build_base)
+2. [Build SINGA with Mesos and Hadoop](#build_mesos)
+3. [Pre-built images](#pre_built)
+4. [Launch and stop SINGA (stand alone mode)](#launch_stand_alone)
+5. [Launch pseudo-distributed SINGA on one node](#launch_pseudo)
+6. [Launch fully distributed SINGA on multiple nodes](#launch_distributed)
+
+---
+
+<a name="build_base"></a>
+#### Build SINGA base image
+ 
+````
+$ cd tool/docker/singa
+$ sudo docker build -t singa/base . 
+$ sudo docker images
+REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
+singa/base             latest              XXXX                XXX                 2.01 GB
+````
+
+The result is an image containing a built version of SINGA.
+
+   ![singa/base](http://www.comp.nus.edu.sg/~dinhtta/files/images_base.png)
+
+   *Figure 1. singa/base Docker image, containing library dependencies and SINGA built from
source.*
+
+---
+
+<a name="build_mesos"></a>
+#### Build SINGA with Mesos and Hadoop
+````
+$ cd tool/docker/mesos
+$ sudo docker build -t singa/mesos .
+$ sudo docker images
+REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
+singa/mesos            latest              XXXX                XXX                 4.935 GB
+````
+   ![singa/mesos](http://www.comp.nus.edu.sg/~dinhtta/files/images_mesos.png#1)
+   
+   *Figure 2. singa/mesos Docker image, containing Hadoop and Mesos built on
+top of SINGA. The default namenode address for Hadoop is `node0:9000`*
+
+**Note:** A common failure observed during the build process is a network error occurring while downloading dependencies. Simply re-run the build command.
+
+---
+
+<a name="pre_built"></a>
+#### Pre-built images on epiC cluster
+For users with access to the `epiC` cluster, there are pre-built and loaded Docker images
at the following nodes:
+
+      ciidaa-c18
+      ciidaa-c19
+
+The available images at those nodes are:
+
+````
+REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
+singa/base             latest              XXXX                XXX                 2.01 GB
+singa/mesos            latest              XXXX                XXX                 4.935 GB
+weaveworks/weaveexec   1.1.1               XXXX                11 days ago         57.8 MB
+weaveworks/weave       1.1.1               XXXX                11 days ago         17.56 MB
+````
+
+---
+
+<a name="launch_stand_alone"></a>
+#### Launch and stop SINGA in stand-alone mode
+To launch a test environment for single-node SINGA training, simply start a container from the `singa/base` image. The following starts a container called `XYZ`, then launches a shell in the container:
+
+````
+$ sudo docker run -dt --name XYZ singa/base /usr/bin/supervisord
+$ sudo docker exec -it XYZ /bin/bash
+````
+
+![Nothing](http://www.comp.nus.edu.sg/~dinhtta/files/images_standalone.png#1)
+
+   *Figure 3. Launch SINGA in stand-alone mode: single node training*
+
+Inside the launched container, the SINGA source directory can be found at `/root/incubator-singa`.

+
+**Stopping the container**
+
+````
+$ sudo docker stop XYZ
+$ sudo docker rm XYZ
+````
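The launch/stop commands above can be wrapped in a small helper script; the sketch below is hypothetical (the `singa_start`/`singa_stop` names are ours), with a `DRY_RUN` switch that prints the docker commands instead of executing them:

```shell
# Hypothetical helper around the commands above; with DRY_RUN=1 the docker
# commands are printed instead of executed (no sudo or docker required).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else sudo "$@"; fi
}

# Start a stand-alone SINGA container with the given name.
singa_start() {
  run docker run -dt --name "$1" singa/base /usr/bin/supervisord
}

# Stop and remove the container.
singa_stop() {
  run docker stop "$1"
  run docker rm "$1"
}

DRY_RUN=1 singa_start XYZ
# prints: docker run -dt --name XYZ singa/base /usr/bin/supervisord
```

Drop `DRY_RUN=1` to execute the commands for real (root access is still needed for docker).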
+
+---
+
+<a name="launch_pseudo"></a>
+#### Launch SINGA in pseudo-distributed mode (single node)
+To simulate a distributed environment on a single node, one can repeat the
+previous step multiple times, each time giving a different name to the
+container.  Network connections between these containers are already supported,
+thus SINGA instances/nodes in these containers can readily communicate with each
+other. 
+
+The previous approach requires the user to start SINGA instances individually
+at each container. Although there's a bash script for that, we provide a better
+way. In particular, multiple containers can be started from the `singa/mesos` image
+which already bundles Mesos and Hadoop with SINGA. Using Mesos makes it easy to
+launch, stop and monitor the distributed execution from a single container.
+Figure 4 shows `N+1` containers running concurrently on the local host.
+
+````
+$ sudo docker run -dt --name node0 singa/mesos /usr/bin/supervisord
+$ sudo docker run -dt --name node1 singa/mesos /usr/bin/supervisord
+...
+````
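The repeated `docker run` commands above can be generated in a loop; this is a hypothetical sketch (the `pseudo_cmds` helper is ours) that prints the command for each of the `N+1` containers:

```shell
# Hypothetical sketch: print the docker command for each of N+1 containers
# (node0..nodeN), matching the commands above. Pipe the output to sh (with
# sudo) to actually start the containers.
pseudo_cmds() {
  n=$1
  i=0
  while [ "$i" -le "$n" ]; do
    echo "docker run -dt --name node$i singa/mesos /usr/bin/supervisord"
    i=$((i + 1))
  done
}

pseudo_cmds 2   # prints the commands for node0, node1 and node2
```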
+
+![Nothing](http://www.comp.nus.edu.sg/~dinhtta/files/images_pseudo.png#1)
+   
+*Figure 4. Launch SINGA in pseudo-distributed mode: multiple SINGA nodes on a single machine*
+
+**Starting SINGA distributed training**
+
+Refer to the [Mesos guide](mesos.html) for details on how to start training with multiple SINGA instances.
+
+**Important:** the container that assumes the role of Hadoop's namenode (and often Mesos's and Zookeeper's master node as well) **must** be named `node0`. Otherwise, the user must log in to individual containers and change the Hadoop configuration separately.
+ 
+---
+
+<a name="launch_distributed"></a>
+#### Launch SINGA in fully distributed mode (multiple nodes)
+The previous section has explained how to start a distributed environment on a
+single node. But running many containers on one node does not scale. When there
+are multiple physical hosts available, it is better to distribute the
+containers over them. 
+
+The only extra requirement for the fully distributed mode, as compared with the
+pseudo distributed mode, is that the containers from different hosts are able
+to transparently communicate with each other. In the pseudo distributed mode,
+the local docker engine takes care of such communication. Here, we rely on
+[Weave](http://weave.works/guides/weave-docker-ubuntu-simple.html) to make the
+communication transparent. The resulting architecture is shown below.  
+
+![Nothing](http://www.comp.nus.edu.sg/~dinhtta/files/images_full.png#1)
+   
+*Figure 5. Launch SINGA in fully distributed mode: multiple SINGA nodes over multiple machines*
+
+**Install Weave at all hosts**
+
+```
+$ curl -L git.io/weave -o /usr/local/bin/weave
+$ chmod a+x /usr/local/bin/weave
+```
+
+**Starting Weave**
+
+Suppose `node0` will be launched at the host with IP `111.222.111.222`.
+
++ At host `111.222.111.222`:
+
+          $ weave launch
+          $ eval "$(weave env)"  # if there is an error, run `sudo -s` and try again
+
++ At other hosts:
+
+          $ weave launch 111.222.111.222
+          $ eval "$(weave env)"  # if there is an error, run `sudo -s` and try again
+
+**Starting containers**
+
+The user logs in to each host and starts the container (same as in [pseudo-distributed](#launch_pseudo) mode). Note that the container acting as the head node of the cluster must be named `node0` (and be running at the host with IP `111.222.111.222`, for example).
+
+**_Important_:** when other containers share the same host as `node0`, say `node1` and `node2`, there are additional changes to be made to `node1` and `node2`. In particular, log in to each container and edit the `/etc/hosts` file:
+
+````
+# modified by weave
+...
+X.Y.Z	node0 node0.bridge  # <- REMOVE this line
+..
+````
+This ensures that name resolution (of `node0`'s address) from `node1` and `node2` is correct. By default, containers on the same host resolve each other's addresses via the Docker bridge. Instead, we want them to use the addresses given by Weave.
+
+
+**Starting SINGA distributed training**
+
+Refer to the [Mesos guide](mesos.html) for details on how to start training with multiple SINGA instances.
+

Modified: incubator/singa/site/trunk/content/markdown/docs/installation.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/installation.md?rev=1709857&r1=1709856&r2=1709857&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/installation.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/installation.md Wed Oct 21 14:52:08 2015
@@ -1,249 +1,9 @@
-# Installation
+# Installation 
 
 ---
 
-## Dependencies
+Currently, there are two ways to install SINGA: building directly from source, or building a Docker image.
 
-SINGA is developed and tested on Linux platforms.
+* [Build SINGA directly from source](installation_source.html) 
+* [Build SINGA as a Docker container](docker.html)
 
-The following dependent libraries are required:
-
-  * glog version 0.3.3
-
-  * google-protobuf version 2.6.0
-
-  * openblas version >= 0.2.10
-
-  * zeromq version >= 3.2
-
-  * czmq version >= 3
-
-  * zookeeper version 3.4.6
-
-
-Optional dependencies include:
-
-  * lmdb version 0.9.10
-
-
-You can install all dependencies into $PREFIX folder by
-
-    # make sure you are in the thirdparty folder
-    cd thirdparty
-    ./install.sh all $PREFIX
-
-If $PREFIX is not a system path (e.g., /usr/local/), please export the following
-variables to continue the building instructions,
-
-    export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
-    export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
-    export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
-    export PATH=$PREFIX/bin:$PATH
-
-More details on using this script is given below.
-
-## Building SINGA from source
-
-SINGA is built using GNU autotools. GCC (version >= 4.8) is required.
-There are two ways to build SINGA,
-
-  * If you want to use the latest code, please clone it from
-  [Github](https://github.com/apache/incubator-singa.git) and execute
-  the following commands,
-
-        $ git clone git@github.com:apache/incubator-singa.git
-        $ cd incubator-singa
-        $ ./autogen.sh
-        $ ./configure
-        $ make
-
-  Note: It is an oversight that we forgot to delete the singa repo under [nusinga](https://github.com/orgs/nusinga)
-  account after we became Apache Incubator project -- the source
-  in that repo was not up to date, and we apologize for any inconvenience.
-
-  * If you download a release package, please follow the instructions below,
-
-        $ tar xvf singa-xxx
-        $ cd singa-xxx
-        $ ./configure
-        $ make
-
-    Some features of SINGA depend on external libraries. These features can be
-    compiled with `--enable-<feature>`.
-    For example, to build SINGA with lmdb support, you can run:
-
-        $ ./configure --enable-lmdb
-
-<!---
-Zhongle: please update the code to use the follow command
-
-    $ make test
-
-After compilation, you will find the binary file singatest. Just run it!
-More details about configure script can be found by running:
-
-		$ ./configure -h
--->
-
-After compiling SINGA successfully, the *libsinga.so* and the executable file
-*singa* will be generated into *.libs/* folder.
-
-If some dependent libraries are missing (or not detected), you can use the
-following script to download and install them:
-
-<!---
-to be updated after zhongle changes the code to use
-
-    ./install.sh libname \-\-prefix=
-
--->
-    # must goto thirdparty folder
-    $ cd thirdparty
-    $ ./install.sh LIB_NAME PREFIX
-
-If you do not specify the installation path, the library will be installed in
-the default folder specified by the software itself.  For example, if you want
-to install `zeromq` library in the default system folder, run it as
-
-    $ ./install.sh zeromq
-
-Or, if you want to install it into another folder,
-
-    $ ./install.sh zeromq PREFIX
-
-You can also install all dependencies in */usr/local* directory:
-
-    $ ./install.sh all /usr/local
-
-Here is a table showing the first arguments:
-
-    LIB_NAME  LIBRARIE
-    czmq*                 czmq lib
-    glog                  glog lib
-    lmdb                  lmdb lib
-    OpenBLAS              OpenBLAS lib
-    protobuf              Google protobuf
-    zeromq                zeromq lib
-    zookeeper             Apache zookeeper
-
-*: Since `czmq` depends on `zeromq`, the script offers you one more argument to
-indicate `zeromq` location.
-The installation commands of `czmq` is:
-
-<!---
-to be updated to
-
-    $./install.sh czmq  \-\-prefix=/usr/local \-\-zeromq=/usr/local/zeromq
--->
-
-    $./install.sh czmq  /usr/local -f=/usr/local/zeromq
-
-After the execution, `czmq` will be installed in */usr/local*. The last path
-specifies the path to zeromq.
-
-### FAQ
-* Q1:I get error `./configure --> cannot find blas_segmm() function` even I
-have installed OpenBLAS.
-
-  A1: This means the compiler cannot find the `OpenBLAS` library. If you installed
-  it to $PREFIX (e.g., /opt/OpenBLAS), then you need to export it as
-
-      $ export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
-      # e.g.,
-      $ export LIBRARY_PATH=/opt/OpenBLAS/lib:$LIBRARY_PATH
-
-
-* Q2: I get error `cblas.h no such file or directory exists`.
-
-  Q2: You need to include the folder of the cblas.h into CPLUS_INCLUDE_PATH,
-  e.g.,
-
-      $ export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
-      # e.g.,
-      $ export CPLUS_INCLUDE_PATH=/opt/OpenBLAS/include:$CPLUS_INCLUDE_PATH
-      # then reconfigure and make SINGA
-      $ ./configure
-      $ make
-
-
-* Q3:While compiling SINGA, I get error `SSE2 instruction set not enabled`
-
-  A3:You can try following command:
-
-      $ make CFLAGS='-msse2' CXXFLAGS='-msse2'
-
-
-* Q4:I get `ImportError: cannot import name enum_type_wrapper` from
-google.protobuf.internal when I try to import .py files.
-
-  A4:After install google protobuf by `make install`, we should install python
-  runtime libraries. Go to protobuf source directory, run:
-
-      $ cd /PROTOBUF/SOURCE/FOLDER
-      $ cd python
-      $ python setup.py build
-      $ python setup.py install
-
-  You may need `sudo` when you try to install python runtime libraries in
-  the system folder.
-
-
-* Q5: I get a linking error caused by gflags.
-
-  A5: SINGA does not depend on gflags. But you may have installed the glog with
-  gflags. In that case you can reinstall glog using *thirdparty/install.sh* into
-  a another folder and export the LDFLAGS and CPPFLAGS to include that folder.
-
-
-* Q6: While compiling SINGA and installing `glog` on mac OS X, I get fatal error
-`'ext/slist' file not found`
-
-  A6:Please install `glog` individually and try :
-
-      $ make CFLAGS='-stdlib=libstdc++' CXXFLAGS='stdlib=libstdc++'
-
-* Q7: When I start a training job, it reports error related with "ZOO_ERROR...zk retcode=-4...".
-
-  A7: This is because the zookeeper is not started. Please start the zookeeper service
-
-      $ ./bin/zk-service start
-
-  If the error still exists, probably that you do not have java. You can simple
-  check it by
-
-      $ java --version
-
-* Q8: When I build OpenBLAS from source, I am told that I need a fortran compiler.
-
-  A8: You can compile OpenBLAS by
-
-      $ make ONLY_CBLAS=1
-
-  or install it using
-
-	    $ sudo apt-get install openblas-dev
-
-  or
-
-	    $ sudo yum install openblas-devel
-
-  It is worth noting that you need root access to run the last two commands.
-  Remember to set the environment variables to include the header and library
-  paths of OpenBLAS after installation (please refer to the Dependencies section).
-
-* Q9: When I build protocol buffer, it reports that GLIBC++_3.4.20 not found in /usr/lib64/libstdc++.so.6.
-
-  A9: This means the linker found libstdc++.so.6 but that library
-  belongs to an older version of GCC than was used to compile and link the
-  program. The program depends on code defined in
-  the newer libstdc++ that belongs to the newer version of GCC, so the linker
-  must be told how to find the newer libstdc++ shared library.
-  The simplest way to fix this is to find the correct libstdc++ and export it to
-  LD_LIBRARY_PATH. For example, if GLIBC++_3.4.20 is listed in the output of the
-  following command,
-
-      $ strings /usr/local/lib64/libstdc++.so.6|grep GLIBC++
-
-  then you just set your environment variable as
-
-      $ export LD_LIBRARY_PATH=/usr/local/lib64:$LD_LIBRARY_PATH

Added: incubator/singa/site/trunk/content/markdown/docs/installation_source.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/installation_source.md?rev=1709857&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/installation_source.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/installation_source.md Wed Oct 21 14:52:08
2015
@@ -0,0 +1,249 @@
+# Building SINGA from source
+
+---
+
+## Dependencies
+
+SINGA is developed and tested on Linux platforms.
+
+The following dependent libraries are required:
+
+  * glog version 0.3.3
+
+  * google-protobuf version 2.6.0
+
+  * openblas version >= 0.2.10
+
+  * zeromq version >= 3.2
+
+  * czmq version >= 3
+
+  * zookeeper version 3.4.6
+
+
+Optional dependencies include:
+
+  * lmdb version 0.9.10
+
+
+You can install all dependencies into $PREFIX folder by
+
+    # make sure you are in the thirdparty folder
+    cd thirdparty
+    ./install.sh all $PREFIX
+
+If $PREFIX is not a system path (e.g., /usr/local/), please export the following
+variables to continue the building instructions,
+
+    export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
+    export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
+    export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
+    export PATH=$PREFIX/bin:$PATH
+
+More details on using this script are given below.
+
+## Building SINGA from source
+
+SINGA is built using GNU autotools. GCC (version >= 4.8) is required.
+There are two ways to build SINGA,
+
+  * If you want to use the latest code, please clone it from
+  [Github](https://github.com/apache/incubator-singa.git) and execute
+  the following commands,
+
+        $ git clone git@github.com:apache/incubator-singa.git
+        $ cd incubator-singa
+        $ ./autogen.sh
+        $ ./configure
+        $ make
+
+  Note: It is an oversight that we forgot to delete the singa repo under the [nusinga](https://github.com/orgs/nusinga)
+  account after we became an Apache Incubator project -- the source
+  in that repo was not up to date, and we apologize for any inconvenience.
+
+  * If you download a release package, please follow the instructions below,
+
+        $ tar xvf singa-xxx
+        $ cd singa-xxx
+        $ ./configure
+        $ make
+
+    Some features of SINGA depend on external libraries. These features can be
+    compiled with `--enable-<feature>`.
+    For example, to build SINGA with lmdb support, you can run:
+
+        $ ./configure --enable-lmdb
+
+<!---
+Zhongle: please update the code to use the follow command
+
+    $ make test
+
+After compilation, you will find the binary file singatest. Just run it!
+More details about configure script can be found by running:
+
+		$ ./configure -h
+-->
+
+After compiling SINGA successfully, the *libsinga.so* and the executable file
+*singa* will be generated into *.libs/* folder.
+
+If some dependent libraries are missing (or not detected), you can use the
+following script to download and install them:
+
+<!---
+to be updated after zhongle changes the code to use
+
+    ./install.sh libname \-\-prefix=
+
+-->
+    # must go to the thirdparty folder
+    $ cd thirdparty
+    $ ./install.sh LIB_NAME PREFIX
+
+If you do not specify the installation path, the library will be installed in
+the default folder specified by the software itself. For example, if you want
+to install the `zeromq` library in the default system folder, run:
+
+    $ ./install.sh zeromq
+
+Or, if you want to install it into another folder,
+
+    $ ./install.sh zeromq PREFIX
+
+You can also install all dependencies in */usr/local* directory:
+
+    $ ./install.sh all /usr/local
+
+Here is a table of valid `LIB_NAME` values:
+
+    LIB_NAME              LIBRARY
+    czmq*                 czmq lib
+    glog                  glog lib
+    lmdb                  lmdb lib
+    OpenBLAS              OpenBLAS lib
+    protobuf              Google protobuf
+    zeromq                zeromq lib
+    zookeeper             Apache zookeeper
+
+*: Since `czmq` depends on `zeromq`, the script offers one more argument to
+indicate the `zeromq` location.
+The installation command for `czmq` is:
+
+<!---
+to be updated to
+
+    $./install.sh czmq  \-\-prefix=/usr/local \-\-zeromq=/usr/local/zeromq
+-->
+
+    $ ./install.sh czmq /usr/local -f=/usr/local/zeromq
+
+After the execution, `czmq` will be installed in */usr/local*. The last path
+specifies the path to zeromq.
+
+### FAQ
+* Q1: I get the error `./configure --> cannot find blas_segmm() function` even though I
+have installed OpenBLAS.
+
+  A1: This means the compiler cannot find the `OpenBLAS` library. If you installed
+  it to $PREFIX (e.g., /opt/OpenBLAS), then you need to export it as
+
+      $ export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
+      # e.g.,
+      $ export LIBRARY_PATH=/opt/OpenBLAS/lib:$LIBRARY_PATH
+
+
+* Q2: I get the error `cblas.h: no such file or directory`.
+
+  A2: You need to add the folder containing cblas.h to CPLUS_INCLUDE_PATH,
+  e.g.,
+
+      $ export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
+      # e.g.,
+      $ export CPLUS_INCLUDE_PATH=/opt/OpenBLAS/include:$CPLUS_INCLUDE_PATH
+      # then reconfigure and make SINGA
+      $ ./configure
+      $ make
+
+
+* Q3: While compiling SINGA, I get the error `SSE2 instruction set not enabled`.
+
+  A3: You can try the following command:
+
+      $ make CFLAGS='-msse2' CXXFLAGS='-msse2'
+
+
+* Q4: I get `ImportError: cannot import name enum_type_wrapper` from
+google.protobuf.internal when I try to import .py files.
+
+  A4: After installing Google protobuf via `make install`, you should install the Python
+  runtime libraries. Go to the protobuf source directory and run:
+
+      $ cd /PROTOBUF/SOURCE/FOLDER
+      $ cd python
+      $ python setup.py build
+      $ python setup.py install
+
+  You may need `sudo` when you try to install python runtime libraries in
+  the system folder.
+
+
+* Q5: I get a linking error caused by gflags.
+
+  A5: SINGA does not depend on gflags. But you may have installed glog with
+  gflags. In that case you can reinstall glog using *thirdparty/install.sh* into
+  another folder and export LDFLAGS and CPPFLAGS to include that folder.
+
+
+* Q6: While compiling SINGA and installing `glog` on Mac OS X, I get the fatal error
+`'ext/slist' file not found`.
+
+  A6: Please install `glog` separately and try:
+
+      $ make CFLAGS='-stdlib=libstdc++' CXXFLAGS='-stdlib=libstdc++'
+
+* Q7: When I start a training job, it reports an error related to "ZOO_ERROR...zk retcode=-4...".
+
+  A7: This is because zookeeper is not started. Please start the zookeeper service
+
+      $ ./bin/zk-service start
+
+  If the error persists, it is probably because you do not have Java installed. You can
+  simply check by
+
+      $ java -version
+
+* Q8: When I build OpenBLAS from source, I am told that I need a fortran compiler.
+
+  A8: You can compile OpenBLAS by
+
+      $ make ONLY_CBLAS=1
+
+  or install it using
+
+	    $ sudo apt-get install libopenblas-dev
+
+  or
+
+	    $ sudo yum install openblas-devel
+
+  It is worth noting that you need root access to run the last two commands.
+  Remember to set the environment variables to include the header and library
+  paths of OpenBLAS after installation (please refer to the Dependencies section).
+
+* Q9: When I build protocol buffer, it reports that GLIBCXX_3.4.20 is not found in /usr/lib64/libstdc++.so.6.
+
+  A9: This means the linker found libstdc++.so.6 but that library
+  belongs to an older version of GCC than was used to compile and link the
+  program. The program depends on code defined in
+  the newer libstdc++ that belongs to the newer version of GCC, so the linker
+  must be told how to find the newer libstdc++ shared library.
+  The simplest way to fix this is to find the correct libstdc++ and export it to
+  LD_LIBRARY_PATH. For example, if GLIBCXX_3.4.20 is listed in the output of the
+  following command,
+
+      $ strings /usr/local/lib64/libstdc++.so.6 | grep GLIBCXX
+
+  then you just set your environment variable as
+
+      $ export LD_LIBRARY_PATH=/usr/local/lib64:$LD_LIBRARY_PATH

Added: incubator/singa/site/trunk/content/markdown/docs/mesos.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/mesos.md?rev=1709857&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/mesos.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/mesos.md Wed Oct 21 14:52:08 2015
@@ -0,0 +1,84 @@
+# Distributed Training on Mesos
+
+--- 
+
+This guide explains how to start SINGA distributed training on a Mesos cluster. It assumes that both Mesos and HDFS are already running, and that every node has SINGA installed.
+We assume the architecture depicted below, in which the cluster nodes are Docker containers. Refer to the [Docker guide](docker.html) for details on how to start individual nodes and set up network connections between them (make sure [weave](http://weave.works/guides/weave-docker-ubuntu-simple.html) is running on each node, and that the cluster's head node is running in container `node0`).
+
+![Nothing](http://www.comp.nus.edu.sg/~dinhtta/files/singa_mesos.png)
+
+## Start HDFS and Mesos
+Go inside each container, using:
+````
+docker exec -it nodeX /bin/bash
+````
+and configure it as follows:
+
+* On container `node0`
+
+        hadoop namenode -format
+        hadoop-daemon.sh start namenode
+        /opt/mesos-0.22.0/build/bin/mesos-master.sh --work_dir=/opt --log_dir=/opt --quiet > /dev/null &
+        zk-service.sh start
+
+* On container `node1, node2, ...`
+
+        hadoop-daemon.sh start datanode
+        /opt/mesos-0.22.0/build/bin/mesos-slave.sh --master=node0:5050 --log_dir=/opt --quiet > /dev/null &
+
+To check that the setup has been successful, verify that the HDFS namenode has registered `N` datanodes, via:
+
+````
+hadoop dfsadmin -report
+```` 
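The report can also be checked programmatically; a hypothetical sketch (the `live_datanodes` helper is ours, and the exact `Datanodes available:` line format is an assumption about this Hadoop version's output):

```shell
# Hypothetical sketch: extract the registered-datanode count from a
# dfsadmin report read on stdin. The "Datanodes available: N (...)" line
# format is an assumption about this Hadoop version's output.
live_datanodes() {
  sed -n 's/^Datanodes available: *\([0-9][0-9]*\).*/\1/p'
}

# Real usage: hadoop dfsadmin -report | live_datanodes
echo "Datanodes available: 3 (3 total, 0 dead)" | live_datanodes
# prints: 3
```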
+
+#### Mesos logs 
+Mesos logs are stored at `/opt/lt-mesos-master.INFO` on `node0` and at `/opt/lt-mesos-slave.INFO` on the other nodes.
+
+---
+
+## Starting SINGA training on Mesos
+Assuming that Mesos and HDFS are already started, a SINGA job can be launched from **any** container.
+
+#### Launching job
+
+1. Log in to any container, then
+        cd incubator-singa/tool/mesos
+<a name="job_start"></a>
+2. Check that configuration files are correct:
+    + `scheduler.conf` contains information about the master nodes
+    + `singa.conf` contains information about Zookeeper node0
+    + The job configuration file `job.conf` **contains full paths to the example directories (NO RELATIVE PATHS!).**
+3. Start the job:
+    + If starting for the first time:
+
+	          ./scheduler <job config file> -scheduler_conf <scheduler config file> -singa_conf <SINGA config file>
+    + If not the first time:
+
+	          ./scheduler <job config file>
+
+**Notes.** Each running job is given a `frameworkID`. Look for the log message of the form:
+
+             Framework registered with XXX-XXX-XXX-XXX-XXX-XXX
+
+#### Monitoring and Debugging
+
+Each Mesos job is given a `frameworkID`, and a *sandbox* directory is created for each job.
+The directory is under the specified `work_dir` (`/tmp/mesos` by default). For example, errors
+during SINGA execution can be found at:
+
+            /tmp/mesos/slaves/xxxxx-Sx/frameworks/xxxxx/executors/SINGA_x/runs/latest/stderr
+
+Other artifacts, like files downloaded from HDFS (`job.conf`) and `stdout`, can be found in the same directory.
+
+#### Stopping
+
+There are two ways to kill the running job:
+
+1. If the scheduler is running in the foreground, simply kill it (using `Ctrl-C`, for example).
+
+2. If the scheduler is running in the background, kill it using Mesos's REST API:
+
+          curl -d "frameworkId=XXX-XXX-XXX-XXX-XXX-XXX" -X POST http://<master>/master/shutdown
+

Modified: incubator/singa/site/trunk/content/site.xml
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/site.xml?rev=1709857&r1=1709856&r2=1709857&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/site.xml (original)
+++ incubator/singa/site/trunk/content/site.xml Wed Oct 21 14:52:08 2015
@@ -66,7 +66,10 @@
 
    <menu name="Documentation">
       <item name = "Latest" href="docs/index.html" collapse="true" >
-        <item name="Installation" href="docs/installation.html"/>
+        <item name="Installation" href="docs/installation.html" collapse="true">
+	  <item name="From source" href="docs/installation_source.html"/>
+	  <item name="Using Docker" href="docs/docker.html"/>
+	</item>
         <item name="Programmer Guide" href="docs/programmer-guide.html" collapse="true">
           <item name ="Model Configuration" href="docs/model-config.html"/>
           <item name="Neural Network" href="docs/neural-net.html"/>
@@ -76,6 +79,7 @@
           <item name="Updater" href="docs/updater.html"/>
         </item>
        <item name="Distributed Training" href="docs/distributed-training.html" collapse="true" >
+	  <item name="Training on Mesos" href="docs/mesos.html"/>
           <item name="System Architecture" href="docs/architecture.html"/>
           <item name="Frameworks" href="docs/frameworks.html"/>
           <item name="Communication" href="docs/communication.html"/>


