singa-commits mailing list archives

From wan...@apache.org
Subject svn commit: r1723842 - in /incubator/singa/site/trunk/content: markdown/docs/architecture.md markdown/docs/distributed-training.md markdown/docs/hybrid.md site.xml
Date Sat, 09 Jan 2016 10:21:49 GMT
Author: wangsh
Date: Sat Jan  9 10:21:49 2016
New Revision: 1723842

URL: http://svn.apache.org/viewvc?rev=1723842&view=rev
Log:
add docs for hybrid partition

Added:
    incubator/singa/site/trunk/content/markdown/docs/hybrid.md
Modified:
    incubator/singa/site/trunk/content/markdown/docs/architecture.md
    incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
    incubator/singa/site/trunk/content/site.xml

Modified: incubator/singa/site/trunk/content/markdown/docs/architecture.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/architecture.md?rev=1723842&r1=1723841&r2=1723842&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/architecture.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/architecture.md Sat Jan  9 10:21:49 2016
@@ -35,7 +35,7 @@ within a group:
   against all data partitioned to the group.
   * **Data parallelism**. Each worker computes all parameters
   against a subset of data.
-  * [**Hybrid parallelism**](). SINGA also supports hybrid parallelism.
+  * [**Hybrid parallelism**](hybrid.html). SINGA also supports hybrid parallelism.
 
 
 ## Implementation

Modified: incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/distributed-training.md?rev=1723842&r1=1723841&r2=1723842&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/distributed-training.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/distributed-training.md Sat Jan  9 10:21:49 2016
@@ -2,14 +2,7 @@
 
 ---
 
-SINGA is designed for distributed training of large deep learning models with huge amounts of training data. It is integrated with Mesos, so that distributed training can be started as a Mesos framework. Currently, the Mesos cluster can be set up from SINGA containers, i.e., we provide Docker images that bundle Mesos and SINGA together. Refer to the guide below for instructions on how to start and use the cluster.
-
-* [Distributed training on Mesos](mesos.html)
-
-SINGA can run on top of a distributed storage system to achieve scalability. The current version of SINGA supports HDFS.
-
-* [Running SINGA on HDFS](hdfs.html)
-
+SINGA is designed for distributed training of large deep learning models with huge amounts of training data.
 We also provide high-level descriptions of design behind SINGA's distributed architecture.

 
 * [System Architecture](architecture.html)
@@ -17,3 +10,16 @@ We also provide high-level descriptions
 * [Training Frameworks](frameworks.html)
 
 * [System Communication](communication.html)
+
+SINGA supports different options for training a model in parallel, including data parallelism, model parallelism and hybrid parallelism.
+
+* [Hybrid Parallelism](hybrid.html)
+
+SINGA is integrated with Mesos, so that distributed training can be started as a Mesos framework. Currently, the Mesos cluster can be set up from SINGA containers, i.e., we provide Docker images that bundle Mesos and SINGA together. Refer to the guide below for instructions on how to start and use the cluster.
+
+* [Distributed training on Mesos](mesos.html)
+
+SINGA can run on top of a distributed storage system to achieve scalability. The current version of SINGA supports HDFS.
+
+* [Running SINGA on HDFS](hdfs.html)
+

Added: incubator/singa/site/trunk/content/markdown/docs/hybrid.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/hybrid.md?rev=1723842&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/hybrid.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/hybrid.md Sat Jan  9 10:21:49 2016
@@ -0,0 +1,83 @@
+# Hybrid Parallelism
+
+---
+
+## User Guide
+
+SINGA supports different parallelism options for distributed training.
+Users only need to select the option in the job configuration.
+
+Both `NetProto` and `LayerProto` have a `partition_dim` field that controls the parallelism option:
+
+  * `partition_dim=0`: the neuralnet/layer is partitioned on the data dimension, i.e., each worker processes a subset of data records.
+  * `partition_dim=1`: the neuralnet/layer is partitioned on the feature dimension, i.e., each worker maintains a subset of feature parameters.
+
+The `partition_dim` field in `NetProto` applies to all layers unless a layer sets its own `partition_dim` field.
+
+For data parallelism across the whole model, leave `partition_dim` at its default (which is 0), or configure job.conf like this:
+
+```
+neuralnet {
+  partition_dim: 0
+  layer {
+    name: ... 
+    type: ...
+  }
+  ...
+}
+```
+
+With hybrid parallelism, each layer can be partitioned on either the data dimension or the feature dimension.
+For example, to partition a specific layer on the feature dimension, configure it like this:
+
+```
+neuralnet {
+  partition_dim: 0
+  layer {
+    name: "layer1_partition_on_data_dimension"
+    type: ...
+  }
+  layer {
+    name: "layer2_partition_on_feature_dimension"
+    type: ...
+    partition_dim: 1
+  }
+  ...
+}
+```
+
+## Developer Guide
+
+To support hybrid parallelism, after SINGA reads the user's model and partition configuration, it automatically adds a set of connection layers between layers where needed:
+
+* `BridgeSrcLayer` & `BridgeDstLayer` are added when two connected layers are not on the same machine. They are paired and are responsible for sending data/gradients to the other side during each iteration.
+
+* `ConcateLayer` is added when there are multiple source layers. It combines their feature blobs along a given dimension.
+
+* `SliceLayer` is added when there are multiple destination layers, each of which needs only a subset (slice) of this layer's feature blob.
+
+* `SplitLayer` is added when there are multiple destination layers, each of which needs the whole feature blob.
+
+The following logic is used in our code to add connection layers:
+
+```
+Add Slice, Concate, Split Layers for Hybrid Partition
+
+All cases are as follows:
+src_pdim | dst_pdim | connection_type | Action
+    0    |     0    |     OneToOne    | Direct Connection
+    1    |     1    |     OneToOne    | Direct Connection
+    0    |     0    |     OneToAll    | Direct Connection
+    1    |     0    |     OneToOne    | Slice -> Concate
+    0    |     1    |     OneToOne    | Slice -> Concate
+    1    |     0    |     OneToAll    | Slice -> Concate
+    0    |     1    |     OneToAll    | Split -> Concate
+    1    |     1    |     OneToAll    | Split -> Concate
+
+Logic:
+dst_pdim = 1 && OneToAll ?
+  (YES) Split -> Concate
+  (NO)  src_pdim = dst_pdim ?
+          (YES) Direct Connection
+          (NO)  Slice -> Concate
+```
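
Editorial sketch (not part of the committed hybrid.md): the `partition_dim` override rule and the connection-layer decision logic above can be written as two small Python functions. The function and constant names here are hypothetical helpers chosen for illustration; they are not SINGA's API, and the real implementation lives in SINGA's C++ code. The sketch assumes only what the text states: a layer-level `partition_dim` overrides the net-level one, the default is 0, and the connection type is either `OneToOne` or `OneToAll`.

```python
from typing import Optional

DATA_DIM = 0     # partition_dim=0: partition on the data dimension
FEATURE_DIM = 1  # partition_dim=1: partition on the feature dimension


def effective_partition_dim(net_pdim: int = DATA_DIM,
                            layer_pdim: Optional[int] = None) -> int:
    """Return the layer's own partition_dim if set, else the net-level value.

    Hypothetical helper: None stands for "field not set on the layer".
    """
    return net_pdim if layer_pdim is None else layer_pdim


def connection_action(src_pdim: int, dst_pdim: int, connection_type: str) -> str:
    """Mirror the decision logic listed in the document (illustrative only)."""
    if connection_type not in ("OneToOne", "OneToAll"):
        raise ValueError(f"unknown connection type: {connection_type}")
    # dst_pdim = 1 && OneToAll ?  (YES) Split -> Concate
    if dst_pdim == FEATURE_DIM and connection_type == "OneToAll":
        return "Split -> Concate"
    # (NO) src_pdim = dst_pdim ?  (YES) Direct Connection
    if src_pdim == dst_pdim:
        return "Direct Connection"
    # (NO) Slice -> Concate
    return "Slice -> Concate"


# The eight cases from the table, as (src_pdim, dst_pdim, type, action).
CASES = [
    (0, 0, "OneToOne", "Direct Connection"),
    (1, 1, "OneToOne", "Direct Connection"),
    (0, 0, "OneToAll", "Direct Connection"),
    (1, 0, "OneToOne", "Slice -> Concate"),
    (0, 1, "OneToOne", "Slice -> Concate"),
    (1, 0, "OneToAll", "Slice -> Concate"),
    (0, 1, "OneToAll", "Split -> Concate"),
    (1, 1, "OneToAll", "Split -> Concate"),
]

if __name__ == "__main__":
    # Check that the three-branch logic reproduces every row of the table.
    for src, dst, ctype, expected in CASES:
        assert connection_action(src, dst, ctype) == expected
    # layer2 in the example config overrides the net-level value of 0.
    assert effective_partition_dim(net_pdim=0, layer_pdim=1) == 1
```

Running the checks shows that the three-branch logic reproduces all eight rows of the table, so the table and the "Logic" section describe the same rule.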

Modified: incubator/singa/site/trunk/content/site.xml
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/site.xml?rev=1723842&r1=1723841&r2=1723842&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/site.xml (original)
+++ incubator/singa/site/trunk/content/site.xml Sat Jan  9 10:21:49 2016
@@ -79,10 +79,11 @@
           <item name="Updater" href="docs/updater.html"/>
         </item>
         <item name="Distributed Training" href="docs/distributed-training.html" collapse="true">
-	  <item name="Training on Mesos" href="docs/mesos.html"/>
           <item name="System Architecture" href="docs/architecture.html"/>
           <item name="Frameworks" href="docs/frameworks.html"/>
           <item name="Communication" href="docs/communication.html"/>
+          <item name="Hybrid Parallelism" href="docs/hybrid.html"/>
+	        <item name="Training on Mesos" href="docs/mesos.html"/>
           <item name="Using HDFS" href="docs/hdfs.html"/>
         </item>
         <item name="Data Preparation" href="docs/data.html"/>


