hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "SpMV" by Mikalai Parafeniuk
Date Sat, 18 Aug 2012 16:20:06 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "SpMV" page has been changed by Mikalai Parafeniuk:
http://wiki.apache.org/hama/SpMV?action=diff&rev1=15&rev2=16

  
  === Algorithm description ===
  The generic algorithm will contain one superstep, because no communication is needed:
-  0. Matrix and vector distribution.
+  1. Matrix and vector distribution.
-  1. Custom partitioning.
+  2. Custom partitioning.
-  2. Local computation.
+  3. Local computation.
-  3. Output of result vector.
+  4. Output of result vector.
-  4. Constructing of dense vector.
+  5. Constructing of dense vector.
  In the setup stage every peer reads the input dense vector from a file. After that, the framework
automatically partitions the matrix rows using the algorithm provided in the custom partitioner. Then
the local computation is performed. Each peer obtains some cells of the result vector in the bsp procedure
and writes them to the output file. The output file is then reread to construct a dense vector
instance for further computation.
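The steps above can be sketched in plain Python. This is only an illustration of the idea, not the Hama implementation: `partition_rows` is a hypothetical round-robin stand-in for the custom partitioner, and the "peers" are simulated by a loop. Because every peer holds the whole dense vector, each one can compute its cells of the result without talking to the others.

```python
from typing import Dict, List

SparseRow = Dict[int, float]  # column index -> nonzero value


def partition_rows(rows: Dict[int, SparseRow], num_peers: int) -> List[Dict[int, SparseRow]]:
    """Hypothetical partitioner: assign row i to peer i % num_peers."""
    parts: List[Dict[int, SparseRow]] = [dict() for _ in range(num_peers)]
    for i, row in rows.items():
        parts[i % num_peers][i] = row
    return parts


def local_spmv(local_rows: Dict[int, SparseRow], vector: List[float]) -> Dict[int, float]:
    """Local computation: one result cell per locally owned row."""
    return {
        i: sum(value * vector[col] for col, value in row.items())
        for i, row in local_rows.items()
    }


def spmv(rows: Dict[int, SparseRow], vector: List[float], num_peers: int) -> List[float]:
    """Simulate the single superstep; assumes rows are indexed 0..n-1."""
    result = [0.0] * len(rows)
    for part in partition_rows(rows, num_peers):
        # Each simulated peer only needs its own rows plus the shared vector.
        for i, cell in local_spmv(part, vector).items():
            result[i] = cell
    return result


# The first two rows of the example matrix used later on this page.
rows = {0: {0: 1.0, 2: 6.0}, 1: {1: 4.0}}
vector = [2.0, 3.0, 6.0, 0.0]
print(spmv(rows, vector, num_peers=2))
```

Swapping `partition_rows` for a scheme that balances nonzeros per peer would change only the load distribution, not the result.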
  
  === Implementation ===
@@ -55, +55 @@

  [0 0 0]  = 3 0
  [0 5 1]    3 2 1 5 2 1
  }}}
- Now let's show some example. Imagine that you need to multiply
+ ==== Usage with RandomMatrixGenerator ====
+ `RandomMatrixGenerator`, like `SpMV`, works with the sequence file format. So, to multiply a random
matrix by a random vector we do the following: generate the matrix and vector; run `SpMV`; convert the matrix,
vector and result to text files; view the matrix, vector and result. This sequence is shown
in the following code snippet:
+ {{{
+ 1:    hadoop dfs -rmr $SPMV/*/*
+ 2:    hama jar $HAMA_EXAMPLES rmgenerator $SPMV/matrix-seq 6 6 0.4 4
+ 3:    hama jar $HAMA_EXAMPLES rmgenerator $SPMV/vector-seq 1 6 0.9 4
+ 4:    hama jar $HAMA_EXAMPLES spmv $SPMV/matrix-seq $SPMV/vector-seq $SPMV/result-seq 4
+ 5:    hadoop dfs -rmr $SPMV/result-seq/part
+ 6:    hama jar $HAMA_EXAMPLES matrixtotext $SPMV/matrix-seq $SPMV/matrix-txt
+ 7:    hama jar $HAMA_EXAMPLES matrixtotext $SPMV/vector-seq $SPMV/vector-txt
+ 8:    hama jar $HAMA_EXAMPLES matrixtotext $SPMV/result-seq $SPMV/result-txt
+ 9:    hadoop dfs -cat /user/hduser/spmv/matrix-txt/*
+    0	 6 3 5 0.24316243288531214 2 0.638622414091597 3 0.5480468710898891
+    3	 6 2 5 0.5054043538570098 2 0.03911646523753309
+    1	 6 3 4 0.5077528966368161 5 0.5780340816354201 3 0.4626752204959449
+    4	 6 2 1 0.6512355661856207 4 0.08804976645891671
+    2	 6 2 4 0.7200271909735554 1 0.3510851368183805
+    5	 6 2 2 0.5848717104309032 3 0.0889791409798859
+ 
+ 10:   hadoop dfs -cat /user/hduser/spmv/vector-txt/*
+    0	 6 6 0 0.3365077672167889 1 0.17498609722570935 2 0.32806410950648845 3 0.6016567879100464
4 0.786158850847722 5 0.6856872945972037
+ 11:   hadoop dfs -cat /user/hduser/spmv/result-txt/*
+    0	 6 6 0 0.7059786044267415 1 1.0738967463653346 2 0.6274907669206862 3 0.35938205240905363
4 0.18317827331814918 5 0.24541032101100438
+ }}}
+ We got the expected result. Now we will explain the meaning of each line in the code snippet
above.<<BR>>
+ Line 1: Cleanup of directories related to SpMV tests.<<BR>>
+ Lines 2-3: Generation of the input matrix and vector. In this example we test multiplication of a 6x6 matrix and
a 1x6 vector.<<BR>>
+ Line 4: SpMV algorithm.<<BR>>
+ Line 5: Deletion of part files from the output directory of line 4. NOTE: `matrixtotext` will
fail if this step is not performed, because `result-seq` will contain a part folder and
`matrixtotext` doesn't know how to deal with it yet.<<BR>>
+ Lines 6-8: Conversion of the input matrix, input vector and result to text format.<<BR>>
+ Lines 9-11: Showing the result. 
+ 
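The text output above can be checked by hand with a short Python sketch. The line layout assumed here (row index, vector size, number of nonzeros, then index/value pairs) is inferred from the samples on this page, and `parse_line` is a hypothetical helper, not part of Hama. The two strings are row 0 of the matrix and the vector as printed in lines 9 and 10 of the snippet, with the mail's line wrap joined.

```python
from typing import Dict, Tuple


def parse_line(line: str) -> Tuple[int, int, Dict[int, float]]:
    """Parse '<row> <size> <nnz> <idx> <val> ...' into (row, size, {idx: val})."""
    tokens = line.split()
    row, size, nnz = int(tokens[0]), int(tokens[1]), int(tokens[2])
    pairs = tokens[3:]
    if len(pairs) != 2 * nnz:
        raise ValueError("expected %d index/value pairs" % nnz)
    entries = {int(pairs[k]): float(pairs[k + 1]) for k in range(0, len(pairs), 2)}
    return row, size, entries


matrix_row0 = "0\t 6 3 5 0.24316243288531214 2 0.638622414091597 3 0.5480468710898891"
vector_line = ("0\t 6 6 0 0.3365077672167889 1 0.17498609722570935 "
               "2 0.32806410950648845 3 0.6016567879100464 "
               "4 0.786158850847722 5 0.6856872945972037")

_, _, row0 = parse_line(matrix_row0)
_, _, vec = parse_line(vector_line)

# Dot product of matrix row 0 with the vector; absent cells count as zero.
cell0 = sum(value * vec.get(col, 0.0) for col, value in row0.items())
print(cell0)  # compare with cell 0 of the result vector in line 11
```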
+ ==== Usage with arbitrary text files ====
+ SpMV works with `SequenceFile`, so we need tools to convert the input and output
of SpMV between sequence file format and text format. These tools are `matrixtoseq` and `matrixtotext`.
These programs are included in the example driver, so they can be launched like any other example.
`matrixtoseq` converts a matrix represented in a text file to sequence file format. This
program also lets you choose the target writable: `DenseVectorWritable` or `SparseVectorWritable`.
+ {{{
+ Usage: matrixtoseq <input matrix dir> <output matrix dir> <dense|sparse>
[number of tasks (default max)]
+ }}}
+ `matrixtotext` converts matrix from sequence file format to text file.
+ {{{
+ Usage: matrixtotext <input matrix dir> <output matrix dir> [number of tasks
(default max)]
+ }}}
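When preparing input for `matrixtoseq` by hand, a small script can produce the text lines. The sketch below is an assumption-laden illustration: `encode_vector` and `format_row` are hypothetical helpers, the layout (size, number of nonzeros, then index/value pairs, prefixed by the row index) is inferred from the examples on this page, and a single space is assumed as the separator.

```python
from typing import List, Sequence


def encode_vector(cells: Sequence[float]) -> str:
    """Encode a dense list as '<size> <nnz> <idx> <val> ...', skipping zeros."""
    nonzero = [(i, v) for i, v in enumerate(cells) if v != 0]
    parts: List[str] = [str(len(cells)), str(len(nonzero))]
    for i, v in nonzero:
        parts.extend([str(i), str(v)])
    return " ".join(parts)


def format_row(index: int, cells: Sequence[float]) -> str:
    """Prefix the encoded vector with its row index (assumed space-separated)."""
    return "%d %s" % (index, encode_vector(cells))


print(encode_vector([0, 5, 1]))
print(format_row(0, [2, 3, 6, 0]))
```

Each matrix row would be written as one such line in the input text file.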
+ Now let's look at an example. To use SpMV in this mode you should provide text files in the appropriate
format, as described above. Imagine that you need to multiply
  {{{
  [1 0 6 0]   [2]   [38] 
  [0 4 0 0] * [3] = [12] 
@@ -73, +114 @@

  {{{
  0 4 3 0 2 1 3 2 6
  }}}
- ==== Usage with RandomMatrixGenerator ====
- `RandomMatrixGenerator`, like `SpMV`, works with the sequence file format. So, to multiply a random
matrix by a random vector we do the following: generate the matrix and vector; run `SpMV`; convert the matrix,
vector and result to text files; view the matrix, vector and result. This sequence is shown
in the following code snippet:
- {{{
- 0:    hadoop dfs -rmr $SPMV/*/*
- 1:    hama jar $HAMA_EXAMPLES rmgenerator $SPMV/matrix-seq 6 6 0.4 4
- 2:    hama jar $HAMA_EXAMPLES rmgenerator $SPMV/vector-seq 1 6 0.9 4
- 3:    hama jar $HAMA_EXAMPLES spmv $SPMV/matrix-seq $SPMV/vector-seq $SPMV/result-seq 4
- 4:    hadoop dfs -rmr $SPMV/result-seq/part
- 5:    hama jar $HAMA_EXAMPLES matrixtotext $SPMV/matrix-seq $SPMV/matrix-txt
- 6:    hama jar $HAMA_EXAMPLES matrixtotext $SPMV/vector-seq $SPMV/vector-txt
- 7:    hama jar $HAMA_EXAMPLES matrixtotext $SPMV/result-seq $SPMV/result-txt
- 8:    hadoop dfs -cat /user/hduser/spmv/matrix-txt/*
-    0	 6 3 5 0.24316243288531214 2 0.638622414091597 3 0.5480468710898891
-    3	 6 2 5 0.5054043538570098 2 0.03911646523753309
-    1	 6 3 4 0.5077528966368161 5 0.5780340816354201 3 0.4626752204959449
-    4	 6 2 1 0.6512355661856207 4 0.08804976645891671
-    2	 6 2 4 0.7200271909735554 1 0.3510851368183805
-    5	 6 2 2 0.5848717104309032 3 0.0889791409798859
- 
- 9:    hadoop dfs -cat /user/hduser/spmv/vector-txt/*
-    0	 6 6 0 0.3365077672167889 1 0.17498609722570935 2 0.32806410950648845 3 0.6016567879100464
4 0.786158850847722 5 0.6856872945972037
- 10:   hadoop dfs -cat /user/hduser/spmv/result-txt/*
-    0	 6 6 0 0.7059786044267415 1 1.0738967463653346 2 0.6274907669206862 3 0.35938205240905363
4 0.18317827331814918 5 0.24541032101100438
- }}}
- We got the expected result. Now we will explain the meaning of each line in the code snippet
above.<<BR>>
- Line 0: Cleanup of directories related to SpMV tests.<<BR>>
- Lines 1-2: Generation of the input matrix and vector. In this example we test multiplication of a 6x6 matrix and
a 1x6 vector.<<BR>>
- Line 3: SpMV algorithm.<<BR>>
- Line 4: Deletion of part files from the output directory of line 3. NOTE: `matrixtotext` will
fail if this step is not performed, because `result-seq` will contain a part folder and
`matrixtotext` doesn't know how to deal with it yet.<<BR>>
- Lines 5-7: Conversion of the input matrix, input vector and result to text format.<<BR>>
- Line 8-10: Showing the result. 
- 
- ==== Usage with arbitrary text files ====
- SpMV works with `SequenceFile`, so we need tools to convert the input and output
of SpMV between sequence file format and text format. These tools are `matrixtoseq` and `matrixtotext`.
These programs are included in the example driver, so they can be launched like any other example.
`matrixtoseq` converts a matrix represented in a text file to sequence file format. This
program also lets you choose the target writable: `DenseVectorWritable` or `SparseVectorWritable`.
- {{{
- Usage: matrixtoseq <input matrix dir> <output matrix dir> <dense|sparse>
[number of tasks (default max)]
- }}}
- `matrixtotext` converts matrix from sequence file format to text file.
- {{{
- Usage: matrixtotext <input matrix dir> <output matrix dir> [number of tasks
(default max)]
- }}}
- To use SpMV in this mode you should provide text files in appropriate format, as described
above. 
  After that you should copy these files to HDFS. If you are not comfortable with HDFS,
please see [[http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html|this tutorial]].
After you have copied the input matrix into `matrix-txt` and the input vector into `vector-txt`, we
are ready to start. The following code snippet shows how you can multiply a matrix by a vector in this
mode. Explanations are given below.
  {{{
  1: hama jar $HAMA_EXAMPLES matrixtoseq $SPMV/matrix-txt $SPMV/matrix-seq sparse 4
@@ -132, +131 @@

  Line 5: Conversion of the result vector to text format.<<BR>>
  Line 6: Output of the result vector. You can see that we obtained the expected vector.<<BR>>
  
- 
  === Possible improvements ===
-  1. Bug fixing. My main aim now - provide stable work of SpMV.
-  2. A significant improvement in the total time of the algorithm can be achieved by creating a custom
partitioner class. It will give us load balancing and therefore better efficiency. This is
the main possibility for optimization, because we decided that row-wise matrix access
is acceptable. Maybe it can be achieved by reordering the input or by customizing the partitioning
algorithm of the framework.
+  1. A significant improvement in the total time of the algorithm can be achieved by creating a custom
partitioner class. It will give us load balancing and therefore better efficiency. This is
the main possibility for optimization, because we decided that row-wise matrix access
is acceptable. Maybe it can be achieved by reordering the input or by customizing the partitioning
algorithm of the framework.
  
  === Literature ===
   1. Rob H. Bisseling - Parallel Scientific Computation (chapter 4).
