hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "SpMV" by Mikalai Parafeniuk
Date Sat, 18 Aug 2012 16:12:14 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "SpMV" page has been changed by Mikalai Parafeniuk:
http://wiki.apache.org/hama/SpMV?action=diff&rev1=14&rev2=15

  In the setup stage every peer reads the input dense vector from a file. The framework then
automatically partitions the matrix rows using the algorithm provided in a custom partitioner.
After that, local computation is performed. In the bsp procedure we obtain some cells of the
result vector, and they are written to the output file. The output file is then reread to
construct a dense vector instance for further computation.
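
The flow above can be sketched in plain Python. This is only an illustration, not Hama's BSP API: `partition_rows` and `local_compute` are hypothetical names, and the modulo rule is an assumed stand-in for whatever the custom partitioner really does.

```python
def partition_rows(rows, num_peers):
    """Assign each (row_index, cells) pair to a peer.

    The modulo rule is an assumption standing in for the custom partitioner.
    """
    parts = [[] for _ in range(num_peers)]
    for row_index, cells in rows:
        parts[row_index % num_peers].append((row_index, cells))
    return parts


def local_compute(local_rows, dense_vector):
    """Compute the result cells for the rows held by one peer."""
    return {
        row_index: sum(value * dense_vector[col] for col, value in cells.items())
        for row_index, cells in local_rows
    }


# Every peer holds the full dense vector, as in the setup stage.
dense_vector = [2.0, 3.0, 6.0, 0.0]
# Sparse rows as (row_index, {column: value}).
rows = [
    (0, {0: 1.0, 2: 6.0}),
    (1, {1: 4.0}),
    (2, {1: 2.0, 2: 3.0}),
    (3, {0: 3.0, 3: 5.0}),
]

# Each peer produces some cells of the result; merging them gives the vector.
result_cells = {}
for local_rows in partition_rows(rows, num_peers=2):
    result_cells.update(local_compute(local_rows, dense_vector))
result = [result_cells[i] for i in sorted(result_cells)]
print(result)
```

Because each row is needed by only one peer while the vector is needed by all, the sketch replicates the vector and splits only the rows.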
  
  === Implementation ===
+ ==== How to get ====
- The implementation can be found in my GitHub repository [[https://github.com/ParafeniukMikalaj/spmv]]
and the patch can be found in [[https://issues.apache.org/jira/browse/HAMA-524|Apache JIRA]] as
soon as JIRA becomes available. The GitHub repository contains only classes related to SpMV.
Before you start with SpMV, make sure that you have followed [[http://wiki.apache.org/hama/GettingStarted|this]]
guide and set up the environment variables. I considered two possible use cases of SpMV:
+ The implementation can be found in my GitHub repository [[https://github.com/ParafeniukMikalaj/spmv]]
and the patch can be found in [[https://issues.apache.org/jira/browse/HAMA-524|Apache JIRA]] as
soon as JIRA becomes available. The GitHub repository contains only classes related to SpMV.
Before you start with SpMV, make sure that you have followed [[http://wiki.apache.org/hama/GettingStarted|this]]
guide and set up the environment variables.
+ ==== Optional additional setup ====
+ I considered two possible use cases of SpMV:
   1. Usage in pair with `RandomMatrixGenerator`.
   2. Usage with arbitrary text files.
  In this section you will see how to use SpMV in these two cases. I propose the following
directory structure for the examples below
@@ -44, +47 @@

  export HAMA_EXAMPLES=$HAMA_HOME/hama-examples*.jar
  export SPMV=/user/hduser/spmv
  }}}
- The first variable gives quick access to the jar with Hama examples, which is placed in the Hama
home directory; the second variable is the HDFS prefix for the tests in this tutorial. If you have
not defined these variables, just substitute appropriate values into the following scripts.
+ The first variable gives quick access to the jar with Hama examples, which is placed in the Hama
home directory; the second variable is the HDFS prefix for the tests in this tutorial. If you have
not defined these variables, just substitute appropriate values into the following scripts.<<BR>>
+ ==== Representation of matrices in text format ====
+ Users work with SpMV through text files, so this section describes the text format for
matrices. All matrices and vectors are represented as follows: each row of the matrix is
represented by its row index, the length of the row, the number of non-zero items, and then
pairs of index and value. All values inside a row are separated by whitespace, and rows are
separated by newlines. Vectors are represented as matrix rows with an arbitrary row index (it is
not used). So, for example:
+ {{{
+ [1 0 2]    0 3 2 0 1 2 2
+ [0 0 0]  = 1 3 0
+ [0 5 1]    2 3 2 1 5 2 1
+ }}}
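
As a hedged illustration of the field order described above (row index, row length, non-zero count, then index/value pairs), here is a minimal Python sketch. `encode_row` is a hypothetical helper written for this page, not part of Hama.

```python
def encode_row(row_index, dense_row):
    """Encode one dense row as: index, length, non-zero count, (column, value) pairs."""
    pairs = [(col, val) for col, val in enumerate(dense_row) if val != 0]
    fields = [row_index, len(dense_row), len(pairs)]
    for col, val in pairs:
        fields.extend([col, val])
    # Fields inside a row are separated by whitespace.
    return " ".join(str(field) for field in fields)


matrix = [[1, 0, 2], [0, 0, 0], [0, 5, 1]]
lines = [encode_row(i, row) for i, row in enumerate(matrix)]
for line in lines:
    print(line)  # one matrix row per line
```

An all-zero row still produces a line: it carries the index, the length, and a non-zero count of 0.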
+ Now let's look at an example. Imagine that you need to multiply
+ {{{
+ [1 0 6 0]   [2]   [38] 
+ [0 4 0 0] * [3] = [12] 
+ [0 2 3 0]   [6]   [24] 
+ [3 0 0 5]   [0]   [6]
+ }}}
+ First of all, you should create appropriate text files for the input matrix and the input
vector. The input matrix file should look like
+ {{{
+ 0 4 2 0 1 2 6
+ 1 4 1 1 4
+ 2 4 2 1 2 2 3
+ 3 4 2 0 3 3 5
+ }}}
+ The vector file should look like
+ {{{
+ 0 4 3 0 2 1 3 2 6
+ }}}
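
To check the files above by hand, the following Python sketch parses the text format and multiplies the matrix by the vector. `parse_row` and `spmv` are hypothetical helpers for illustration only; they assume well-formed input and are not the Hama implementation.

```python
def parse_row(line):
    """Parse 'index length nnz (col value)*' into (index, length, {col: value})."""
    fields = line.split()
    index, length, nnz = int(fields[0]), int(fields[1]), int(fields[2])
    cells = {}
    for k in range(nnz):
        col = int(fields[3 + 2 * k])
        val = float(fields[4 + 2 * k])
        cells[col] = val
    return index, length, cells


def spmv(matrix_lines, vector_line):
    """Multiply a sparse matrix by a vector, both given in the text format."""
    # The vector's row index is ignored, as the format description says.
    _, vec_len, vec_cells = parse_row(vector_line)
    dense_vector = [vec_cells.get(i, 0.0) for i in range(vec_len)]
    result = {}
    for line in matrix_lines:
        row_index, _, cells = parse_row(line)
        result[row_index] = sum(v * dense_vector[c] for c, v in cells.items())
    return [result[i] for i in sorted(result)]


matrix_lines = [
    "0 4 2 0 1 2 6",
    "1 4 1 1 4",
    "2 4 2 1 2 2 3",
    "3 4 2 0 3 3 5",
]
vector_line = "0 4 3 0 2 1 3 2 6"
product = spmv(matrix_lines, vector_line)
print(product)  # [38.0, 12.0, 24.0, 6.0]
```

The printed values match the product shown in the multiplication example.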
  ==== Usage with RandomMatrixGenerator ====
  `RandomMatrixGenerator`, like `SpMV`, works with the sequence file format. So, to multiply a
random matrix by a random vector we will do the following: generate the matrix and vector;
convert the matrix, vector and result to text files; view the matrix, vector and result. This
sequence is described by the following code snippet:
  {{{
@@ -69, +97 @@

  10:   hadoop dfs -cat /user/hduser/spmv/result-txt/*
     0	 6 6 0 0.7059786044267415 1 1.0738967463653346 2 0.6274907669206862 3 0.35938205240905363
4 0.18317827331814918 5 0.24541032101100438
  }}}
+ We got the expected result. Now we will explain the meaning of each line in the code snippet
above.<<BR>>
  Line 0: Clean up of directories related to SpMV tests.<<BR>>
  Lines 1-2: Generation of the input matrix and vector. In this example we test multiplication
of a 6x6 matrix and a 1x6 vector.<<BR>>
  Line 3: SpMV algorithm.<<BR>>
@@ -85, +114 @@

  {{{
  Usage: matrixtotext <input matrix dir> <output matrix dir> [number of tasks
(default max)]
  }}}
+ To use SpMV in this mode you should provide text files in the appropriate format, as
described above.
- To use SpMV in this mode you should provide text files in the appropriate format. All
matrices and vectors are represented as follows: each row of the matrix is represented by its
row index, the length of the row, the number of non-zero items, and then pairs of index and
value. All values inside a row are separated by whitespace, and rows are separated by newlines.
Vectors are represented as matrix rows with an arbitrary row index (it is not used). So, for example:
- {{{
- [1 0 2]    0 3 2 0 1 2 2
- [0 0 0]  = 1 3 0
- [0 5 1]    2 3 2 1 5 2 1
- }}}
- Now let's look at an example. Imagine that you need to multiply
- {{{
-  [1 0 6 0]   [2]   [38] 
-  [0 4 0 0] * [3] = [12] 
-  [0 2 3 0]   [6]   [24] 
-  [3 0 0 5]   [0]   [6]
- }}}
- First of all, you should create appropriate text files for the input matrix and the input
vector. The input matrix file should look like
- {{{
- 0 4 2 0 1 2 6
- 1 4 1 1 4
- 2 4 2 1 2 2 3
- 3 4 2 0 3 3 5
- }}}
- The vector file should look like
- {{{
- 0 4 3 0 2 1 3 2 6
- }}}
  After that you should copy these files to HDFS. If you are not comfortable with HDFS,
please see [[http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html|this tutorial]].
After you have copied the input matrix into `matrix-txt` and the input vector into `vector-txt`,
we are ready to start. The following code snippet shows how you can multiply matrices in this
mode. Explanations are given below.
  {{{
  1: hama jar $HAMA_EXAMPLES matrixtoseq $SPMV/matrix-txt $SPMV/matrix-seq sparse 4
