# mahout-user mailing list archives

From Danny Bickson <danny.bick...@gmail.com>
Subject Re: How to input a matrix to use SVD in mahout
Date Fri, 23 Sep 2011 08:41:07 GMT
```Hi!
You can find detailed Java code to convert your example to Mahout SVD format
on my blog here:
http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html

Since I know some Chinese users are blocked from Google websites, here is the
content:

Best,

Danny Bickson

Friday, February 4, 2011  Mahout - SVD matrix factorization -
formatting input matrix
Converting Input Format into Mahout's SVD Distributed Matrix Factorization
Solver

Purpose
The code below converts a matrix from csv format:
<from row>,<to col>,<value>\n
into Mahout's SVD solver format.

For example,
The 3x3 matrix:
0    1.0 2.1
3.0  4.0 5.0
-5.0 6.2 0

Will be given as input in a csv file as:
1,0,3.0
2,0,-5.0
0,1,1.0
1,1,4.0
2,1,6.2
0,2,2.1
1,2,5.0

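NOTE: I ASSUME THE MATRIX IS SORTED BY COLUMN ORDER.
NOTE: The code below subtracts 1 from each row and column index, so it
expects 1-based indices in the csv file; the example above is written with
0-based indices.
This code is based on code by Danny Leshem, ContextIn.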

Command line arguments:
args[0] - path to csv input file
args[1] - cardinality of the matrix (number of columns)
args[2] - path to the resulting Mahout SVD input file

Method:
The code below goes over the csv file and, for each matrix column, creates
a SequentialAccessSparseVector which contains all the non-zero row entries
for this column.
It then appends the column vector to the output file.

Compilation:
Copy the Java code below into a Java file named Convert2SVD.java.
The command line option for compilation is given below.


import java.io.BufferedReader;
import java.io.FileReader;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.mahout.math.SequentialAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

/**
 * Code for converting CSV format to Mahout's SVD format
 * @author Danny Bickson, CMU
 *
 * Note: I ASSUME THE CSV FILE IS SORTED BY THE COLUMN (NAMELY THE SECOND FIELD).
 */
public class Convert2SVD {

    public static int Cardinality;

    /**
     * @param args[0] - input csv file
     * @param args[1] - cardinality (length of vector)
     * @param args[2] - output file for svd
     */
    public static void main(String[] args) {
        try {
            Cardinality = Integer.parseInt(args[1]);
            final Configuration conf = new Configuration();
            final FileSystem fs = FileSystem.get(conf);
            final SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
                    new Path(args[2]), IntWritable.class, VectorWritable.class,
                    CompressionType.BLOCK);

            final IntWritable key = new IntWritable();
            final VectorWritable value = new VectorWritable();

            String thisLine;
            // reader over the csv input file
            BufferedReader br = new BufferedReader(new FileReader(args[0]));
            Vector vector = null;
            int from = -1, to = -1;
            int last_to = -1;
            float val = 0;
            int total = 0;   // number of column vectors opened
            int nnz = 0;     // number of non-zero entries read
            int e = 0;       // number of non-zero entries written
            int max_to = 0;
            int max_from = 0;

            while ((thisLine = br.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(thisLine, ",");
                while (st.hasMoreTokens()) {
                    from = Integer.parseInt(st.nextToken()) - 1; // convert from 1 based to zero based
                    to = Integer.parseInt(st.nextToken()) - 1;   // convert from 1 based to zero based
                    val = Float.parseFloat(st.nextToken());
                    if (max_from < from) max_from = from;
                    if (max_to < to) max_to = to;
                    if (from < 0 || to < 0 || to > Cardinality || val == 0.0)
                        throw new NumberFormatException("wrong data from: " + from
                                + " to: " + to + " val: " + val);
                }

                // the column changed: write the vector of the previous column
                if (last_to != to && last_to != -1) {
                    value.set(vector);
                    writer.append(key, value);
                    e += vector.getNumNondefaultElements();
                }
                // a new column is observed, open a new vector for it
                if (last_to != to) {
                    vector = new SequentialAccessSparseVector(Cardinality);
                    key.set(to);
                    total++;
                }

                // set the non-zero row entry in the current column
                vector.set(from, val);
                nnz++;

                if (nnz % 1000000 == 0) {
                    System.out.println("Col" + total + " nnz: " + nnz);
                }
                last_to = to;
            } // end while
            br.close();

            // write the last column vector (skipped if the input was empty)
            if (vector != null) {
                value.set(vector);
                writer.append(key, value);
                e += vector.getNumNondefaultElements();
            }

            writer.close();
            System.out.println("Wrote a total of " + total + " cols " + " nnz: " + nnz);
            if (e != nnz)
                System.err.println("Bug: missing edges! we only got " + e);

            System.out.println("Highest column: " + max_to + " highest row: " + max_from);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

A second option to compile this file is to create a Makefile with the
following in it:

all:
3.1.1.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4
/taste-web/target/mahout-taste-webapp-0.5
-SNAPSHOT/WEB-INF/lib/mahout-core-0.5
-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4
/taste-web/target/mahout-taste-webapp-0.5
-SNAPSHOT/WEB-INF/lib/mahout-math-0.5
-core.jar *.java

Note that you will have to change the location of the jars to point to where
they are on your system.

Example of running this conversion on the Netflix data:

.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4
/taste-web/target/mahout-taste-webapp-0.5
-SNAPSHOT/WEB-INF/lib/mahout-core-0.5
-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4
/taste-web/target/mahout-taste-webapp-0.5
-SNAPSHOT/WEB-INF/lib/mahout-math-0.5
/lib/commons-logging-api-1.0.4.jar Convert2SVD ../../netflixe.csv 17770
netflixe.seq
Aug 23, 2011 1:16:06
your platform... using builtin-java classes where applicable
Aug 23, 2011 1:16:06
INFO: Got brand-new compressor
Row241 nnz: 1000000
Row381 nnz: 2000000
Row571 nnz: 3000000
Row789 nnz: 4000000
Row1046 nnz: 5000000
Row1216 nnz: 6000000
Row1441 nnz: 7000000
...

2011/9/23 悟统 <junwei.wang@alipay.com>

> Hi, all
> I am studying Mahout. I would like to use SVD in Mahout with a matrix.
> The matrix is like this:
> 1 0 0 0 0
> 2 4 1 0.5 2
> 2.1 2 4 0 1
> -1.8 2 1 5 1
> 0 3.4 5.9 3 9
>
> How do I input it into Mahout SVD?

```
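The "Method" paragraph in the message describes a streaming pass that groups csv triples by column. The following is a minimal sketch of that grouping step only, with no Hadoop or Mahout dependency: a `TreeMap` stands in for `SequentialAccessSparseVector`, and the class name `ColumnGrouper` and its `group` method are hypothetical names used for illustration. Unlike the original code, it rejects input that is not sorted by column rather than assuming it, which shows why the sort order matters.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical illustration (not part of Mahout): group csv triples into sparse columns. */
final class ColumnGrouper {

    /**
     * Reads "row,col,value" lines with 1-based indices, sorted by column,
     * and returns one sparse column (0-based row -> value) per 0-based column.
     */
    static Map<Integer, SortedMap<Integer, Float>> group(List<String> lines) {
        Map<Integer, SortedMap<Integer, Float>> columns = new LinkedHashMap<>();
        SortedMap<Integer, Float> current = null;
        int lastCol = -1;
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            int row = Integer.parseInt(fields[0]) - 1; // 1-based -> 0-based
            int col = Integer.parseInt(fields[1]) - 1; // 1-based -> 0-based
            float val = Float.parseFloat(fields[2]);
            if (row < 0 || col < 0) {
                throw new IllegalArgumentException("indices must be 1-based: " + line);
            }
            if (col < lastCol) {
                throw new IllegalArgumentException("input is not sorted by column: " + line);
            }
            if (col != lastCol) {
                // a new column is observed, open a new sparse column for it
                current = new TreeMap<>();
                columns.put(col, current);
                lastCol = col;
            }
            current.put(row, val);
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList("2,1,3.0", "3,1,-5.0", "1,2,1.0")));
    }
}
```

In the real converter each completed column is appended to the SequenceFile as soon as the next column starts, so only one column is held in memory at a time; this sketch keeps all columns in a map purely to make the result easy to inspect.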
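For the 5x5 matrix in the original question, the csv triples can be generated rather than typed by hand. A minimal sketch, assuming 1-based indices (the converter in the message subtracts 1 from each index) and column-sorted output; the class name `DenseToTriples` and the method `toCsv` are hypothetical and not part of Mahout.

```java
/** Hypothetical helper (not part of Mahout): dense matrix -> "row,col,value" triples. */
final class DenseToTriples {

    /**
     * Emits one line per non-zero entry, ordered by column and then by row,
     * using 1-based row and column indices.
     */
    static String toCsv(double[][] matrix) {
        StringBuilder out = new StringBuilder();
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        for (int col = 0; col < cols; col++) {      // outer loop over columns keeps the output column-sorted
            for (int row = 0; row < rows; row++) {
                double v = matrix[row][col];
                if (v != 0.0) {                     // zeros are skipped: the converter rejects zero values
                    out.append(row + 1).append(',')
                       .append(col + 1).append(',')
                       .append(v).append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // the matrix from the original question
        double[][] m = {
            {1, 0, 0, 0, 0},
            {2, 4, 1, 0.5, 2},
            {2.1, 2, 4, 0, 1},
            {-1.8, 2, 1, 5, 1},
            {0, 3.4, 5.9, 3, 9},
        };
        System.out.print(toCsv(m));
    }
}
```

The printed lines can be saved to a file and passed as args[0] to Convert2SVD, with 5 as the cardinality for this matrix.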