From myn <...@163.com>
Subject Re:Re: Re: is there some place to study Singular Value Decomposition algorithms
Date Mon, 29 Aug 2011 11:03:59 GMT
```
The best way is to read the source code.

@_@

At 2011-08-29 16:02:57,"Lance Norskog" <goksron@gmail.com> wrote:
>'R' also has an svd implementation, directly in the base package.
>
>1) What is SVD? The video lecture above will help. Also, searching for
>'singular value decomposition' on Baidu finds a lot of basic explanations.
>2) Why do you want it? In one pass it creates a few different, distinct
>explanations of what is going on inside your dataset.
>3) Mahout Distributed Matrix code, DistributedLanczos etc. are
>implementations specifically for large-scale problems. There are sub-parts
>of SVD that you may not need for your problem, and these jobs avoid some of
>the work.
>
>Until you have a solid grasp of what SVD can tell you, there is no point in
>trying the distributed Mahout jobs. The SingularValueDecomposition class in
>Mahout has served me well in my research.
>
>Lance
>
>On Mon, Aug 29, 2011 at 12:50 AM, Danny Bickson <danny.bickson@gmail.com> wrote:
>
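>> import java.util.StringTokenizer;
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.IntWritable;
>> import org.apache.hadoop.io.SequenceFile;
>> import org.apache.hadoop.io.SequenceFile.CompressionType;
>> import org.apache.mahout.math.SequentialAccessSparseVector;
>> import org.apache.mahout.math.Vector;
>> import org.apache.mahout.math.VectorWritable;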
>>
>> /**
>>  * Code for converting CSV format to Mahout's SVD format
>>  * @author Danny Bickson, CMU
>>  * Note: I ASSUME THE CSV FILE IS SORTED BY THE COLUMN (NAMELY THE
>>  * SECOND FIELD).
>>  *
>>  */
>>
>> public class Convert2SVD {
>>
>>
>>        public static int Cardinality;
>>
>>        /**
>>         *
>>         * @param args[0] - input csv file
>>         * @param args[1] - cardinality (length of vector)
>>         * @param args[2] - output file for svd
>>         */
>>        public static void main(String[] args){
>>
>> try {
>>        Cardinality = Integer.parseInt(args[1]);
>>        // read the input CSV file (args[0]) line by line
>>        final java.io.BufferedReader br =
>> new java.io.BufferedReader(new java.io.FileReader(args[0]));
>>        final Configuration conf = new Configuration();
>>        final FileSystem fs = FileSystem.get(conf);
>>        final SequenceFile.Writer writer =
>> SequenceFile.createWriter(fs, conf, new Path(args[2]),
>> IntWritable.class, VectorWritable.class, CompressionType.BLOCK);
>>
>>          final IntWritable key = new IntWritable();
>>          final VectorWritable value = new VectorWritable();
>>
>>
>>           String thisLine;
>>
>>           Vector vector = null;
>>           int from = -1,to  =-1;
>>           int last_to = -1;
>>           float val = 0;
>>           int total = 0;
>>           int nnz = 0;
>>           int e = 0;
>>           int max_to =0;
>>           int max_from = 0;
>>
>>           while ((thisLine = br.readLine()) != null) { // while loop begins here
>>
>>                 StringTokenizer st = new StringTokenizer(thisLine, ",");
>>                 while(st.hasMoreTokens()) {
>>                     //convert both indices from 1-based to zero-based
>>                     from = Integer.parseInt(st.nextToken())-1;
>>                     to = Integer.parseInt(st.nextToken())-1;
>>                     val = Float.parseFloat(st.nextToken());
>>                     if (max_from < from) max_from = from;
>>                     if (max_to < to) max_to = to;
>>                     if (from < 0 || to < 0 || to > Cardinality || val == 0.0)
>>                         throw new NumberFormatException("wrong data: from: "
>> + from + " to: " + to + " val: " + val);
>>                 }
>>
>>                 //a new column has started: write out the vector of the previous column
>>                 if (last_to != to && last_to != -1){
>>                     value.set(vector);
>>
>>                     writer.append(key, value); //write the older vector
>>                     e+= vector.getNumNondefaultElements();
>>                 }
>>                 //a new column is observed, open a new vector for it
>>                 if (last_to != to){
>>                     vector = new SequentialAccessSparseVector(Cardinality);
>>                     key.set(to); // open a new vector
>>                     total++;
>>                 }
>>
>>                 vector.set(from, val);
>>                 nnz++;
>>
>>                 if (nnz % 1000000 == 0){
>>                   System.out.println("Col" + total + " nnz: " + nnz);
>>                 }
>>                 last_to = to;
>>
>>          } // end while
>>
>>           value.set(vector);
>>           writer.append(key,value);//write the last column
>>           e+= vector.getNumNondefaultElements();
>>
>>           writer.close();
>>           System.out.println("Wrote a total of " + total + " cols " +
>> " nnz: " + nnz);
>>           if (e != nnz)
>>                System.err.println("Bug: missing edges! We only got " + e);
>>
>>           System.out.println("Highest column: " + max_to +
>> " highest row: " + max_from);
>>        } catch(Exception ex){
>>                ex.printStackTrace();
>>        }
>>    }
>> }
>>
>>
>>
>> 2011/8/29 myn <myn@163.com>
>>
>> > Thanks.
>> > But could you send me the content of
>> > http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html ?
>> > I can't open it in China.
>> >
>> >
>> >
>> >
>> >
>> > At 2011-08-29 15:29:40,"Danny Bickson" <danny.bickson@gmail.com> wrote:
>> > >Command line arguments are found here:
>> > >https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
>> > >I wrote a quick tutorial on how to prepare sparse matrices as input to
>> > >Mahout SVD here:
>> > >http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html
>> > >
>> > >Let me know if you have further questions.
>> > >
>> > >2011/8/29 myn <myn@163.com>
>> > >
>> > >> I want to study Singular Value Decomposition algorithms.
>> > >> I also have a book called Mahout in Action, but I can't find anything
>> > >> about this algorithm in it.
>> > >> Is there some place that explains how to use the method?
>> > >> So far, DistributedLanczosSolver is not a MapReduce SVD method.
>> >
>>
>
>
>
>--
>Lance Norskog
>goksron@gmail.com

```
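
The core of the conversion code quoted above is grouping 1-based `row,col,value` triplets into one sparse vector per column. Below is a minimal sketch of just that step, with no Hadoop or Mahout dependency. The class name `CsvColumnGrouper` and the method `group` are hypothetical, and a `TreeMap` stands in for Mahout's `SequentialAccessSparseVector`. Unlike the quoted code, this version keeps everything in memory and therefore does not assume the input is sorted by column; it is meant only to illustrate the grouping, not to replace the SequenceFile writer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hadoop-free sketch: group 1-based "row,col,value" triplets by column. */
class CsvColumnGrouper {

    /**
     * Returns a map from zero-based column index to that column's sparse
     * vector, itself a map from zero-based row index to value.
     * cardinality is the length of each column vector (number of rows).
     */
    static Map<Integer, Map<Integer, Float>> group(List<String> lines, int cardinality) {
        Map<Integer, Map<Integer, Float>> columns = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            if (fields.length != 3) {
                throw new IllegalArgumentException("expected row,col,value: " + line);
            }
            // convert both indices from 1-based to zero-based
            int row = Integer.parseInt(fields[0].trim()) - 1;
            int col = Integer.parseInt(fields[1].trim()) - 1;
            float val = Float.parseFloat(fields[2].trim());
            if (row < 0 || row >= cardinality || col < 0 || val == 0.0f) {
                throw new IllegalArgumentException("wrong data: " + line);
            }
            // open a new vector the first time a column is seen, then set the entry
            columns.computeIfAbsent(col, c -> new TreeMap<>()).put(row, val);
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("1,1,5.0", "3,1,2.0", "2,2,1.5");
        group(csv, 3).forEach((col, vec) -> System.out.println("col " + col + " -> " + vec));
    }
}
```

In the quoted code each finished column is appended to the SequenceFile as soon as the next column begins, which is why that version needs sorted input and this in-memory sketch does not.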

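For the "What is SVD?" question raised in the thread, a small experiment may help before reading any library source. The sketch below estimates only the largest singular value of a small dense matrix by power iteration on AᵀA. It is not Mahout's `SingularValueDecomposition` class and not the Lanczos method used by the distributed jobs. The class name `TopSingularValue` and the fixed starting vector are assumptions made for illustration; the method assumes the starting vector is not orthogonal to the dominant right singular vector.

```java
/** Sketch: largest singular value of a dense matrix via power iteration. */
class TopSingularValue {

    /** Euclidean norm of a vector. */
    static double norm(double[] x) {
        double sum = 0.0;
        for (double xi : x) {
            sum += xi * xi;
        }
        return Math.sqrt(sum);
    }

    /**
     * Repeatedly applies v <- normalize(A^T (A v)); the estimate of the
     * largest singular value is ||A v|| for the current unit vector v.
     */
    static double largestSingularValue(double[][] a, int iterations) {
        int rows = a.length;
        int cols = a[0].length;
        // deterministic, non-uniform start (1, 2, 3, ...), scaled to unit length
        double[] v = new double[cols];
        for (int j = 0; j < cols; j++) {
            v[j] = j + 1;
        }
        double startNorm = norm(v);
        for (int j = 0; j < cols; j++) {
            v[j] /= startNorm;
        }
        double sigma = 0.0;
        for (int it = 0; it < iterations; it++) {
            // av = A v
            double[] av = new double[rows];
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    av[i] += a[i][j] * v[j];
                }
            }
            sigma = norm(av);
            // w = A^T av
            double[] w = new double[cols];
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    w[j] += a[i][j] * av[i];
                }
            }
            double wNorm = norm(w);
            if (wNorm == 0.0) {
                break; // v lies in the null space of A; nothing more to refine
            }
            for (int j = 0; j < cols; j++) {
                v[j] = w[j] / wNorm;
            }
        }
        return sigma;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 0}, {3, -5}};
        System.out.println("largest singular value ~ " + largestSingularValue(a, 200));
    }
}
```

For the 2x2 matrix in `main`, AᵀA has eigenvalues 40 and 10, so the estimate should approach the square root of 40. A full SVD also returns the remaining singular values and both sets of singular vectors, which this sketch does not compute.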