mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: DistributedRowMatrix transpose method problem
Date Sun, 12 Sep 2010 21:03:59 GMT
Hi Abhijat,

  It looks like you've found a bug not in transpose(), but in iterateAll()
(and probably iterate() too): the file globbing over the contents of the
sequence file directory is picking up the "_logs" subdirectory that Hadoop
creates automatically, and trying to treat it as part of a SequenceFile,
which it is not.
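
  For example, a transpose job's output directory typically looks
something like this (an illustrative listing; the part file names and
count depend on the job):

    transpose-104/_logs/
    transpose-104/part-00000

so a bare "*" glob matches _logs right alongside the real part files, and
SequenceFile.Reader then trips over the directory.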

  Yep: line 207 of DistributedRowMatrix globs together everything in the
matrix row directory, and that "*" should be more restrictive. Maybe you
can try "part*", recompile, and see whether your code works?

  -jake

On Sun, Sep 12, 2010 at 1:25 PM, Abhijat Vatsyayan <abhijat.vatsyayan@gmail.com> wrote:

> I isolated a bug in my program to a place where I am using
> DistributedRowMatrix.transpose(). When I call transpose() on a
> DistributedRowMatrix object, I see the mapper and reducer start, and the
> method finishes without any errors, but my attempt to read the contents
> of the (transposed) matrix fails. It seems like I am missing something
> really basic here, but any help would be appreciated.
>
> Here is the test case code (imports, package statement and comments not
> shown):
>
> public class TestMatrixIO {
>
>     @Test
>     public void testDistributedTranspose() throws Exception {
>         Configuration cfg = new Configuration();
>         DistributedRowMatrix matrix = new DistributedRowMatrix(
>                 TestWriteMatrix.INPUT_TEST_MATRIX_FILE, "input/tmp_1", 3, 4);
>         matrix.configure(new JobConf(cfg));
>         int count = printMatrix(matrix); // prints OK ..
>         System.out.println("[testDistributedTranspose()]..NumElements=" + count);
>         DistributedRowMatrix matrix_t = matrix.transpose();
>         System.out.println("[testDistributedTranspose()]..Transpose done");
>         printMatrix(matrix_t); // Fails
>     }
>
>     private static int printMatrix(DistributedRowMatrix matrix) {
>         Iterator<MatrixSlice> iterator = matrix.iterateAll();
>         int count = 0;
>         while (iterator.hasNext()) {
>             MatrixSlice slice = iterator.next();
>             Vector v = slice.vector();
>             int size = v.size();
>             for (int i = 0; i < size; i++) {
>                 Element e = v.getElement(i);
>                 count++;
>                 System.out.print(e.get() + " ");
>             }
>             System.out.println();
>         }
>         return count;
>     }
> }
>
> The stack trace when I try to print the matrix on the last line of the
> testDistributedTranspose method is:
> java.lang.IllegalStateException: java.io.IOException: Cannot open filename /user/abhijat/input/transpose-104/_logs
>     at org.apache.mahout.math.hadoop.DistributedRowMatrix.iterateAll(DistributedRowMatrix.java:118)
>     at net.abhijat.hadoop.mr.testexec.TestMatrixIO.printMatrix(TestMatrixIO.java:28)
>     at net.abhijat.hadoop.mr.testexec.TestMatrixIO.testDistributedTranspose(TestMatrixIO.java:25)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>     at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
>     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
> Caused by: java.io.IOException: Cannot open filename /user/abhijat/input/transpose-104/_logs
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>     at org.apache.mahout.math.hadoop.DistributedRowMatrix$DistributedMatrixIterator.<init>(DistributedRowMatrix.java:216)
>     at org.apache.mahout.math.hadoop.DistributedRowMatrix.iterateAll(DistributedRowMatrix.java:116)
>     ... 24 more
>
>
> "hadoop fs -ls input" shows that the transpose job did create the directory
> and output files. I created the matrix file using following code (imports,
> package statement and comments not shown):
> public class TestWriteMatrix {
>
>     public static final String INPUT_TEST_MATRIX_FILE = "input/test.matrix.file";
>
>     public static final double[][] matrix_dat = {
>         {  1, 3, -2,  0 },
>         {  2, 3,  2, -9 },
>         { -1, 1, -4, 10 }
>     };
>
>     @Test
>     public void testWritingMatrix() throws Exception {
>         Configuration cfg = new Configuration();
>         FileSystem fs = FileSystem.get(cfg);
>         SequenceFile.Writer writer = SequenceFile.createWriter(fs, cfg,
>                 new Path(INPUT_TEST_MATRIX_FILE),
>                 IntWritable.class, VectorWritable.class);
>         for (int i = 0; i < matrix_dat.length; i++) {
>             DenseVector row = new DenseVector(matrix_dat[i]);
>             VectorWritable vwritable = new VectorWritable(row);
>             writer.append(new IntWritable(i), vwritable);
>         }
>         writer.close();
>     }
> }
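>
> For reference, a minimal read-back check along these lines (same
> Configuration and path; a sketch, not part of the test class as posted)
> confirms what was written:
>
>     Configuration cfg = new Configuration();
>     FileSystem fs = FileSystem.get(cfg);
>     SequenceFile.Reader reader = new SequenceFile.Reader(fs,
>             new Path(INPUT_TEST_MATRIX_FILE), cfg);
>     IntWritable key = new IntWritable();
>     VectorWritable value = new VectorWritable();
>     while (reader.next(key, value)) {
>         // Each entry is one matrix row, keyed by its row index.
>         System.out.println(key.get() + " => " + value.get());
>     }
>     reader.close();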
