From Pat Ferrel <pat.fer...@gmail.com>
Subject Re: Error using hadoop in non-distributed mode
Date Thu, 06 Sep 2012 23:46:03 GMT
Thanks! You nailed it. 

Mahout was using the distributed cache, but fortunately there was an easy way to tell it not
to, and now the jobs run locally and therefore work in a debugging setup.

On Sep 4, 2012, at 9:22 PM, Hemanth Yamijala <yhemanth@thoughtworks.com> wrote:


The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/<snip>
is a location used by the tasktracker process for the 'DistributedCache' - a mechanism to
distribute files to all tasks running in a map reduce job. (http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache).

You have mentioned Mahout, so I am assuming that the specific analysis job you are running
is using this feature to distribute the output of the file /Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
to the job that is causing a failure.

Also, I found links stating that the distributed cache does not work in the local (non-HDFS)
mode. (http://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop).
Look at the second answer.
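
As a rough illustration of the failure mode described above, the sketch below prefers the
cache-localized copy of a file and falls back to the original file when that copy was never
created. It uses only the Java standard library. The class and method names are hypothetical,
and this is not how Mahout or Hadoop resolve cache files; it only shows the kind of check a
mapper's setup code could make when running in local mode.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop or Mahout.
class LocalModeFallbackSketch {

    /**
     * Returns the cache-localized copy if it exists on disk,
     * otherwise the original path that was handed to the cache.
     */
    static Path resolve(Path localizedCopy, Path original) {
        return Files.exists(localizedCopy) ? localizedCopy : original;
    }

    public static void main(String[] args) {
        // Both paths are taken from the error discussed in this thread.
        Path original = Paths.get("/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000");
        Path localized = Paths.get(
            "/tmp/hadoop-pat/mapred/local/archive/"
            + "-4686065962599733460_1587570556_150738331/file"
            + original);

        // Prints whichever of the two paths the fallback rule selects.
        System.out.println(resolve(localized, original));
    }
}
```

The fallback only makes sense when the task runs on the same machine that holds the original
file, which is the case in the local job runner but not on a real cluster.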


On Tue, Sep 4, 2012 at 10:33 PM, Pat Ferrel <pat.ferrel@gmail.com> wrote:
The job is creating several output and intermediate files, all under the location: Users/pat/Projects/big-data/b/ssvd/
Several output directories and files are created correctly, and the file Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
is created and exists at the time of the error. We seem to be passing in Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
as the input file.

Under what circumstances would an input path passed in as "Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"
be turned into "pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000"?
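
One way to read the path in the error, assuming the layout that the paths quoted in this
thread suggest (<local dir>/archive/<generated id>/file/<original path>), is to strip
everything up to and including the "file" segment and see which original file is meant. The
sketch below does that with a regular expression. The pattern is inferred only from the paths
in this thread, not from Hadoop documentation, and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical debugging helper, not part of Hadoop or Mahout.
class CachePathSketch {

    // Assumed layout: <local dir>/archive/<generated id>/file/<original absolute path>
    private static final Pattern LOCALIZED =
        Pattern.compile("^.*/archive/[^/]+/file(/.*)$");

    /** Returns the original path if the input matches the assumed layout, else empty. */
    static Optional<String> originalPath(String localized) {
        Matcher m = LOCALIZED.matcher(localized);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Path copied from the error message in this thread.
        String fromError = "/tmp/hadoop-pat/mapred/local/archive/"
            + "6590995089539988730_1587570556_37122331/file"
            + "/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000";

        System.out.println(originalPath(fromError).orElse("(no match)"));
    }
}
```

If the recovered path exists while the archive path does not, the input path itself was
correct and the localized copy is what is missing.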


On Sep 4, 2012, at 1:14 AM, Narasingu Ramesh <ramesh.narasingu@gmail.com> wrote:

Hi Pat,
Please specify the correct input file location.
Thanks & Regards,

On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
Using Hadoop with Mahout in a local filesystem/non-HDFS config for debugging purposes inside
IntelliJ IDEA. When I run one particular part of the analysis I get the error below. I didn't
write the code, but we are looking for some hint about what might cause it. This job completes
without error in a single-node pseudo-clustered config outside of IDEA.

Several jobs in the pipeline complete without error, creating part files just fine in the local
file system.

The file /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000,
which is the subject of the error, does not exist.


does exist at the time of the error. So the code is looking for the data in the wrong place?

12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 INFO compress.CodecPool: Got brand-new decompressor
12/09/02 14:56:29 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
        at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
        at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
        at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '', transport: 'socket'

