hadoop-mapreduce-user mailing list archives

From Kris Nuttycombe <kris.nuttyco...@gmail.com>
Subject Re: Trying to figure out possible causes of this exception
Date Wed, 07 Apr 2010 15:31:59 GMT
So, the issue is that the input path I specified was a directory, not a file.

As a result, Hadoop helpfully assumed that I wanted a file called
"data" in that directory to be the input, and proceeded down the path
with that assumption, instead of failing fast. I had to go to the
source code to figure out why it was doing this.
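
(For anyone who hits the same thing: as far as I can tell from the 0.20
source, SequenceFileInputFormat.listStatus treats any directory it lists
as a MapFile and silently substitutes the "data" file inside it. A
fail-fast guard before job submission, something like the rough sketch
below, would have surfaced this immediately; the helper name and error
messages are just my own.)

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Rough sketch: verify the input path actually exists, and flag
    // directories explicitly, before handing the path to Hadoop.
    def checkInputPath(conf: Configuration, inPath: Path) {
      val fs = FileSystem.get(inPath.toUri, conf)
      if (!fs.exists(inPath))
        throw new IllegalArgumentException("Input path does not exist: " + inPath)
      if (fs.getFileStatus(inPath).isDir)
        // SequenceFileInputFormat will go looking for <inPath>/data here
        throw new IllegalArgumentException("Input path is a directory: " + inPath)
    }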

I'm finding that Hadoop exhibits this sort of behavior (assuming a
useless default instead of failing fast) in a number of places, some of
them highly problematic, such as the dreaded DrWho default user. It was
only after reading http://blog.rapleaf.com/dev/?p=382 that I figured out
why some of my services were losing data: the Hadoop libs fall back to
DrWho under strange conditions, then throw a permissions exception when
attempting to write a file, which subsequently kills a buffer-flush
thread of a long-lived process...
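
(The stopgap I've landed on, assuming I'm reading the 0.20 source right
and "hadoop.job.ugi" is still honored on your cluster, is to pin the
identity explicitly instead of trusting the whoami lookup; the user and
group names below are hypothetical:)

    import org.apache.hadoop.conf.Configuration

    // Rough sketch: set the job's identity up front so a failed whoami
    // lookup can't silently fall back to the DrWho default user.
    // The value format on 0.20-era Hadoop is "user,group1,group2,...".
    val conf = new Configuration()
    conf.set("hadoop.job.ugi", "eventlog,eventlog")  // hypothetical user,group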

It would be very helpful if Hadoop were to fail fast when encountering
an incorrect configuration rather than assuming a default that will
essentially never be used in a production environment. Both of these
issues have cost me far more time and money in lost business ($50k
just this week thanks to DrWho) than failing fast ever would have.

Thanks,

Kris

On Wed, Apr 7, 2010 at 6:23 AM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:
> hi Kris,
>
> It seems your program cannot find the input file. Have you done a hadoop fs
> -ls to verify that the file exists? Also, the path URL should be
> hdfs://......
>
>
> Thanks and Regards,
> Sonal
> www.meghsoft.com
>
>
> On Wed, Apr 7, 2010 at 1:16 AM, Kris Nuttycombe <kris.nuttycombe@gmail.com>
> wrote:
>>
>> Exception in thread "main" java.io.FileNotFoundException: File does
>> not exist: hdfs:///test-batchEventLog/metrics/data
>>        at
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:457)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
>>        at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>>        at
>> org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>>        at
>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>>        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>>        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>>        at reporting.HDFSMapReduceQuery.execute(HDFSMetricsQuery.scala:60)
>>
>> My job config contains the following:
>>
>>    println("using input path: " + inPath)
>>    println("using output path: " + outPath)
>>    FileInputFormat.setInputPaths(job, inPath)
>>    FileOutputFormat.setOutputPath(job, outPath)
>>
>> with input & output paths printed out as:
>>
>> using input path: hdfs:/test-batchEventLog
>> using output path:
>> hdfs:/test-batchEventLog/out/03d24392-9bd9-4b23-8240-aceb54b3473c
>>
>> Any ideas why this would be occurring?
>>
>> Thanks,
>>
>> Kris
>
>
