hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-424) mapreduce jobs fail when no split is returned via inputFormat.getSplits
Date Tue, 08 Aug 2006 21:59:14 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-424?page=comments#action_12426736 ] 
Doug Cutting commented on HADOOP-424:

I agree that this should not fail.  Can you please modify one of the mini-mr unit tests to
test for this case and submit that as a patch?
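
A minimal sketch of what such a test would check. It uses a hypothetical in-memory model, not the mini-MR cluster: ZeroSplitJobTest, runJob and guardZeroMaps are invented names, not Hadoop APIs. The model assumes, as the stack trace in the report suggests, that the reduce task opens a merged map-output file (all.2 in the trace) that is never written when there are zero map tasks:

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a MapReduce job; none of these names are
// Hadoop APIs. It only mirrors the failure mode reported in HADOOP-424.
class ZeroSplitJobTest {

    // Simulated task-local filesystem: file name -> records.
    private final Map<String, List<String>> localFiles = new HashMap<>();

    // One identity map task per split; each writes its own output file.
    private void runMaps(List<List<String>> splits) {
        for (int i = 0; i < splits.size(); i++) {
            localFiles.put("map_" + i + ".out", new ArrayList<>(splits.get(i)));
        }
    }

    // The reduce merges all map outputs into "all" and then opens it. The
    // merge writes nothing when there are no map outputs, so without the
    // guard the open fails, like the reported FileNotFoundException.
    private List<String> runReduce(int numMaps, boolean guardZeroMaps)
            throws FileNotFoundException {
        if (guardZeroMaps && numMaps == 0) {
            return Collections.emptyList(); // proposed: no input, empty output
        }
        if (numMaps > 0) {
            List<String> merged = new ArrayList<>();
            for (int i = 0; i < numMaps; i++) {
                merged.addAll(localFiles.get("map_" + i + ".out"));
            }
            Collections.sort(merged);
            localFiles.put("all", merged);
        }
        List<String> input = localFiles.get("all");
        if (input == null) {
            throw new FileNotFoundException("all");
        }
        return input;
    }

    static List<String> runJob(List<List<String>> splits, boolean guardZeroMaps)
            throws FileNotFoundException {
        ZeroSplitJobTest job = new ZeroSplitJobTest();
        job.runMaps(splits);
        return job.runReduce(splits.size(), guardZeroMaps);
    }

    public static void main(String[] args) throws FileNotFoundException {
        List<List<String>> noSplits = Collections.emptyList();

        // Reported behaviour: zero splits makes the reduce step fail.
        boolean failed = false;
        try {
            runJob(noSplits, false);
        } catch (FileNotFoundException e) {
            failed = true;
        }
        if (!failed) throw new AssertionError("expected the unguarded job to fail");

        // Desired behaviour: the job succeeds with empty output.
        if (!runJob(noSplits, true).isEmpty()) throw new AssertionError("expected empty output");

        // Non-empty input is unaffected by the guard.
        List<String> out = runJob(Arrays.asList(Arrays.asList("b", "a"), Arrays.asList("c")), true);
        if (!out.equals(Arrays.asList("a", "b", "c"))) throw new AssertionError(out.toString());
        System.out.println("ok");
    }
}
```

Where the real fix belongs (job tracker, reduce task, or job client) is not settled in this thread; the model puts the guard in the reduce only to keep the sketch short.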

> mapreduce jobs fail when no split is returned via inputFormat.getSplits
> -----------------------------------------------------------------------
>                 Key: HADOOP-424
>                 URL: http://issues.apache.org/jira/browse/HADOOP-424
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.4.0
>            Reporter: Frédéric Bertin
> I'm using a MapReduce job to process data that is logged and timestamped into files.
> When the job runs, it does not process all of the data; it selects only the data
> logged since the last job run.
> However, when no new data has been logged, the job fails: the getSplits method of
> InputFormat returns no splits, so the number of map tasks is 0. This case is not caught,
> and the job fails at the reduce step, apparently because the reduce task finds no data to process:
> java.io.FileNotFoundException: /local/home/hadoop/var/mapred/local/task_0030_r_000000_3/all.2
>     at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:121)
>     at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:47)
>     at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:221)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:150)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:259)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:253)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:241)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1013)
> What should Hadoop's behaviour be in such a case?
> IMHO, the job should be considered successful: this is not a job failure,
> just a lack of input data. WDYT?
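
The reporter's incremental selection can be sketched as below, assuming each candidate file carries a last-modified timestamp and the time of the previous run is known. IncrementalSplits, LogFile and selectSince are hypothetical names, not the Hadoop InputFormat API. The sketch shows that an empty selection is an ordinary outcome of this kind of filtering, which is the reporter's argument for not treating zero splits as an error:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of incremental input selection; not the Hadoop
// InputFormat API. Each candidate file carries its last-modified time.
class IncrementalSplits {

    static final class LogFile {
        final String path;
        final long modifiedMillis;

        LogFile(String path, long modifiedMillis) {
            this.path = path;
            this.modifiedMillis = modifiedMillis;
        }
    }

    // Returns the paths of files modified strictly after the previous run.
    // An empty result is normal: nothing new has been logged.
    static List<String> selectSince(List<LogFile> files, long lastRunMillis) {
        List<String> selected = new ArrayList<>();
        for (LogFile f : files) {
            if (f.modifiedMillis > lastRunMillis) {
                selected.add(f.path);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical file names and timestamps, for illustration only.
        List<LogFile> files = Arrays.asList(
                new LogFile("log-1", 1000L),
                new LogFile("log-2", 2000L));

        System.out.println(selectSince(files, 1500L)); // [log-2]
        System.out.println(selectSince(files, 2000L)); // []

        // A caller could skip submitting the job when nothing was selected.
        // This is a client-side workaround, not a fix for the reported failure.
        if (selectSince(files, 2000L).isEmpty()) {
            System.out.println("no new data; skipping job");
        }
    }
}
```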

This message is automatically generated by JIRA.

