hadoop-common-dev mailing list archives

From "Yoram Arnon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-645) Map-reduce task does not finish correctly when -reducer NONE is specified
Date Fri, 27 Oct 2006 23:55:17 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-645?page=comments#action_12445300 ] 
Yoram Arnon commented on HADOOP-645:

It's streaming-specific.

That said, with each map task normally working on one entire DFS block but sometimes working
on an entire file (as in the case of gzipped data), generating one file per map will result
in fairly large output files. If the input was a bunch of small files to begin with, the
output is no worse than the input.

With iterative jobs in particular, where the job output is the input to the next job and is
really temporary, it is very reasonable to skip shuffling the data and sorting it if possible.

> Map-reduce task does not finish correctly when -reducer NONE is specified
> -------------------------------------------------------------------------
>                 Key: HADOOP-645
>                 URL: http://issues.apache.org/jira/browse/HADOOP-645
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.7.2
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
> Map-reduce task does not finish correctly when -reducer NONE is specified. The NONE option
> means that the reducer should not be generating any output. Using this option causes an exception
> in the task tracker:
> java.lang.IllegalArgumentException: URI is not hierarchical
> TaskRunner: at java.io.File.<init>(File.java:335)
> TaskRunner: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:583)
> TaskRunner: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:96)
> TaskRunner: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:49)
> TaskRunner: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:213)
> TaskRunner: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1240)
> TaskRunner: sideEffectURI_ file:output length 11
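
The exception at the top of the trace can be reproduced outside Hadoop. The following is a minimal sketch, assuming the side-effect URI really was the 11-character string "file:output" that the last log line suggests. The class and method names are illustrative and do not come from the streaming code. java.io.File rejects a URI whose scheme-specific part does not start with a slash, because such a URI is opaque rather than hierarchical.

```java
import java.io.File;
import java.net.URI;

public class OpaqueUriDemo {
    // Returns "ok" if java.io.File accepts the URI,
    // otherwise the message of the IllegalArgumentException it throws.
    public static String tryFile(String text) {
        try {
            new File(URI.create(text));
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Opaque URI: no slash after the scheme, so File(URI) refuses it.
        System.out.println(tryFile("file:output"));      // URI is not hierarchical
        // Hierarchical URI with an absolute path: accepted.
        System.out.println(tryFile("file:/tmp/output")); // ok
    }
}
```

Whether the right fix is to build the URI from an absolute path or to skip the side-effect file entirely when no reducer output is wanted is a question for the streaming code itself; this sketch only shows why the constructor call fails.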
