hadoop-common-user mailing list archives

From "Runping Qi" <runp...@yahoo-inc.com>
Subject Real use scenario of streaming with Reduce=None
Date Fri, 20 Apr 2007 23:24:52 GMT
 

With HADOOP-1216, the framework will support a reduce=none feature, enabled
by setting numReduceTasks=0.

If a map/reduce job sets numReduceTasks=0, it will not create any reducer
tasks.

The mappers will not generate intermediate map output files either.

Rather, each mapper will generate one DFS file in the output dir specified
for the job and save the output of the mapper to the file as a part of the
final result.
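
To make the described behavior concrete, here is a toy simulation in plain
Python. It is only a sketch, not Hadoop code: run_map_only_job and
upper_mapper are hypothetical names, a local directory stands in for the DFS
output dir, and the part-NNNNN file naming is an assumption made for
illustration. Each simulated mapper writes its records straight to its own
file, with no shuffle and no reducer.

```python
import os


def run_map_only_job(input_splits, map_fn, output_dir):
    """Toy stand-in for a job with numReduceTasks=0 (not the Hadoop API).

    Each input split is handled by one simulated mapper, and each mapper
    writes its output directly to its own file in output_dir.
    """
    os.makedirs(output_dir, exist_ok=True)
    output_files = []
    for task_id, split in enumerate(input_splits):
        # Assumed naming scheme: one part file per mapper task.
        path = os.path.join(output_dir, "part-%05d" % task_id)
        with open(path, "w") as out:
            for record in split:
                # No sort, no shuffle: map output is the final output.
                for key, value in map_fn(record):
                    out.write("%s\t%s\n" % (key, value))
        output_files.append(path)
    return output_files


def upper_mapper(line):
    """Hypothetical mapper: key is the first token, value is the upper-cased line."""
    yield line.split()[0], line.upper()
```

With two input splits, the call produces two files in the output directory,
one per mapper, and together they form the job's final result.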

This behavior will be the same whether a job is streaming or non-streaming.

I wonder whether this behavior meets all the needs of the current streaming
job user community.

If so, we can eliminate all the weird "features" currently hacked into the
streaming implementation, such as sending the output of mappers through a
socket (i.e. the useSingleSideOutputURI_ option).

 

Thoughts?

 

Runping

 

