hadoop-common-user mailing list archives

From Daniel Schulz <danielschulz2...@hotmail.com>
Subject RE: Job that just runs the reduce tasks
Date Fri, 09 Oct 2015 12:32:27 GMT
Hi,
Yes, this is possible. Just configure the first MR job's output path as the second job's input path.
Identity mappers will run in the second job (rather than no mappers at all), but they ship with Hadoop.
They are just a technical necessity.
To avoid this overhead, Tez, Spark, Flink and other execution engines were built: you write
a DAG and run your algorithms on them.
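
As a rough illustration of the second job described above, here is a plain-Java simulation with no Hadoop dependency. It only mimics the three phases (identity map, shuffle, reduce) in memory. The class name, the method names and the summing reduce function are hypothetical and chosen just for this sketch. In the real Hadoop Java API, the base org.apache.hadoop.mapreduce.Mapper class passes records through unchanged, so it can serve as the identity mapper.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of a "reduce-only" second job. This is NOT Hadoop code:
// all names here are hypothetical and only illustrate the data flow.
public class ReduceOnlySketch {

    // Identity map phase: each record read from the first job's output
    // is emitted unchanged.
    public static List<Map.Entry<String, Integer>> identityMap(
            List<Map.Entry<String, Integer>> records) {
        return new ArrayList<>(records);
    }

    // Shuffle phase: group all values by key, with keys in sorted order.
    public static Map<String, List<Integer>> shuffle(
            List<Map.Entry<String, Integer>> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());
        }
        return grouped;
    }

    // Reduce phase (assumed for illustration): sum the values of each key.
    public static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    // The whole second job: identity map, then shuffle, then reduce.
    public static Map<String, Integer> runSecondJob(
            List<Map.Entry<String, Integer>> firstJobOutput) {
        return reduce(shuffle(identityMap(firstJobOutput)));
    }

    public static void main(String[] args) {
        // Stand-in for the map-only first job's output stored in HDFS.
        List<Map.Entry<String, Integer>> firstJobOutput = new ArrayList<>();
        firstJobOutput.add(new SimpleEntry<>("hadoop", 1));
        firstJobOutput.add(new SimpleEntry<>("spark", 1));
        firstJobOutput.add(new SimpleEntry<>("hadoop", 1));

        System.out.println(runSecondJob(firstJobOutput)); // prints {hadoop=2, spark=1}
    }
}
```

The identity map step does no work of its own. It exists only so that the records pass through the shuffle, which is what groups them by key for the reducers.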
Kind regards, Daniel.

> To: user@hadoop.apache.org
> From: xeonmailinglist@gmail.com
> Subject: Job that just runs the reduce tasks
> Date: Fri, 9 Oct 2015 10:46:49 +0100
> 
> Hi,
> 
> If we run a job without reduce tasks, the map output is going to be 
> saved into HDFS. Now, I would like to launch another job that reads the 
> map output and computes the reduce phase. Is it possible to execute a job 
> that reads the map output from HDFS and just runs the reduce phase?
> 
> Thanks,
> 
 		 	   		  