hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Reduce side question on MR
Date Thu, 30 May 2013 11:31:00 GMT
I don't see a direct question here, but this condition in the
source code is worth a look (*):
https://github.com/apache/hadoop-common/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobInProgress.java#L2316

(*) This check has yet to appear in MRv2. See, or help out with, MAPREDUCE-2723.
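
For anyone who can't follow the link, here is a minimal sketch of the kind of check being discussed. It assumes the scheduler estimates the reduce input size from the maps completed so far and declines to place a reduce task on a node with less free local disk than that estimate. The class, method names, and the simple linear extrapolation are hypothetical illustrations, not the actual JobInProgress code.

```java
// Hypothetical sketch; names and the estimation formula are assumptions,
// not the real Hadoop JobInProgress / ResourceEstimator implementation.
final class ReduceDiskCheck {

    // Extrapolate total map output from the maps finished so far,
    // then split it evenly across the reducers.
    static long estimateReduceInputBytes(long completedMapOutputBytes,
                                         int completedMaps,
                                         int totalMaps,
                                         int numReduces) {
        if (completedMaps <= 0 || numReduces <= 0) {
            return 0L; // nothing to extrapolate from yet
        }
        double perMap = (double) completedMapOutputBytes / completedMaps;
        return (long) (perMap * totalMaps / numReduces);
    }

    // Hand out a reduce task only if the node has room for its estimated input.
    static boolean canScheduleReduce(long estimatedInputBytes, long availableBytes) {
        return availableBytes >= estimatedInputBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // 10 of 100 maps done, 50 GB of map output so far, a single reducer.
        long estimate = estimateReduceInputBytes(50 * gb, 10, 100, 1);
        long free = 200 * gb; // free local disk reported by the candidate node
        System.out.println("estimated reduce input bytes: " + estimate);
        System.out.println("schedule on this node: " + canScheduleReduce(estimate, free));
    }
}
```

With a check of this shape, a single-reducer job whose map output exceeds every node's free disk would have its reduce task left unscheduled instead of being started and then running out of space mid-shuffle.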

On Wed, May 29, 2013 at 8:10 PM, Rahul Bhattacharjee
<rahul.rec.dgp@gmail.com> wrote:
> Hi,
>
> I have one question related to the reduce phase of MR jobs.
>
> The intermediate outputs of map tasks are pulled from the nodes that ran
> the map tasks to the node where the reducer is going to run, and that
> intermediate data is written to the reducer's local filesystem. My question
> is: if a job processes a huge amount of data and has multiple mappers but
> only one reducer, then it's possible that the job would never complete
> successfully, as the single host's disk might not be sufficient to hold all
> the map outputs of the job.
>
> The job would essentially fail after retrying the configured number of attempts.
>
> Thanks,
> Rahul



--
Harsh J
