hadoop-mapreduce-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: can local disk of reduce task cause the job to fail?
Date Sun, 09 Dec 2012 17:15:11 GMT
The reducer will not start executing until the shuffle and sort phase is complete.
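
To illustrate the idea, here is a rough sketch in plain Python. It is not Hadoop code, and the names `merge_segments` and `run_reduce` are made up for this example. It assumes the reduce side has already fetched every sorted map output segment for its partition, merges them lazily, and hands the reduce function an iterator over each key's values rather than a fully loaded list.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_segments(segments):
    """Lazily k-way merge already-sorted (key, value) segments by key."""
    return heapq.merge(*segments, key=itemgetter(0))


def run_reduce(segments, reduce_fn):
    """Call reduce_fn once per key, streaming that key's values."""
    merged = merge_segments(segments)
    for key, group in groupby(merged, key=itemgetter(0)):
        # values is a generator: records are pulled from the merged
        # stream one at a time instead of being collected up front.
        values = (value for _, value in group)
        yield key, reduce_fn(key, values)


# Hypothetical sorted outputs from three map tasks for one reducer.
segments = [
    [("a", 1), ("c", 1)],
    [("a", 1), ("b", 1)],
    [("b", 1), ("c", 1), ("c", 1)],
]

result = list(run_reduce(segments, lambda key, values: sum(values)))
print(result)
```

In this sketch all segments must exist before the merge starts, which is why the fetched data has to fit somewhere on the reduce side first, while the values for a single key are only streamed through the reduce function.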

On Dec 9, 2012, at 4:09 AM, Majid Azimi <majid.merkava@gmail.com> wrote:

> Hi guys,
> 
> Hadoop: The Definitive Guide says that reduce tasks start only when all maps have done their work. Also, this link says:
> 
> >> The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
> 
> What I have understood is that when a reduce task starts, all the data it needs (each key and its associated values) has already been transferred to its local node. Am I right? If this is true, then the node running the reduce task must have enough storage to hold all the values associated with that key, or else the job will fail.
> 
> If not, then the reduce task starts with whatever data is available, and the shuffle and sort phase feeds the reduce task continuously, so low storage on the node does not cause a problem because data arrives on demand.
> 
> Which of the two cases actually happens?
