hadoop-common-user mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: I thought map and reduce could not overlap?
Date Sat, 14 Nov 2009 16:17:03 GMT
My understanding is the following:
As map tasks finish, Hadoop starts copying their output to the
reducer machines, but it does not run the reduce itself yet.  During
this stage, if you look at the running reducers, you will see a status
like "copying 4 of 45".  The copying can only complete after all the
maps have finished; at that point you will see Reduce at 33%.  After
the copying comes the sorting, and only then does the reduce start.

Basically, the overlap you are seeing is just the reducers beginning
to copy the map output that is already available onto the reducer
machines.
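
To make the percentages concrete, here is a rough sketch in Python of
how a reported reduce percentage might be read.  It assumes the reduce
progress figure is split into three roughly equal parts (copy, then
sort, then the reduce itself); only the 33% mark after copying comes
from what I described above, the other boundary is my assumption.
The function name reduce_phase is made up for illustration and is not
part of Hadoop.

```python
def reduce_phase(percent: int) -> str:
    """Guess which stage a reducer is in from the printed reduce percentage.

    Assumption: copy, sort and reduce each account for about one third
    of the reported figure. The boundaries are approximate.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 33:
        return "copy"    # still fetching map output; maps may still be running
    if percent < 66:
        return "sort"    # all map output copied, merging and sorting it
    return "reduce"      # the reduce function itself is running


# The values from the job log in the question all fall in the copy stage.
for pct in (18, 19, 20, 21, 22):
    print(pct, reduce_phase(pct))
```

Read this way, "map 82% reduce 22%" means the reducers are still only
copying, so the barrier between map and reduce is intact.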

Cheers

Tim


On Sat, Nov 14, 2009 at 5:05 PM, Raymond Jennings III
<raymondjiii@yahoo.com> wrote:
> I thought there was a barrier that ensured the map phase would finish
> before the reduce phase started but I see on the sample hadoop word
> count app:
>
> 09/11/14 10:58:50 INFO mapred.JobClient:  map 79% reduce 18%
> 09/11/14 10:58:54 INFO mapred.JobClient:  map 79% reduce 19%
> 09/11/14 10:58:55 INFO mapred.JobClient:  map 80% reduce 19%
> 09/11/14 10:58:58 INFO mapred.JobClient:  map 80% reduce 20%
> 09/11/14 10:59:00 INFO mapred.JobClient:  map 81% reduce 20%
> 09/11/14 10:59:04 INFO mapred.JobClient:  map 82% reduce 20%
> 09/11/14 10:59:05 INFO mapred.JobClient:  map 82% reduce 21%
> 09/11/14 10:59:08 INFO mapred.JobClient:  map 82% reduce 22%
>
> That looks like they are overlapping?
