hadoop-user mailing list archives

From Lin Ma <lin...@gmail.com>
Subject Re: reducer tasks start time issue
Date Sun, 23 Dec 2012 15:09:06 GMT
Thanks for not only answering my question, but also giving a detailed
explanation. :-)

regards,
Lin

On Sun, Dec 23, 2012 at 12:15 AM, Harsh J <harsh@cloudera.com> wrote:

> A reduce can't process its complete data set until it has fetched all
> of its partitions, and any map may produce a partition for any reducer.
> Hence, we generally wait until all maps have terminated, and their
> partition outputs are ready and copied over to the reduces, before we
> begin to group and process the keys.
>
> However, given that you began thinking about this, this paper on
> "Online" Hadoop may interest you:
> http://www.neilconway.org/docs/nsdi2010_hop.pdf
>
> On Sat, Dec 22, 2012 at 6:55 PM, Lin Ma <linlma@gmail.com> wrote:
> > Hi guys,
> >
> > Suppose a Hadoop job has both mappers and reducers. My question is:
> > can reducer tasks not begin until all mapper tasks complete? If so,
> > why was it designed this way?
> >
> > thanks in advance,
> > Lin
>
>
>
> --
> Harsh J
>
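
The reasoning quoted above can be illustrated with a minimal sketch. This is
a toy, in-memory simulation in Python, not Hadoop code: the function names
(partition, run_map, run_reduce), the word-count job, and the byte-sum
partitioner are all assumptions made for illustration. It shows that a
reducer which groups its keys before every map has finished may see only
part of the values for a key.

```python
from collections import defaultdict


def partition(key, num_reducers):
    # Hypothetical deterministic stand-in for a hash partitioner.
    return sum(key.encode("utf-8")) % num_reducers


def run_map(records, num_reducers):
    """One map task: emit (word, 1) pairs, bucketed by target reducer."""
    buckets = defaultdict(list)
    for line in records:
        for word in line.split():
            buckets[partition(word, num_reducers)].append((word, 1))
    return buckets


def run_reduce(reducer_id, map_outputs):
    """One reduce task: read its partition from each map output, then group and sum."""
    grouped = defaultdict(int)
    for buckets in map_outputs:
        for word, count in buckets.get(reducer_id, []):
            grouped[word] += count
    return dict(grouped)


# Three input splits, one map task each; any map may emit any key.
splits = [["a b a"], ["b c"], ["a c c"]]
NUM_REDUCERS = 2
map_outputs = [run_map(split, NUM_REDUCERS) for split in splits]

target = partition("a", NUM_REDUCERS)

# Reducing after only two of the three maps gives an incomplete count.
partial = run_reduce(target, map_outputs[:2])

# Reducing after all maps have finished gives the correct count.
complete = run_reduce(target, map_outputs)

print("after 2 maps:", partial)
print("after 3 maps:", complete)
```

In this sketch the count for "a" is 2 when only the first two map outputs
are available and 3 once the third map has finished, which is why the
grouping step waits for every map.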
