hadoop-common-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Need help understanding the source
Date Tue, 07 Jul 2009 17:14:47 GMT
On Tue, Jul 7, 2009 at 1:13 AM, jason hadoop <jason.hadoop@gmail.com> wrote:
>
>
> The other alternative you may try is simply to write your map outputs to
> HDFS [i.e., setNumReduceTasks(0)], and have a consumer pick up the map
> outputs as they appear. If the life of the files is short and you can
> withstand data loss, you may turn down the replication factor to speed
> up the writes.
>

I'm not sure that would be very easy, since the output is initially written
into a temporary directory. I suppose you could go digging through the
temporary directory to catch the map outputs as they finish, but it's
probably tricky at best and certainly not intended.
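
To make the temporary-directory concern concrete, here is a minimal sketch of a consumer that only looks at files directly under the job output directory and ignores anything still under the temporary subdirectory. It rests on several assumptions: that in-progress task output lives under a `_temporary` subdirectory and that committed files are named `part-...`; the class and method names (`CommittedOutputScanner`, `listCommitted`) are made up for illustration; and it scans a local directory with java.nio as a stand-in for HDFS, so a real consumer would go through the Hadoop FileSystem API instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CommittedOutputScanner {

    /** Assumption: committed task output files are named like "part-00000". */
    public static boolean isCommittedOutput(String fileName) {
        return fileName.startsWith("part-");
    }

    /**
     * Lists committed output files directly under outputDir.
     * The listing is deliberately non-recursive, so files that are still
     * sitting under "_temporary" are never returned.
     */
    public static List<String> listCommitted(Path outputDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isRegularFile(entry) && isCommittedOutput(name)) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a fake output directory: one committed file, one in progress.
        Path out = Files.createTempDirectory("job-output");
        Path attempt = out.resolve("_temporary").resolve("_attempt_0");
        Files.createDirectories(attempt);
        Files.createFile(attempt.resolve("part-00001")); // not yet committed
        Files.createFile(out.resolve("part-00000"));     // committed
        System.out.println(listCommitted(out));
    }
}
```

Even with a filter like this, the consumer depends on how the output committer lays out and renames files, which is the "not intended" part: that layout is an implementation detail and not a contract.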

-Todd


> On Tue, Jul 7, 2009 at 12:30 AM, Amr Awadallah <aaa@cloudera.com> wrote:
>
> > To add to Todd/Ted's wise words, the Hadoop (and MapReduce) architects
> > didn't impose this limitation just for fun; it is core to enabling
> > Hadoop to be as reliable as it is. If the reducer started processing
> > mapper output immediately and a specific mapper failed, the reducer
> > would have to know how to undo the specific pieces of work related to
> > the failed mapper, which is not trivial at all. That said, combiners do
> > achieve a bit of that for you, as they start working immediately on the
> > map output, but on a per-mapper basis (not global), so failure is easy
> > to handle in that case (you just redo that mapper and the combining
> > for it).
> >
> > -- amr
> >
> >
> > Ted Dunning wrote:
> >
> >> I would consider this to be a very delicate optimization with little
> >> utility in the real world.  It is very, very rare to reliably know how
> >> many records the reducer will see.  Getting this wrong would be a
> >> disaster.  Getting it right would be very difficult in almost all cases.
> >>
> >> Moreover, this assumption is baked all through the map-reduce design,
> >> and thus making a change to allow reduce to go ahead is likely to be
> >> really tricky (not that I know this for a fact).
> >>
> >>
> >> On Mon, Jul 6, 2009 at 11:14 AM, Naresh Rapolu
> >> <nareshreddy.rapolu@gmail.com> wrote:
> >>
> >>> My aim is to make the reduce move ahead with reduction as and when it
> >>> gets the data required, instead of waiting for all the maps to
> >>> complete.  If it knows how many records it needs and compares that
> >>> with the number of records it has got until now, it can move on once
> >>> they become equal without waiting for all the maps to finish.
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>
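
Amr's point about combiners in the quoted discussion can be illustrated with a small plain-Java sketch. It does not use the Hadoop API; the class `PerMapperCombine` and its methods are hypothetical names, and word counting is an assumed example workload. The idea it shows is that each mapper's output is combined on its own, so a failed mapper is handled by discarding and recomputing only that mapper's partial result, while the global merge waits until every partial result is final.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PerMapperCombine {

    /** Combine one mapper's emitted keys into local per-key counts. */
    public static Map<String, Integer> combine(List<String> mapperKeys) {
        Map<String, Integer> local = new TreeMap<>();
        for (String key : mapperKeys) {
            local.merge(key, 1, Integer::sum);
        }
        return local;
    }

    /** Global merge, run only once every mapper's partial result is final. */
    public static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(combine(Arrays.asList("a", "b", "a")));

        // Suppose the second mapper fails half way: its partial result is
        // simply thrown away and the whole mapper (plus combine) is rerun.
        // Nothing downstream has consumed it yet, so there is nothing to undo.
        Map<String, Integer> rerun = combine(Arrays.asList("b", "c"));
        partials.add(rerun);

        System.out.println(reduce(partials));
    }
}
```

If `reduce` had been fed the failed mapper's records as they arrived, it would need a way to subtract them again, which is the undo problem described above.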
