hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject RE: performance of multiple map-reduce operations
Date Wed, 07 Nov 2007 04:19:48 GMT
Stating the obvious - the issue is starting the next mapper using
outputs from a reduce that is not yet complete. Currently the reduce
succeeds all or nothing, and partial output is not available. If this
behavior is changed, one has to make sure that:

- partial reduce outputs are cleaned up in case of eventual failure of
the reduce job (and, I guess, the next map job is killed as well)
- the next map job is run in a mode where its input files arrive over
time and are not statically fixed at launch time (see the sketch below).

which don't seem like a small change.
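
A rough sketch of what the second point might look like, with entirely
made-up names (SplitConsumer, JobStatus and so on are not Hadoop APIs),
just to make the shape of the change concrete:

// Hypothetical sketch only -- none of these types exist in Hadoop.  It
// illustrates a next map job whose input files are discovered over time
// rather than fixed at launch, plus cleanup of partial reduce output
// (and killing the downstream map job) on eventual failure.
import java.io.File;
import java.util.HashSet;
import java.util.Set;

public class IncrementalInputSketch {

    /** Stand-in for whatever would report the reduce job's progress. */
    interface JobStatus {
        boolean isComplete();
        boolean failed();
    }

    /** Stand-in for the downstream map job that accepts inputs over time. */
    interface SplitConsumer {
        void addSplit(File partialReduceOutput);
        void abort();   // kill the downstream map job
    }

    public static void feed(File reduceOutputDir,
                            JobStatus reduceStatus,
                            SplitConsumer nextMapJob) throws InterruptedException {
        Set<File> seen = new HashSet<File>();
        while (!reduceStatus.isComplete()) {
            File[] parts = reduceOutputDir.listFiles();
            if (parts != null) {
                for (File p : parts) {
                    if (seen.add(p)) {
                        nextMapJob.addSplit(p);   // input arrives over time
                    }
                }
            }
            Thread.sleep(5000);   // poll for newly committed partial output
        }
        if (reduceStatus.failed()) {
            // clean up partial reduce outputs and kill the next map job
            for (File p : seen) {
                p.delete();
            }
            nextMapJob.abort();
        }
    }
}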

Even without such optimizations - I would love it if one could submit a
dependency graph of jobs to the job tracker (instead of arranging it
from outside). (But I digress.)
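
For contrast, this is roughly what "arranging it from outside" looks
like today with JobClient (job configuration details omitted); the
submitGraph call in the comment is imagined, not a real API:

// The client, not the framework, owns the ordering: it blocks on each
// job before submitting the next one.
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        JobConf first  = new JobConf();   // first map-reduce pass
        JobConf second = new JobConf();   // second pass, reading the first
                                          // pass's output directory

        JobClient.runJob(first);          // blocks until the first job finishes
        JobClient.runJob(second);         // only then is the second submitted

        // The wish above would be something like
        //   jobTracker.submitGraph(first, second, /* edge: second after first */);
        // so the job tracker owns the dependency graph.  No such call exists.
    }
}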

-----Original Message-----
From: redpony@gmail.com [mailto:redpony@gmail.com] On Behalf Of Chris
Dyer
Sent: Tuesday, November 06, 2007 5:05 PM
To: hadoop-user@lucene.apache.org
Subject: Re: performance of multiple map-reduce operations

On 11/6/07, Doug Cutting <cutting@apache.org> wrote:
>
> Joydeep Sen Sarma wrote:
> > One of the controversies is whether in the presence of failures, this
> > makes performance worse rather than better (kind of like udp vs. tcp -
> > what's better depends on error rate). The probability of a failure per
> > job will increase non-linearly as the number of nodes involved per job
> > increases. So what might make sense for small clusters may not make
> > sense for bigger ones. But it sure would be nice to have this option.
>
> Hmm.  Personally I wouldn't put a very high priority on complicated
> features that don't scale well.


I don't think that what we're asking for is necessarily complicated,
and I think it can be made to scale quite well.  The critique is
appropriate for the current "polyreduce" prototype, but that is a
particular solution to a general problem.  And *scaling* is exactly the
issue that we are trying to address here (since the current model is
quite wasteful of computational resources).

As for complexity, I do agree that the current prototype that Joydeep
linked to is needlessly cumbersome, but, to solve this problem in
principle, the only additional information that would need to be
presented to a scheduler to plan a composite job is a dependency graph
between the mappers and reducers.  Dependency graphs are about as clear
as it gets in this business, and they certainly aren't going to have
any scaling or conceptual problems.
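
To make that concrete: a dependency graph plus a "what can run now"
check is only a few lines.  The names below (StageGraph, launchable)
are invented for illustration and have nothing to do with the
prototype or with Hadoop internals:

// Hypothetical illustration: the only extra input a scheduler needs for
// a composite job is a graph of which stages consume which stages' output.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StageGraph {
    // stage name -> names of upstream stages whose output it consumes
    private final Map<String, Set<String>> upstream =
        new HashMap<String, Set<String>>();

    public void addStage(String name) {
        upstream.put(name, new HashSet<String>());
    }

    public void addEdge(String from, String to) {
        upstream.get(to).add(from);   // "to" reads "from"'s output
    }

    /** Stages the scheduler could launch now: not yet started, and all
     *  of their upstream stages have finished. */
    public List<String> launchable(Set<String> completed, Set<String> running) {
        List<String> ready = new ArrayList<String>();
        for (Map.Entry<String, Set<String>> e : upstream.entrySet()) {
            String stage = e.getKey();
            if (!completed.contains(stage) && !running.contains(stage)
                    && completed.containsAll(e.getValue())) {
                ready.add(stage);
            }
        }
        return ready;
    }
}

With edges map1 -> reduce1 -> map2 -> reduce2, launchable() hands the
scheduler map2 the moment reduce1 completes (or, if partial reduce
output were exposed, as soon as its first files are committed), which
is all that "running the same jobs, just sooner" requires.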

I don't really see how failures would be more catastrophic if this is
implemented purely in terms of smarter scheduling (which is emphatically
not what the current polyreduce prototype does).  You're running the
exact same jobs you would be running otherwise, just sooner. :)

Chris
