hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject RE: performance of multiple map-reduce operations
Date Wed, 07 Nov 2007 04:19:48 GMT
Stating the obvious - the issue is starting the next mapper using
outputs from a reduce that is not yet complete. Currently, a reduce
succeeds or fails as a whole, and partial output is not available. If
this behavior is changed, one has to make sure that:

- partial reduce outputs are cleaned up in case of eventual failure of
reduce job (and I guess kill the next map job as well)
- running the next map job in a mode where its input files arrive over
time (and are not statically fixed at launch time).

neither of which seems like a small change.
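The second requirement - a map job whose inputs arrive over time - can be illustrated with a minimal sketch outside Hadoop proper. This is a hypothetical polling loop, not Hadoop code: the directory layout, the `part-*` file names, and the `_DONE` completion marker are all assumptions standing in for whatever signal a partially-complete reduce would expose.

```python
import os
import tempfile
import time

def incremental_inputs(input_dir, done_marker="_DONE", poll_interval=0.01):
    # Yield each input file as it appears in input_dir, until the
    # (hypothetical) done_marker file signals that no more will arrive.
    seen = set()
    while True:
        entries = sorted(os.listdir(input_dir))
        finished = done_marker in entries
        for name in entries:
            if name != done_marker and name not in seen:
                seen.add(name)
                yield os.path.join(input_dir, name)
        if finished:
            return
        time.sleep(poll_interval)

# Simulate a reducer emitting partial outputs, then the completion marker.
with tempfile.TemporaryDirectory() as d:
    for i in range(3):
        open(os.path.join(d, f"part-{i:05d}"), "w").close()
    open(os.path.join(d, "_DONE"), "w").close()
    files = list(incremental_inputs(d))

print(len(files))
```

The same loop also shows why cleanup matters: if the upstream reduce fails, the marker never appears, and the consumer has to be killed and its partial inputs discarded.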

Even without such optimizations, I would love it if one could submit a
dependency graph of jobs to the job tracker (instead of arranging it
from outside). (But I digress.)

-----Original Message-----
From: redpony@gmail.com [mailto:redpony@gmail.com] On Behalf Of Chris
Sent: Tuesday, November 06, 2007 5:05 PM
To: hadoop-user@lucene.apache.org
Subject: Re: performance of multiple map-reduce operations

On 11/6/07, Doug Cutting <cutting@apache.org> wrote:
> Joydeep Sen Sarma wrote:
> > One of the controversies is whether, in the presence of failures, this
> > makes performance worse rather than better (kind of like udp vs. tcp -
> > what's better depends on error rate). The probability of a failure of a
> > job will increase non-linearly as the number of nodes involved per job
> > increases. So what might make sense for small clusters may not make
> > sense for bigger ones. But it sure would be nice to have this option.
> Hmm.  Personally I wouldn't put a very high priority on complicated
> features that don't scale well.

I don't think this is necessarily the case - what we're asking for is not
that complicated, and I think it can be made to scale quite well.  The
critique is fair for the current "polyreduce" prototype, but that is a
particular solution to a general problem.  And *scaling* is exactly the
issue that we are trying to address here (since the current model is
quite wasteful of computational resources).
As for complexity, I do agree that the current prototype that Joydeep
refers to is needlessly cumbersome, but, to solve this problem
theoretically, the only additional information that would need to be
represented to a scheduler to plan a composite job is just a dependency
graph between the mappers and reducers.  Dependency graphs are about as
clear as it gets in this business, and they certainly aren't going to
have any scaling problems or conceptual problems.

I don't really see how failures would be more catastrophic if this is
implemented purely in terms of smarter scheduling (which is adamantly
not what the current polyreduce prototype does).  You're running the
exact same jobs you would be running otherwise, just sooner. :)

