hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: performance of multiple map-reduce operations
Date Tue, 06 Nov 2007 22:40:09 GMT
Chris Dyer wrote:
> For one computation I've been working on lately, over 25% of the time is
> spent in the last 10% of each map/reduce operation (this has to do with the
> natural distribution of my input data and would be unavoidable even given an
> optimal partitioning).  During this time, I have dozens of nodes sitting
> idle that could be executing the map part of the next job, if only the
> framework knew that it was coming.  Has anyone dealt with this or found a
> good workaround?

If your next job depends on the output of the prior job, then you need 
to wait for the prior job to complete.  But if your next job is 
independent, you can submit it right away, and its map tasks will run 
while the reduce tasks of the prior job are still running.
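
The rule above can be modelled with a small scheduler. What follows is a
minimal sketch in plain Python, not Hadoop code: the function name
run_jobs, the jobs/deps dictionaries, and the thread pool standing in for
the cluster are all assumptions made for illustration. A job is submitted
as soon as every job it depends on has finished, so independent jobs start
at once and dependent jobs wait.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_jobs(jobs, deps, max_workers=4):
    """Run every callable in `jobs` (name -> callable).

    `deps` maps a job name to the names of the jobs whose output it needs.
    Both names and structure are hypothetical; this is a model of the
    scheduling rule, not of Hadoop itself.
    Returns (submission_order, results).
    """
    done = set()          # names of jobs that have completed
    running = {}          # future -> job name
    pending = list(jobs)  # jobs not yet submitted, in declaration order
    submission_order = []
    results = {}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # A job is ready once all of its prerequisites have completed.
            ready = [n for n in pending if set(deps.get(n, ())) <= done]
            for name in ready:
                pending.remove(name)
                running[pool.submit(jobs[name])] = name
                submission_order.append(name)

            if not running:
                # Nothing is running and nothing can start: the remaining
                # jobs depend on something that will never finish.
                raise ValueError("unsatisfiable dependencies: %s" % pending)

            # Block until at least one running job finishes, then re-check.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                name = running.pop(fut)
                results[name] = fut.result()
                done.add(name)

    return submission_order, results


if __name__ == "__main__":
    jobs = {
        "first": lambda: "first-output",
        "second": lambda: "second-output",      # consumes the output of "first"
        "unrelated": lambda: "unrelated-output",
    }
    deps = {"second": ["first"]}
    order, _ = run_jobs(jobs, deps)
    print(order)
```

Here "unrelated" is submitted in the same round as "first", while "second"
is held back until "first" has completed. In Hadoop's own API of this
period the corresponding choice is between the blocking JobClient.runJob
and JobClient.submitJob, which returns without waiting for the job to
finish.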

