hadoop-common-user mailing list archives

From Ross Boucher <bouc...@apple.com>
Subject Re: Reduce Performance
Date Fri, 21 Sep 2007 20:33:15 GMT
The input data was about 5GB, and the total map processing time was
about 10 minutes.  After that, another 5 minutes of reduce time were
spent just moving the files around.
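
As a rough back-of-envelope check on those figures, the sketch below estimates what share of the job the copy phase took and what transfer rate that implies. It assumes the map output is about the same size as the 5GB input and that the job consisted only of the 10 minutes of map time plus the 5 minutes of copying; neither is stated in this thread, and the function names are made up for illustration.

```python
# Figures quoted in the message; everything derived from them is an estimate.
INPUT_GB = 5.0          # approximate input size
MAP_MINUTES = 10.0      # approximate total map processing time
COPY_MINUTES = 5.0      # approximate reduce time spent moving files
MACHINES = 4            # cluster size mentioned earlier in the thread


def copy_fraction(map_minutes: float, copy_minutes: float) -> float:
    """Share of the job spent copying, assuming job time = map + copy."""
    return copy_minutes / (map_minutes + copy_minutes)


def aggregate_mb_per_s(data_gb: float, minutes: float) -> float:
    """Cluster-wide transfer rate in MB/s (1 GB taken as 1024 MB)."""
    return data_gb * 1024 / (minutes * 60)


fraction = copy_fraction(MAP_MINUTES, COPY_MINUTES)
# ASSUMPTION: the data copied to the reducers is roughly the input size.
aggregate = aggregate_mb_per_s(INPUT_GB, COPY_MINUTES)
per_machine = aggregate / MACHINES

print(f"copy share of job time: {fraction:.0%}")
print(f"aggregate copy rate:    {aggregate:.1f} MB/s")
print(f"per-machine copy rate:  {per_machine:.1f} MB/s")
```

If the map output is much smaller or larger than the input, the rates scale accordingly, so the numbers are only useful for judging whether the copy is limited by bandwidth or by scheduling overhead.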

On Sep 21, 2007, at 12:20 PM, Doug Cutting wrote:

> Ross Boucher wrote:
>> My cluster has 4 machines on it, so based on the recommendations  
>> on the wiki, I set my reduce count to 8.  Unfortunately, the  
>> performance was less than ideal.  Specifically, when the map  
>> functions had finished, I had to wait an additional 40% of the  
>> total job time just for copying/sorting the files.  I know for a  
>> fact that the sort is very fast, so the only remaining question is  
>> why moving the files around takes so long.
> How much data was there to copy?  How long was the total job time?   
> If there are only small amounts of data, and the total job time is  
> short, then copy scheduling overhead might be significant.
> Doug
