hadoop-mapreduce-user mailing list archives

From Mike Spreitzer <mspre...@us.ibm.com>
Subject Looking for stragglers in iterated map-reduce
Date Thu, 13 Oct 2011 17:12:01 GMT
In iterated map-reduce (a series of code-identical jobs in which the reduce 
output of one is the map input of the next), there are two synchronization 
barriers per iteration: one in the middle of each job (between map and 
reduce) and one at the end of each job.  In principle this could be a 
painfully excessive amount of synchronization.  Is it in practice?  Do you 
have iterated map-reduce applications with great load imbalance in some 
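
To make the two-barrier picture concrete, here is a minimal sketch of how one might estimate the time lost waiting at those barriers. It assumes a deliberately simplified model that is not part of the original message: every task in a phase starts at the same moment, each task has its own slot, and a finished task sits idle until the slowest task in its phase completes. The function names and the durations are hypothetical, chosen only for illustration; real Hadoop scheduling (task waves, early shuffle, speculative execution) is not modeled.

```python
def barrier_idle_time(task_durations):
    """Total time tasks spend idle waiting for the slowest task in the phase."""
    slowest = max(task_durations)
    return sum(slowest - d for d in task_durations)


def iterated_job_idle(iterations):
    """Sum idle time over both barriers of every job in the chain.

    iterations: list of (map_durations, reduce_durations), one pair per job.
    """
    total = 0.0
    for map_durations, reduce_durations in iterations:
        total += barrier_idle_time(map_durations)     # barrier between map and reduce
        total += barrier_idle_time(reduce_durations)  # barrier at the end of the job
    return total


# Hypothetical task durations in seconds; the first job has one map straggler
# and one reduce straggler, the second job is nearly balanced.
iterations = [
    ([10, 10, 10, 30], [5, 5, 20]),
    ([10, 12, 11, 10], [5, 6, 5]),
]
print(iterated_job_idle(iterations))
```

Under this model, most of the idle time comes from the imbalanced first job, which is the kind of straggler effect the question is asking about.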
