hadoop-common-dev mailing list archives

From Stefan Groschupf <...@media-style.com>
Subject mapoutputs from killed job are reduced.
Date Fri, 17 Mar 2006 22:05:44 GMT

I noticed some strange behavior, but I cannot find the code responsible
for it.
I had many killed jobs whose map tasks succeeded but that were never
reduced, so I had many leftover files like
/hadoop/mapred/local/task_m_5zta35.out. Now, when a reduce task runs, it
happens that all boxes try to download such map files from one specific
box.
First, it makes no sense to process old map data in a new job together
with the new map output data.
Second, that one box comes under so much load that the other reduce
tasks crash with a connection timeout.

Wouldn't it be a good idea to clean up /hadoop/mapred/local/ when the
tasktracker starts up?
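A minimal sketch of such a startup cleanup, assuming the tasktracker simply wipes its local mapred directory before accepting any tasks. The class and method names here are illustrative, not the actual tasktracker code:

```java
import java.io.File;

public class LocalDirCleanup {
    // Recursively delete everything under the given local mapred directory.
    // Run once at tasktracker startup, before any tasks are accepted, so
    // stale task_m_*.out files from killed jobs can no longer be served.
    static void deleteContents(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) return; // not a directory, or I/O error
        for (File entry : entries) {
            if (entry.isDirectory()) {
                deleteContents(entry);
            }
            entry.delete();
        }
    }

    public static void main(String[] args) {
        // Path is configurable in practice; hard-coded here for illustration.
        File localDir = new File(args.length > 0 ? args[0]
                                                 : "/hadoop/mapred/local");
        deleteContents(localDir);
        System.out.println("cleaned " + localDir.getPath());
    }
}
```

The directory itself is kept and only its contents are removed, so running it on a fresh or empty directory is harmless.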
I also do not fully understand how the map output files are collected,
but perhaps this could be improved so that it can never happen that a
newer job also processes the map output of an older job.
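One way to achieve that would be to embed the job id in each map output file name and have reducers accept only files belonging to their own job. A small sketch of that idea, with a naming scheme invented purely for illustration:

```java
public class MapOutputName {
    // Illustrative naming scheme: prefix each map output file with its
    // job id, so a reduce task can check that a fetched file belongs to
    // its own job before processing it.
    static String outputFileFor(String jobId, String taskId) {
        return "job_" + jobId + "_" + taskId + ".out";
    }

    // A reducer (or the serving tasktracker) filters on this prefix,
    // so leftover outputs from an older job are simply ignored.
    static boolean belongsToJob(String fileName, String jobId) {
        return fileName.startsWith("job_" + jobId + "_");
    }

    public static void main(String[] args) {
        String f = outputFileFor("0001", "task_m_5zta35");
        System.out.println(f + " current=" + belongsToJob(f, "0001")
                             + " stale="   + belongsToJob(f, "0000"));
    }
}
```

Combined with a startup cleanup, this would make stale map output both unreachable and unusable.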

Any hints where to search for the problem?


blog: http://www.find23.org
company: http://www.media-style.com
