hadoop-common-dev mailing list archives

From Stefan Groschupf ...@media-style.com>
Subject reducing hang: does size matter?
Date Tue, 14 Mar 2006 03:15:50 GMT
I opened a new thread since I'm not sure whether this is related to Michael
Stack's problem.
I tried the patch from Doug without throwing the exception and it  
does not solve the problem.
However, with a set of changes such as catching the exception inside the reduce
iteration of the run method, I was able to get my job running, but
with a much smaller size than the one that failed.
I have now started a bigger job again, but crawling 1 million takes some
time. ;-(
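
To illustrate what "catching the exception inside the reduce iteration" could
look like, here is a minimal, self-contained sketch. It is not the actual
Hadoop code or the patch: the KeyReducer interface and the run helper are
hypothetical names made up for this example. The only point is that a failure
on one key is recorded and the loop moves on, instead of the exception
aborting the whole task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TolerantReduceLoop {
    /** Hypothetical stand-in for a per-key reduce step; not a Hadoop interface. */
    interface KeyReducer {
        void reduce(String key) throws Exception;
    }

    /** Runs the reducer over all keys and returns the keys whose reduce step failed. */
    static List<String> run(List<String> keys, KeyReducer reducer) {
        List<String> failed = new ArrayList<>();
        for (String key : keys) {
            try {
                reducer.reduce(key);
            } catch (Exception e) {
                // Record the failure and continue instead of aborting the whole loop.
                failed.add(key);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> failed = run(Arrays.asList("a", "bad", "c"), key -> {
            if (key.equals("bad")) {
                throw new IllegalStateException("simulated failure");
            }
        });
        System.out.println("failed keys: " + failed);
    }
}
```

Swallowing exceptions like this only hides the symptom, so the failed keys
are collected rather than silently dropped.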

I'm asking myself: does size matter?

I noticed the following:
+ the slowest reduce task is the one that crashes in the end (maybe since it
is the largest?)
+ it crashes in the last 2% of reducing (so it is not related to
copying the map output files?)
+ all writers that are used have a XX.close(Reporter reporter) method
but never use the reporter.
I didn't find any processing-intensive close implementation, but maybe
a whole set of writers needs to be closed at the end, and maybe that is
what takes so long?
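
As a rough sketch of how a close method could actually use its reporter when
several writers are closed one after another: the Reporter and Writer
interfaces below are hypothetical stand-ins written for this example, not the
real Hadoop types, and the 50 ms sleep only simulates a slow close.

```java
import java.util.ArrayList;
import java.util.List;

class ReportingClose {
    /** Hypothetical progress callback; stands in for the reporter passed to close. */
    interface Reporter {
        void progress(String status);
    }

    /** Hypothetical writer whose close may be slow. */
    interface Writer {
        void close() throws Exception;
    }

    /** Closes every writer in order, reporting before each close and once at the end. */
    static void closeAll(List<Writer> writers, Reporter reporter) throws Exception {
        for (int i = 0; i < writers.size(); i++) {
            reporter.progress("closing writer " + (i + 1) + "/" + writers.size());
            writers.get(i).close();
        }
        reporter.progress("all writers closed");
    }

    public static void main(String[] args) throws Exception {
        List<Writer> writers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            writers.add(() -> Thread.sleep(50)); // simulated slow close
        }
        closeAll(writers, status -> System.out.println(status));
    }
}
```

Reporting between closes only helps when the total time comes from many
closes adding up; it does nothing if one single close call blocks for longer
than the timeout.
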
The only thing I found is that DFSOutputStream#close contains a
Thread.sleep(400); since a set of writers uses DFS, that
could be an issue. In general, combining timeouts with a
sleep can be difficult.
Since a namenode.complete call is also made, could this block
the closing of the writers because replicas need to be created first?
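
One possible way to keep a task from looking dead while a single close call
sleeps or blocks is to send a heartbeat from a background thread for the
duration of the call. The sketch below is only an illustration under that
assumption: closeWithHeartbeat and its parameters are hypothetical names, the
heartbeat is a plain Runnable rather than a real Hadoop reporter, and the
blocking close is simulated with a 400 ms sleep.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class HeartbeatDuringClose {
    /** Runs blockingClose while invoking heartbeat every periodMillis on a background thread. */
    static void closeWithHeartbeat(Runnable blockingClose, Runnable heartbeat, long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(heartbeat, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            blockingClose.run();
        } finally {
            timer.shutdownNow(); // stop the heartbeat once close returns or throws
        }
    }

    public static void main(String[] args) {
        AtomicInteger beats = new AtomicInteger();
        closeWithHeartbeat(() -> {
            try {
                Thread.sleep(400); // simulated blocking close
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, beats::incrementAndGet, 100);
        System.out.println("heartbeats sent while closing: " + beats.get());
    }
}
```

The downside of this design is that a close that never returns would also
never time out, so some upper bound on the wait would still be needed.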

Any thoughts are very welcome!!


blog: http://www.find23.org
company: http://www.media-style.com
