hadoop-common-user mailing list archives

From Bryan Duxbury <br...@rapleaf.com>
Subject Re: java.io.IOException: Could not get block locations. Aborting...
Date Mon, 09 Feb 2009 22:54:35 GMT
Small files are bad for Hadoop. You should avoid keeping a lot of
small files if possible.

That said, that error is something I've seen a lot. It usually
happens when the number of xcievers (the dfs.datanode.max.xcievers
setting on the datanodes) hasn't been adjusted upwards from the
default of 256. We run with 8000 xcievers, and that seems to solve
our problems. I think that if you have a lot of open files, this
problem happens a lot faster.
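For reference, a minimal sketch of that setting as it would appear in
hadoop-site.xml (hdfs-site.xml on newer releases) on each datanode; the
datanodes need a restart to pick it up, and the value here is just the
one mentioned above:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8000</value>
    </property>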

-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:

> Hi all -
>
> I've been running into this error the past few days:
> java.io.IOException: Could not get block locations. Aborting...
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>
> It seems to be related to trying to write too many files to HDFS. I
> have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat,
> and if I output to a few file names, everything works. However, if I
> output to thousands of small files, the above error occurs. I'm having
> trouble isolating the problem, as it unfortunately doesn't occur in
> the debugger.
>
> Is this a memory issue, or is there an upper limit to the number of  
> files HDFS can hold?  Any settings to adjust?
>
> Thanks.
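To illustrate the pattern Scott describes, here is a minimal, hypothetical
sketch of an output format that fans records out by key (using
MultipleTextOutputFormat, a concrete subclass of the MultipleOutputFormat he
mentions). The class name and Text key/value types are assumptions, but one
output file per distinct key is what pushes the number of concurrently open
HDFS files, and with it the datanodes' xciever load, so high:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // Hypothetical example: routes each record to a file derived from its key,
    // so thousands of distinct keys produce thousands of small output files.
    public class KeyBasedTextOutputFormat
        extends MultipleTextOutputFormat<Text, Text> {
      @Override
      protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // "name" is the default part file name (e.g. part-00000); prefixing it
        // with the key yields one output path per key per reducer.
        return key.toString() + "/" + name;
      }
    }

Wired in with the old mapred API via
conf.setOutputFormat(KeyBasedTextOutputFormat.class), each open file keeps its
own DFS output stream and write pipeline alive, which is why this pattern hits
the xciever limit so much faster than a job writing to a handful of outputs.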

