hadoop-common-user mailing list archives

From Scott Whitecross <sc...@dataxu.com>
Subject Re: java.io.IOException: Could not get block locations. Aborting...
Date Tue, 10 Feb 2009 03:50:16 GMT
I tried modifying the settings, and I'm still running into the same
issue.  I increased the xceivers count (dfs.datanode.max.xcievers) in
the hadoop-site.xml file.  I also checked to make sure the file
handle limits were increased, but they were fairly high to begin with.
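
For reference, a minimal sketch of the kind of hadoop-site.xml entry being discussed. The property name is the one given in the quoted question below (dfs.datanode.max.xcievers, with Hadoop's own spelling of "xcievers"), and 8000 is the value Bryan reports using; whether that value suits another cluster is an assumption. The datanodes would most likely need a restart to pick it up.

```xml
<!-- hadoop-site.xml on each datanode (sketch; merge the property into the existing <configuration> element) -->
<configuration>
  <property>
    <!-- Name keeps Hadoop's own spelling, "xcievers" -->
    <name>dfs.datanode.max.xcievers</name>
    <!-- 8000 is the value quoted in this thread, not a general recommendation -->
    <value>8000</value>
  </property>
</configuration>
```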

I don't think I'm dealing with anything out of the ordinary either.
I'm processing three large 'log' files, totaling around 5 GB, and
producing around 8000 output files after some data processing,
probably totaling 6 or 7 GB.  In the past, I've produced far fewer
files, and that has been fine.  When I change the process to output to
just a few files, the problem goes away again.

Is there anything else to check beyond these limits?  Is HDFS also
creating a substantial number of temp files?

On Feb 9, 2009, at 8:11 PM, Bryan Duxbury wrote:

> Correct.
> +1 to Jason's more unix file handles suggestion. That's a must-have.
> -Bryan
> On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
>> This would be an addition to the hadoop-site.xml file, to up  
>> dfs.datanode.max.xcievers?
>> Thanks.
>> On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
>>> Small files are bad for hadoop. You should avoid keeping a lot of  
>>> small files if possible.
>>> That said, that error is something I've seen a lot. It usually  
>>> happens when the number of xcievers hasn't been adjusted upwards  
>>> from the default of 256. We run with 8000 xcievers, and that seems  
>>> to solve our problems. I think that if you have a lot of open  
>>> files, this problem happens a lot faster.
>>> -Bryan
>>> On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
>>>> Hi all -
>>>> I've been running into this error the past few days:
>>>> java.io.IOException: Could not get block locations. Aborting...
>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>>>> It seems to be related to trying to write too many files to HDFS.
>>>> I have a class extending  
>>>> org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
>>>> to a few file names, everything works.  However, if I output to  
>>>> thousands of small files, the above error occurs.  I'm having  
>>>> trouble isolating the problem, as the problem doesn't occur in  
>>>> the debugger unfortunately.
>>>> Is this a memory issue, or is there an upper limit to the number  
>>>> of files HDFS can hold?  Any settings to adjust?
>>>> Thanks.
