hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: java.io.IOException: Could not get block locations. Aborting...
Date Tue, 10 Feb 2009 01:50:54 GMT
The other issue you may run into with many files in your HDFS is that you
may end up with more than a few hundred thousand blocks on each of your
datanodes. At present this can lead to instability, due to the way the
periodic block reports to the namenode are handled. The more blocks per
datanode, the greater the risk of congestion collapse in your HDFS.
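
A quick way to see where you stand: the summary that fsck prints includes
a cluster-wide block total (the exact wording varies by version), which
you can divide by your datanode count:

    hadoop fsck / | grep -i 'total blocks'

If the per-node average is heading into the hundreds of thousands, the
block report issue above becomes a real concern.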

On Mon, Feb 9, 2009 at 5:11 PM, Bryan Duxbury <bryan@rapleaf.com> wrote:

> Correct.
>
> +1 to Jason's more unix file handles suggestion. That's a must-have.
>
> -Bryan
>
>
> On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
>
>  This would be an addition to the hadoop-site.xml file, to up
>> dfs.datanode.max.xcievers?
>>
>> Thanks.
>>
>>
>>
>> On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
>>
>>  Small files are bad for hadoop. You should avoid keeping a lot of small
>>> files if possible.
>>>
>>> That said, that error is something I've seen a lot. It usually happens
>>> when the number of xcievers hasn't been adjusted upwards from the default of
>>> 256. We run with 8000 xcievers, and that seems to solve our problems. I
>>> think that if you have a lot of open files, this problem happens a lot
>>> faster.
>>>
>>> -Bryan
>>>
>>> On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
>>>
>>>  Hi all -
>>>>
>>>> I've been running into this error the past few days:
>>>> java.io.IOException: Could not get block locations. Aborting...
>>>>        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
>>>>        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
>>>>        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>>>>
>>>> It seems to be related to trying to write too many files to HDFS.  I
>>>> have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat,
>>>> and if I output to a few file names, everything works.  However, if I
>>>> output to thousands of small files, the above error occurs.  I'm having
>>>> trouble isolating the problem, as it doesn't occur in the debugger,
>>>> unfortunately.
>>>>
>>>> Is this a memory issue, or is there an upper limit to the number of
>>>> files HDFS can hold?  Any settings to adjust?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>>
>>
>
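
To answer Scott's hadoop-site.xml question above: yes, the property goes in
hadoop-site.xml on each datanode, something like the sketch below. The odd
"xcievers" spelling is how the property is actually named, and 8000 is just
the value Bryan mentioned, not a universal recommendation:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8000</value>
    </property>

The datanodes need to be restarted to pick up the change.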
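
And on the MultipleOutputFormat point: if the job really does fan out to
thousands of names, one way to bound the file count is to fold the
generated names into a fixed set of buckets. A rough sketch against the old
mapred API (the bucket count and the Text key/value types are assumptions
for illustration):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    public class BucketedOutputFormat extends MultipleTextOutputFormat<Text, Text> {
      // Illustrative cap on the number of distinct output files per task.
      private static final int NUM_BUCKETS = 64;

      @Override
      protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // Hash the key into one of NUM_BUCKETS files instead of one file
        // per key; keep the task's own "name" (e.g. part-00000) in the
        // result so files from different tasks can't collide.
        int bucket = (key.toString().hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
        return name + "-" + bucket;
      }
    }

That trades one-file-per-key for a bounded number of larger files, which
also sidesteps the small-files problem Bryan mentioned.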
