hadoop-common-user mailing list archives

From: Konstantin Shvachko <...@yahoo-inc.com>
Subject: Re: Name node heap space problem
Date: Mon, 28 Jul 2008 19:14:48 GMT
It looks like you have the whole file system flattened into one directory.
Both fsck and ls call the same method on the name-node, getListing(), which returns
an array with a FileStatus for each file in the directory.
I think fsck works in this case because it does not use RPC and therefore
does not create an additional copy of the array of FileStatus-es, whereas
ls gets the array and sends it back as an RPC reply. The RPC system
serializes the reply, and this is where you get the second copy of the array.
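
For illustration, here is a minimal client-side sketch (my own, not the actual
ls code) of the call that ends up in getListing(); the per-entry size in the
comments is only a rough guess:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Roughly what "hadoop fs -ls <dir>" does on the client side: one
// listStatus() call, which is one getListing() RPC to the name-node.
public class ListBigDir {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // The whole listing comes back as a single array. With ~1.8 million
    // entries and (very roughly) a few hundred bytes per FileStatus, the
    // array plus its serialized RPC copy can run to several GB on the
    // name-node.
    FileStatus[] listing = fs.listStatus(new Path(args[0]));
    System.out.println("entries: " + listing.length);
  }
}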

You can try to add more memory to the node, or you can try to break the
directory into smaller directories, say by moving files starting with 'a', 'b', 'c', etc.
into new separate directories.
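
If it comes to that, something along these lines could do the split. This is a
rough, hypothetical sketch, not a tested tool; the class name and the
"bucket_" naming are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical one-off tool: move every file in the flat directory into a
// sub-directory named after its first character. It still lists the big
// directory once, so run it only after the name-node (and this client)
// have enough heap.
public class SplitFlatDir {
  public static void main(String[] args) throws Exception {
    Path dir = new Path(args[0]);
    FileSystem fs = FileSystem.get(new Configuration());
    for (FileStatus stat : fs.listStatus(dir)) {
      if (stat.isDir()) continue;              // leave sub-directories alone
      String name = stat.getPath().getName();
      Path bucket = new Path(dir, "bucket_" + name.charAt(0));
      fs.mkdirs(bucket);                       // no-op if it already exists
      fs.rename(stat.getPath(), new Path(bucket, name));
    }
  }
}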

--Konstantin


Gert Pfeifer wrote:
> There I have:
>    export HADOOP_HEAPSIZE=8000
> which should be enough (actually, in this case I don't know).
> 
> Running fsck on the directory, it turned out that there are 1785959
> files in this dir... I have no clue how I can get the data out of there.
> Can I somehow calculate how much heap a namenode would need to do an ls
> on this dir?
> 
> Gert
> 
> 
> Taeho Kang wrote:
>> Check how much memory is allocated for the JVM running namenode.
>>
>> In the file HADOOP_INSTALL/conf/hadoop-env.sh,
>> you should change the line that starts with "export HADOOP_HEAPSIZE=1000".
>>
>> It's set to 1GB by default.
>>
>>
>> On Fri, Jul 25, 2008 at 2:51 AM, Gert Pfeifer 
>> <pfeifer@se.inf.tu-dresden.de>
>> wrote:
>>
>>> Update on this one...
>>>
>>> I put some more memory in the machine running the name node. Now fsck is
>>> running. Unfortunately ls fails with a time-out.
>>>
>>> I identified one directory that causes the trouble. I can run fsck on it
>>> but not ls.
>>>
>>> What could be the problem?
>>>
>>> Gert
>>>
>>> Gert Pfeifer wrote:
>>>
>>>> Hi,
>>>> I am running a Hadoop DFS on a cluster of 5 data nodes with a name node
>>>> and one secondary name node.
>>>>
>>>> I have 1788874 files and directories, 1465394 blocks = 3254268 total.
>>>> Heap Size max is 3.47 GB.
>>>>
>>>> My problem is that I produce many small files. Therefore I have a cron
>>>> job that runs daily over the new files, copies them into bigger files,
>>>> and deletes the small ones.
>>>>
>>>> Apart from this program, even an fsck kills the cluster.
>>>>
>>>> The problem is that, as soon as I start this program, the heap usage of
>>>> the name node reaches 100 %.
>>>>
>>>> What could be the problem? There are not many small files right now and
>>>> still it doesn't work. I guess we have had this problem since the upgrade
>>>> to 0.17.
>>>>
>>>> Here is some additional data about the DFS:
>>>> Capacity         :       2 TB
>>>> DFS Remaining   :       1.19 TB
>>>> DFS Used        :       719.35 GB
>>>> DFS Used%       :       35.16 %
>>>>
>>>> Thanks for hints,
>>>> Gert
>>>>
>>>
>>
> 
> 
