hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: HDFS Namenode Heap Size woes
Date Mon, 02 Feb 2009 03:03:29 GMT
Hey Sean,

I use JMX monitoring -- which allows me to trigger GC via jconsole. There's
decent documentation out there on making it work, but you'd have to restart
the namenode to do it ... let the list know if you can't figure it out.
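
As a rough sketch of what that looks like programmatically -- assuming the
namenode JVM was started with the usual com.sun.management.jmxremote options
(e.g. added to HADOOP_NAMENODE_OPTS in conf/hadoop-env.sh, which is why the
restart is needed), and using a placeholder host/port -- the same "trigger a
GC, then look at the heap" step can be scripted against the Memory MBean:

    import java.lang.management.MemoryUsage;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.openmbean.CompositeData;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class TriggerNameNodeGc {
        public static void main(String[] args) throws Exception {
            // Placeholder host:port -- wherever the namenode's JMX agent listens.
            String hostPort = args.length > 0 ? args[0] : "namenode.example.com:8004";
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + hostPort + "/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ObjectName memory = new ObjectName("java.lang:type=Memory");

                // Same effect as the "Perform GC" button in jconsole: ask the
                // remote JVM for a full collection.
                mbsc.invoke(memory, "gc", null, null);

                // Then read how much heap is still in use.
                CompositeData cd = (CompositeData) mbsc.getAttribute(memory, "HeapMemoryUsage");
                MemoryUsage heap = MemoryUsage.from(cd);
                System.out.println("Heap used after GC: " + (heap.getUsed() >> 20) + " MB"
                        + " (committed " + (heap.getCommitted() >> 20) + " MB)");
            } finally {
                connector.close();
            }
        }
    }

(The class name, host and port above are made up for illustration; jconsole
pointed at the same host:port gives you the identical "Perform GC" button
without writing any code.)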

Brian

On Feb 1, 2009, at 8:59 PM, Sean Knapp wrote:

> Brian,
> Thanks for jumping in as well. Is there a recommended way of manually
> triggering GC?
>
> Thanks,
> Sean
>
> On Sun, Feb 1, 2009 at 6:06 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
>
>> Hey Sean,
>>
>> Dumb question: how much memory is used after a garbage collection cycle?
>>
>> Look at the graph "jvm.metrics.memHeapUsedM":
>>
>>
>> http://rcf.unl.edu/ganglia/?m=network_report&r=hour&s=descending&c=red&h=hadoop-name&sh=1&hc=4&z=small
>>
>> If you tell the JVM it has 16GB of memory to play with, it will often use a
>> significant portion of that before it does a thorough GC. At our site, it
>> actually only needs ~500MB, but sometimes it will hit 1GB before GC is
>> triggered. One of the vagaries of Java, eh?
>>
>> Trigger a GC and see how much is actually used.
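
If you would rather not force a collection at all, the JVM also keeps a
per-pool "usage after the last GC" figure, which is exactly the number being
asked about here. A small self-contained sketch (it reports on whatever JVM
it runs in; the same CollectionUsage attribute is exposed remotely via the
java.lang:type=MemoryPool,name=... MBeans):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class PostGcUsage {
        public static void main(String[] args) {
            // getCollectionUsage() is the usage sampled right after the most
            // recent collection of each pool -- the "live" footprint, not the
            // high-water mark reached between GCs.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                MemoryUsage afterGc = pool.getCollectionUsage();
                if (afterGc != null) { // null for pools that don't report it
                    System.out.println(pool.getName() + " used after last GC: "
                            + (afterGc.getUsed() >> 20) + " MB");
                }
            }
        }
    }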
>>
>> Brian
>>
>>
>> On Feb 1, 2009, at 6:11 PM, Sean Knapp wrote:
>>
>>> Jason,
>>> Thanks for the response. By falling out, do you mean a longer time since
>>> last contact (100s+), or fully timed out where it is dropped into dead
>>> nodes? The former happens fairly often, the latter only under serious load
>>> but not in the last day. Also, my namenode is now up to 10GB with less than
>>> 700k files after some additional archiving.
>>>
>>> Thanks,
>>> Sean
>>>
>>> On Sun, Feb 1, 2009 at 4:00 PM, jason hadoop <jason.hadoop@gmail.com> wrote:
>>>
>>>> If your datanodes are pausing and falling out of the cluster you will get
>>>> a large workload for the namenode of blocks to replicate and, when the
>>>> paused datanode comes back, a large workload of blocks to delete.
>>>> These lists are stored in memory on the namenode.
>>>> The startup messages lead me to wonder if your datanodes are periodically
>>>> pausing or are otherwise dropping in and out of the cluster.
>>>>
>>>> On Sat, Jan 31, 2009 at 2:20 PM, Sean Knapp <sean@ooyala.com> wrote:
>>>>
>>>>> I'm running 0.19.0 on a 10 node cluster (8 core, 16GB RAM, 4x1.5TB). The
>>>>> current status of my FS is approximately 1 million files and directories,
>>>>> 950k blocks, and heap size of 7GB (16GB reserved). Average block
>>>>> replication is 3.8. I'm concerned that the heap size is steadily
>>>>> climbing... a 7GB heap is substantially higher per file than I have on a
>>>>> similar 0.18.2 cluster, which has closer to a 1GB heap.
>>>>>
>>>>> My typical usage model is 1) write a number of small files into HDFS
>>>>> (tens or hundreds of thousands at a time), 2) archive those files,
>>>>> 3) delete the originals. I've tried dropping the replication factor of
>>>>> the _index and _masterindex files without much effect on overall heap
>>>>> size. While I had trash enabled at one point, I've since disabled it and
>>>>> deleted the .Trash folders.
>>>>>
>>>>> On namenode startup, I get a massive number of the following lines in my
>>>>> log file:
>>>>>
>>>>> 2009-01-31 21:41:23,283 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>>>> NameSystem.processReport: block blk_-2389330910609345428_7332878 on
>>>>> 172.16.129.33:50010 size 798080 does not belong to any file.
>>>>> 2009-01-31 21:41:23,283 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>>>> NameSystem.addToInvalidates: blk_-2389330910609345428 is added to
>>>>> invalidSet of 172.16.129.33:50010
>>>>>
>>>>> I suspect the original files may be left behind and causing the heap
>>>>> size bloat. Is there any accounting mechanism to determine what is
>>>>> contributing to my heap size?
>>>>>
>>>>> Thanks,
>>>>> Sean
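
On Sean's last question -- an accounting mechanism for what is contributing
to the heap -- one option, assuming the same JMX setup sketched above (and a
placeholder host/port again), is to ask the namenode JVM for a heap dump
restricted to live objects and inspect it offline. The MBean and operation
below are the standard HotSpot diagnostic ones; the dump file is written on
the namenode host:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class NameNodeHeapDump {
        public static void main(String[] args) throws Exception {
            // Placeholder host:port for the namenode's JMX agent.
            String hostPort = args.length > 0 ? args[0] : "namenode.example.com:8004";
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + hostPort + "/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                // dumpHeap(file, live): live=true limits the dump to reachable
                // objects, i.e. what actually accounts for the heap size.
                mbsc.invoke(new ObjectName("com.sun.management:type=HotSpotDiagnostic"),
                        "dumpHeap",
                        new Object[] { "/tmp/namenode-live.hprof", Boolean.TRUE },
                        new String[] { "java.lang.String", "boolean" });
                System.out.println("Wrote /tmp/namenode-live.hprof on the namenode host");
            } finally {
                connector.close();
            }
        }
    }

Opening the resulting .hprof file with jhat or the Eclipse Memory Analyzer
then shows which classes dominate the heap -- for example, whether it really
is namespace/block objects or something like the in-memory replicate/delete
lists Jason mentions.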

