hadoop-common-dev mailing list archives

From "Lohit Vijayarenu (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-3865) SecondaryNameNode runs out of memory
Date Wed, 30 Jul 2008 18:49:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618465#action_12618465 ]

lohit edited comment on HADOOP-3865 at 7/30/08 11:48 AM:
--------------------------------------------------------------------

This was seen while running the secondary namenode (SN) with a 1.5 GB heap and a 300 MB
image (on disk). At each checkpoint the SN's heap usage kept increasing, and even a forced
GC did not free the space. After about 4 checkpoints the SN crashed with an OutOfMemoryError.

{noformat}
2008-07-30 18:41:01,527 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Throwable Exception in doCheckpoint: 
2008-07-30 18:41:01,528 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.HashMap.addEntry(HashMap.java:753)
        at java.util.HashMap.put(HashMap.java:385)
        at org.apache.hadoop.hdfs.server.namenode.BlocksMap.checkBlockInfo(BlocksMap.java:302)
        at org.apache.hadoop.hdfs.server.namenode.BlocksMap.addINode(BlocksMap.java:316)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addToParent(FSDirectory.java:238)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:819)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:571)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:468)
{noformat}

At each checkpoint, I see the FSNamesystem object count increase by one, and the earlier instances are never released.

{noformat}
[lohit@ ~]$ jmap -histo:live 2057 | grep FSName
254:         1         288  org.apache.hadoop.hdfs.server.namenode.FSNamesystem
550:         1          56  org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics
881:         1          16  org.apache.hadoop.hdfs.server.namenode.FSNamesystem$1
[lohit@ ~]$ jmap -histo:live 2057 | grep FSName
193:         2         576  org.apache.hadoop.hdfs.server.namenode.FSNamesystem
419:         2         112  org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics
879:         1          16  org.apache.hadoop.hdfs.server.namenode.FSNamesystem$1
[lohit@ ~]$ jmap -histo:live 2057 | grep FSName
154:         3         864  org.apache.hadoop.hdfs.server.namenode.FSNamesystem
343:         3         168  org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics
891:         1          16  org.apache.hadoop.hdfs.server.namenode.FSNamesystem$1
[lohit@ ~]$ jmap -histo:live 2057 | grep FSName
122:         4        1152  org.apache.hadoop.hdfs.server.namenode.FSNamesystem
294:         4         224  org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics
860:         1          16  org.apache.hadoop.hdfs.server.namenode.FSNamesystem$1
[lohit@ ~]$ 

{noformat}
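The comparison above can also be done mechanically. The following is only a rough sketch, not part of Hadoop or the JDK: parse_histo and growing_classes are hypothetical helper names, and the snapshots are the four jmap -histo:live outputs shown above, pasted in by hand. It flags the classes whose live instance count goes up in every successive snapshot.

```python
import re

# One row of `jmap -histo` output: "rank: instances bytes class.name"
HISTO_ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+)\s*$")

def parse_histo(text):
    """Return {class_name: (instances, bytes)} for one histogram snapshot."""
    counts = {}
    for line in text.splitlines():
        m = HISTO_ROW.match(line)
        if m:
            _rank, instances, size, name = m.groups()
            counts[name] = (int(instances), int(size))
    return counts

def growing_classes(snapshots):
    """Return sorted class names whose instance count strictly increases
    from each snapshot to the next (hypothetical helper, not a JDK tool)."""
    parsed = [parse_histo(s) for s in snapshots]
    if len(parsed) < 2:
        return []
    # Only consider classes that appear in every snapshot.
    names = set(parsed[0])
    for p in parsed[1:]:
        names &= set(p)
    result = []
    for name in sorted(names):
        series = [p[name][0] for p in parsed]
        if all(a < b for a, b in zip(series, series[1:])):
            result.append(name)
    return result

# The four snapshots from this comment, one per checkpoint.
NN = "org.apache.hadoop.hdfs.server.namenode."
SNAPSHOTS = [
    f"254: 1 288 {NN}FSNamesystem\n550: 1 56 {NN}metrics.FSNamesystemMetrics\n881: 1 16 {NN}FSNamesystem$1",
    f"193: 2 576 {NN}FSNamesystem\n419: 2 112 {NN}metrics.FSNamesystemMetrics\n879: 1 16 {NN}FSNamesystem$1",
    f"154: 3 864 {NN}FSNamesystem\n343: 3 168 {NN}metrics.FSNamesystemMetrics\n891: 1 16 {NN}FSNamesystem$1",
    f"122: 4 1152 {NN}FSNamesystem\n294: 4 224 {NN}metrics.FSNamesystemMetrics\n860: 1 16 {NN}FSNamesystem$1",
]

if __name__ == "__main__":
    for name in growing_classes(SNAPSHOTS):
        print(name)
```

On this data it reports FSNamesystem and FSNamesystemMetrics but not FSNamesystem$1, whose count stays at 1. A steadily growing count only suggests which objects are being retained; it does not by itself show what is holding the references.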

  
> SecondaryNameNode runs out of memory
> ------------------------------------
>
>                 Key: HADOOP-3865
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3865
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>             Fix For: 0.18.0
>
>
> SecondaryNameNode has a memory leak. If the secondary namenode is left running for a
> while and performs several checkpoints, it runs out of heap.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

