hadoop-hdfs-issues mailing list archives

From "zhaoyunjiong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-7470) SecondaryNameNode need twice memory when calling reloadFromImageFile
Date Fri, 12 Dec 2014 06:21:13 GMT

     [ https://issues.apache.org/jira/browse/HDFS-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated HDFS-7470:
-------------------------------
    Attachment: secondaryNameNode.jstack.txt

Thanks, Chris Nauroth, for your time.
I have uploaded a stack trace file for the SecondaryNameNode.

Correct me if I'm wrong, but from the stack trace I don't think there will be two threads holding FSNamesystem.writeLock.
Also, the SecondaryNameNode doesn't start services such as BlockManager and CacheManager.
As for the edit log, the SecondaryNameNode won't open it for writing.

I'll check again whether I missed any risk, or try to find a safer solution later.

> SecondaryNameNode need twice memory when calling reloadFromImageFile
> --------------------------------------------------------------------
>
>                 Key: HDFS-7470
>                 URL: https://issues.apache.org/jira/browse/HDFS-7470
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: zhaoyunjiong
>            Assignee: zhaoyunjiong
>         Attachments: HDFS-7470.1.patch, HDFS-7470.patch, secondaryNameNode.jstack.txt
>
>
> histo information at 2014-12-02 01:19
> {quote}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     186449630    19326123016  [Ljava.lang.Object;
>    2:     157366649    15107198304  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    3:     183409030    11738177920  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    4:     157358401     5244264024  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    5:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    6:      29253275     1872719664  [B
>    7:       3230821      284312248  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
>    8:       2756284      110251360  java.util.ArrayList
>    9:        469158       22519584  org.apache.hadoop.fs.permission.AclEntry
>   10:           847       17133032  [Ljava.util.HashMap$Entry;
>   11:        188471       17059632  [C
>   12:        314614       10067656  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>   13:        234579        9383160  com.google.common.collect.RegularImmutableList
>   14:         49584        6850280  <constMethodKlass>
>   15:         49584        6356704  <methodKlass>
>   16:        187270        5992640  java.lang.String
>   17:        234579        5629896  org.apache.hadoop.hdfs.server.namenode.AclFeature
> {quote}
> histo information at 2014-12-02 01:32
> {quote}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     355838051    35566651032  [Ljava.lang.Object;
>    2:     302272758    29018184768  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    3:     352500723    22560046272  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    4:     302264510    10075087952  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    5:     177120233     9374983920  [B
>    6:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    7:       6191688      544868544  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
>    8:       2799256      111970240  java.util.ArrayList
>    9:        890728       42754944  org.apache.hadoop.fs.permission.AclEntry
>   10:        330986       29974408  [C
>   11:        596871       19099880  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>   12:        445364       17814560  com.google.common.collect.RegularImmutableList
>   13:           844       17132816  [Ljava.util.HashMap$Entry;
>   14:        445364       10688736  org.apache.hadoop.hdfs.server.namenode.AclFeature
>   15:        329789       10553248  java.lang.String
>   16:         91741        8807136  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction
>   17:         49584        6850280  <constMethodKlass>
> {quote}
> And the stack trace shows it was doing reloadFromImageFile:
> {quote}
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getInode(FSDirectory.java:2426)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:160)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:121)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:902)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:888)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.reloadFromImageFile(FSImage.java:562)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:1048)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:536)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:388)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:354)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:356)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1630)
> 	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:350)
> 	at java.lang.Thread.run(Thread.java:745)
> {quote}
> So before doing reloadFromImageFile, I think we need to release the old namesystem to prevent SecondaryNameNode OOM.
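
The proposal above (release the old namesystem before reloading) can be illustrated with a minimal sketch. This is not the attached patch and does not use any real HDFS classes: `ReloadSketch`, `Tracker`, `loadImage`, and `peakLiveEntries` are hypothetical names, and the "memory" is modeled only as a count of entries that are still referenced. Under those assumptions, it shows why building the new image while the old one is still reachable roughly doubles the peak, which is consistent with the two histograms.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model only: counts entries that are still referenced,
// it does not measure the real JVM heap or call any HDFS code.
final class ReloadSketch {

    /** Simple accounting of how many entries are reachable at once. */
    static final class Tracker {
        int live = 0;
        int peak = 0;

        void allocated() {
            live++;
            peak = Math.max(peak, live);
        }

        void released(int count) {
            live -= count;
        }
    }

    /** Stand-in for loading an image: builds a map with n entries. */
    static Map<Long, String> loadImage(int n, Tracker tracker) {
        Map<Long, String> inodes = new HashMap<>();
        for (long id = 0; id < n; id++) {
            inodes.put(id, "inode-" + id);
            tracker.allocated();
        }
        return inodes;
    }

    /**
     * Loads an image of n entries, then reloads it, and returns the peak
     * number of entries that were reachable at the same time.
     */
    static int peakLiveEntries(int n, boolean releaseOldFirst) {
        Tracker tracker = new Tracker();
        Map<Long, String> current = loadImage(n, tracker);

        if (releaseOldFirst) {
            // Drop the old state before building the new one.
            tracker.released(current.size());
            current = null;
            current = loadImage(n, tracker);
        } else {
            // Old state stays reachable while the new one is built.
            Map<Long, String> fresh = loadImage(n, tracker);
            tracker.released(current.size());
            current = fresh;
        }
        return tracker.peak;
    }

    public static void main(String[] args) {
        System.out.println("reload without release: " + peakLiveEntries(1000, false));
        System.out.println("release, then reload:   " + peakLiveEntries(1000, true));
    }
}
```

In this model the first variant peaks at twice the image size and the second at the image size. Whether dropping the old state early is safe in the real SecondaryNameNode depends on the locking and service questions discussed in the comment above.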



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
