hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7470) SecondaryNameNode need twice memory when calling reloadFromImageFile
Date Fri, 12 Dec 2014 00:13:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243424#comment-14243424 ]

Chris Nauroth commented on HDFS-7470:
-------------------------------------

Creating a new {{FSNamesystem}} instance without running the full shutdown sequence on the
old one would risk some dangerous side effects.

* A new namesystem lock instance would be created, with no synchronization between threads
still holding the old one.  This could violate mutual exclusion: two different threads
could each hold one of the two lock instances and each believe it had exclusive access.
* We wouldn't reap background threads inside things like the {{BlockManager}} and {{CacheManager}}.
 Over time, we'd slowly leak threads and eventually hit {{OutOfMemoryError}} conditions.
* I can't remember if we hold an open file descriptor on the edit log when running as SecondaryNameNode.
 If we do, then discarding the old {{FSNamesystem}} without a proper shutdown would leak a
file descriptor.
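
The lock hazard in the first bullet can be sketched in plain Java. This is an illustrative toy (the {{MiniNamesystem}} class and its field names are hypothetical, not actual HDFS code): each instance carries its own {{ReentrantReadWriteLock}}, so a writer on the old instance's lock does nothing to exclude a thread acquiring the new instance's lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch (not actual HDFS code): two "namesystem" objects,
// each with its own lock instance, as would happen if a second
// FSNamesystem were created without shutting down the first.
public class SplitLockSketch {

    static class MiniNamesystem {
        final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    }

    // Returns true if holding the old instance's write lock does NOT
    // block acquisition of the new instance's write lock, i.e. mutual
    // exclusion is silently lost across the two instances.
    public static boolean mutualExclusionLost() {
        MiniNamesystem oldNs = new MiniNamesystem();
        MiniNamesystem newNs = new MiniNamesystem();

        oldNs.fsLock.writeLock().lock();   // "writer" on the old lock
        try {
            // Code using the new instance is not excluded at all:
            // tryLock succeeds despite the held write lock above.
            boolean acquired = newNs.fsLock.writeLock().tryLock();
            if (acquired) {
                newNs.fsLock.writeLock().unlock();
            }
            return acquired;
        } finally {
            oldNs.fsLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("mutual exclusion lost: " + mutualExclusionLost());
    }
}
```

Any code path that cached a reference to the old namesystem would keep locking the old instance, invisibly bypassing writers on the new one.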

In general, there are widespread assumptions throughout the codebase that {{FSNamesystem}}
is instantiated exactly once and retained for the entire process lifetime.
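
The thread-leak concern can be shown the same way. A minimal sketch, assuming a manager that starts one background thread ({{MiniManager}} is illustrative, not the real {{BlockManager}}/{{CacheManager}} API): dropping the reference without calling {{close()}} leaves the thread running, and repeating that on every reload accumulates leaked threads.

```java
// Hypothetical sketch (names are illustrative, not HDFS APIs): a manager
// that starts a background thread, mimicking the monitor threads inside
// BlockManager / CacheManager. Discarding it without close() leaks the
// thread; only an explicit shutdown reaps it.
public class ThreadLeakSketch {

    static class MiniManager {
        final Thread monitor;

        MiniManager() {
            monitor = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.sleep(100);  // background work loop
                    }
                } catch (InterruptedException ignored) {
                    // interrupted: exit the loop and let the thread die
                }
            });
            monitor.setDaemon(true);
            monitor.start();
        }

        void close() {
            monitor.interrupt();  // proper shutdown reaps the thread
        }
    }

    // Discard a manager without close(): its thread is still alive.
    public static boolean leaksWithoutClose() {
        MiniManager mgr = new MiniManager();
        Thread leaked = mgr.monitor;
        mgr = null;                     // reference dropped, no shutdown
        try {
            Thread.sleep(50);
        } catch (InterruptedException ignored) {
        }
        return leaked.isAlive();        // true: the thread was leaked
    }

    public static void main(String[] args) {
        System.out.println("thread leaked: " + leaksWithoutClose());
    }
}
```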

> SecondaryNameNode need twice memory when calling reloadFromImageFile
> --------------------------------------------------------------------
>
>                 Key: HDFS-7470
>                 URL: https://issues.apache.org/jira/browse/HDFS-7470
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: zhaoyunjiong
>            Assignee: zhaoyunjiong
>         Attachments: HDFS-7470.1.patch, HDFS-7470.patch
>
>
> Heap histogram at 2014-12-02 01:19:
> {quote}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     186449630    19326123016  [Ljava.lang.Object;
>    2:     157366649    15107198304  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    3:     183409030    11738177920  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    4:     157358401     5244264024  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    5:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    6:      29253275     1872719664  [B
>    7:       3230821      284312248  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
>    8:       2756284      110251360  java.util.ArrayList
>    9:        469158       22519584  org.apache.hadoop.fs.permission.AclEntry
>   10:           847       17133032  [Ljava.util.HashMap$Entry;
>   11:        188471       17059632  [C
>   12:        314614       10067656  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>   13:        234579        9383160  com.google.common.collect.RegularImmutableList
>   14:         49584        6850280  <constMethodKlass>
>   15:         49584        6356704  <methodKlass>
>   16:        187270        5992640  java.lang.String
>   17:        234579        5629896  org.apache.hadoop.hdfs.server.namenode.AclFeature
> {quote}
> Heap histogram at 2014-12-02 01:32:
> {quote}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     355838051    35566651032  [Ljava.lang.Object;
>    2:     302272758    29018184768  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    3:     352500723    22560046272  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    4:     302264510    10075087952  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    5:     177120233     9374983920  [B
>    6:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    7:       6191688      544868544  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
>    8:       2799256      111970240  java.util.ArrayList
>    9:        890728       42754944  org.apache.hadoop.fs.permission.AclEntry
>   10:        330986       29974408  [C
>   11:        596871       19099880  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>   12:        445364       17814560  com.google.common.collect.RegularImmutableList
>   13:           844       17132816  [Ljava.util.HashMap$Entry;
>   14:        445364       10688736  org.apache.hadoop.hdfs.server.namenode.AclFeature
>   15:        329789       10553248  java.lang.String
>   16:         91741        8807136  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction
>   17:         49584        6850280  <constMethodKlass>
> {quote}
> And the stack trace shows it was doing reloadFromImageFile:
> {quote}
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getInode(FSDirectory.java:2426)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:160)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:121)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:902)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:888)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.reloadFromImageFile(FSImage.java:562)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:1048)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:536)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:388)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:354)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:356)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1630)
> 	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
> 	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:350)
> 	at java.lang.Thread.run(Thread.java:745)
> {quote}
> So before doing reloadFromImageFile, I think we need to release the old namesystem
> to prevent the SecondaryNameNode from hitting OOM.
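
The histograms above bear this out: {{INodeFile}} instances grow from ~157M to ~302M (15.1 GB to 29.0 GB) in thirteen minutes, roughly two full namespaces resident at once. The shape of the proposal can be sketched in illustrative Java (the {{MiniNamesystem}} class and {{clear()}} method here are hypothetical stand-ins, not the actual HDFS-7470 patch): drop references into the old in-memory tree before loading the new image, so peak heap holds one namespace instead of two.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the proposal (names are hypothetical, not the
// actual HDFS-7470 patch): release the old in-memory namespace before
// loading the new image, so the old tree is garbage-collectable instead
// of coexisting with the new one at peak.
public class ReloadSketch {

    static class MiniNamesystem {
        // stand-in for the inode/block maps that dominate the heap
        Map<Long, byte[]> inodeMap = new HashMap<>();

        void clear() {
            inodeMap = new HashMap<>();  // old map becomes unreachable
        }

        void loadImage(int inodes) {
            for (long i = 0; i < inodes; i++) {
                inodeMap.put(i, new byte[16]);
            }
        }
    }

    // Reload that clears first: peak size is one namespace, not two.
    public static int reload(MiniNamesystem ns, int inodes) {
        ns.clear();             // the step HDFS-7470 argues is missing
        ns.loadImage(inodes);
        return ns.inodeMap.size();
    }

    public static void main(String[] args) {
        MiniNamesystem ns = new MiniNamesystem();
        ns.loadImage(1000);     // "old" namespace from the last checkpoint
        System.out.println("after reload: " + reload(ns, 500) + " inodes");
    }
}
```

Per the comment above, any such release would still need to respect the full shutdown sequence (locks, background threads, open descriptors), not just drop the object references.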



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
