Date: Fri, 12 Dec 2014 06:21:13 +0000 (UTC)
From: "zhaoyunjiong (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-7470) SecondaryNameNode need twice memory when calling reloadFromImageFile

    [ https://issues.apache.org/jira/browse/HDFS-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong updated HDFS-7470:
-------------------------------
    Attachment: secondaryNameNode.jstack.txt

Thanks Chris Nauroth for your time. I've uploaded a stack trace file for the SecondaryNameNode.
Correct me if I'm wrong, but from the stack trace I don't think there will ever be two threads holding FSNamesystem.writeLock. Also, the SecondaryNameNode does not start services such as BlockManager and CacheManager, and it never opens the edit log for writing.
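The safety argument above is that, on the SecondaryNameNode, nothing else touches the namesystem while the checkpoint thread reloads it, so the old state can be dropped before the new image is read. The Java sketch below illustrates that "release, then reload" pattern under an exclusive write lock. `Namespace`, `reloadFromImage` and the `Map<Long, String>` inode table are hypothetical stand-ins, not the HDFS classes and not the HDFS-7470 patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Sketch only (hypothetical classes, not HDFS code): a reload that releases
 * the old namespace before building the new one, so both copies are never
 * strongly reachable at the same time.
 */
public class ReleaseBeforeReload {

    // Hypothetical stand-in for FSNamesystem/FSDirectory state: inode id -> path.
    static final class Namespace {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private Map<Long, String> inodes = new HashMap<>();

        /** Replace the whole namespace with the contents of a (hypothetical) image. */
        void reloadFromImage(Map<Long, String> image) {
            lock.writeLock().lock(); // exclusive: no reader sees a half-built namespace
            try {
                // Release first: after this assignment the old map is unreachable
                // and can be garbage-collected while the new contents are loaded.
                inodes = new HashMap<>();
                inodes.putAll(image);
            } finally {
                lock.writeLock().unlock();
            }
        }

        int size() {
            lock.readLock().lock();
            try {
                return inodes.size();
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        Namespace ns = new Namespace();

        Map<Long, String> firstImage = new HashMap<>();
        firstImage.put(1L, "/");
        firstImage.put(2L, "/a");
        ns.reloadFromImage(firstImage);

        Map<Long, String> secondImage = new HashMap<>();
        secondImage.put(1L, "/");
        ns.reloadFromImage(secondImage);

        System.out.println(ns.size()); // prints 1
    }
}
```

The pattern is only safe when no other component keeps a reference to the old state or reads it during the reload, which is exactly what the comment argues from the jstack: no second writeLock holder, no BlockManager/CacheManager services running, and no edit log open for write.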
I'll check again whether I've missed any risk, or try to find a safer solution later.

> SecondaryNameNode need twice memory when calling reloadFromImageFile
> --------------------------------------------------------------------
>
>                 Key: HDFS-7470
>                 URL: https://issues.apache.org/jira/browse/HDFS-7470
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: zhaoyunjiong
>            Assignee: zhaoyunjiong
>         Attachments: HDFS-7470.1.patch, HDFS-7470.patch, secondaryNameNode.jstack.txt
>
>
> histo information at 2014-12-02 01:19
> {quote}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     186449630    19326123016  [Ljava.lang.Object;
>    2:     157366649    15107198304  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    3:     183409030    11738177920  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    4:     157358401     5244264024  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    5:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    6:      29253275     1872719664  [B
>    7:       3230821      284312248  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
>    8:       2756284      110251360  java.util.ArrayList
>    9:        469158       22519584  org.apache.hadoop.fs.permission.AclEntry
>   10:           847       17133032  [Ljava.util.HashMap$Entry;
>   11:        188471       17059632  [C
>   12:        314614       10067656  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>   13:        234579        9383160  com.google.common.collect.RegularImmutableList
>   14:         49584        6850280
>   15:         49584        6356704
>   16:        187270        5992640  java.lang.String
>   17:        234579        5629896  org.apache.hadoop.hdfs.server.namenode.AclFeature
> {quote}
> histo information at 2014-12-02 01:32
> {quote}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     355838051    35566651032  [Ljava.lang.Object;
>    2:     302272758    29018184768  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    3:     352500723    22560046272  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    4:     302264510    10075087952  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    5:     177120233     9374983920  [B
>    6:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    7:       6191688      544868544  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
>    8:       2799256      111970240  java.util.ArrayList
>    9:        890728       42754944  org.apache.hadoop.fs.permission.AclEntry
>   10:        330986       29974408  [C
>   11:        596871       19099880  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>   12:        445364       17814560  com.google.common.collect.RegularImmutableList
>   13:           844       17132816  [Ljava.util.HashMap$Entry;
>   14:        445364       10688736  org.apache.hadoop.hdfs.server.namenode.AclFeature
>   15:        329789       10553248  java.lang.String
>   16:         91741        8807136  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction
>   17:         49584        6850280
> {quote}
> And the stack trace shows it was inside reloadFromImageFile:
> {quote}
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getInode(FSDirectory.java:2426)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:160)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:121)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:902)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:888)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.reloadFromImageFile(FSImage.java:562)
>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:1048)
>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:536)
>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:388)
>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:354)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:356)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1630)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:350)
>     at java.lang.Thread.run(Thread.java:745)
> {quote}
> So before calling reloadFromImageFile, I think we need to release the old namesystem to prevent the SecondaryNameNode from running out of memory (OOM).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
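As a rough check on the "twice memory" claim, the two heap histograms quoted in the issue description can be compared directly. The Java sketch below only does arithmetic on those published figures (it treats 1 GB as 10^9 bytes and sums the four largest rows: Object[], INodeFile, BlockInfo, BlockInfo[]).

```java
import java.util.Locale;

/**
 * Growth of the largest namespace-related classes between the histograms
 * taken at 2014-12-02 01:19 and 01:32. All numbers are copied from the report.
 */
public class HistoGrowth {

    /** Ratio of a later histogram value to an earlier one. */
    static double growth(long before, long after) {
        return (double) after / before;
    }

    public static void main(String[] args) {
        // #instances at 01:19 and at 01:32
        long inodeFileBefore = 157_366_649L, inodeFileAfter = 302_272_758L;
        long blockInfoBefore = 183_409_030L, blockInfoAfter = 352_500_723L;

        // #bytes of the four largest rows: Object[], INodeFile, BlockInfo, BlockInfo[]
        long bytesBefore = 19_326_123_016L + 15_107_198_304L + 11_738_177_920L + 5_244_264_024L;
        long bytesAfter = 35_566_651_032L + 29_018_184_768L + 22_560_046_272L + 10_075_087_952L;

        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        System.out.println(String.format(Locale.ROOT, "INodeFile instances: x%.2f",
                growth(inodeFileBefore, inodeFileAfter)));   // INodeFile instances: x1.92
        System.out.println(String.format(Locale.ROOT, "BlockInfo instances: x%.2f",
                growth(blockInfoBefore, blockInfoAfter)));   // BlockInfo instances: x1.92
        System.out.println(String.format(Locale.ROOT, "Top-4 #bytes: %.1f GB -> %.1f GB (x%.2f)",
                bytesBefore / 1e9, bytesAfter / 1e9,
                growth(bytesBefore, bytesAfter)));           // Top-4 #bytes: 51.4 GB -> 97.2 GB (x1.89)
    }
}
```

Both instance counts grow by roughly 1.9x while the stack trace shows the thread still inside loadINodeDirectorySection, which is consistent with the old namespace remaining reachable while the new image is being loaded.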