hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10729) NameNode crashes when loading edits because max directory items is exceeded
Date Mon, 08 Aug 2016 14:45:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411901#comment-15411901 ]

Kihwal Lee commented on HDFS-10729:
-----------------------------------

Do you know how it got into this state? If the serving namenode had the same config, this op should have failed and wouldn't have been logged in the first place. Do you suspect a bug in the enforcement, similar to HDFS-6099?

> NameNode crashes when loading edits because max directory items is exceeded
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-10729
>                 URL: https://issues.apache.org/jira/browse/HDFS-10729
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.2
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Critical
>
> We encountered a bug where the Standby NameNode crashes due to an NPE when loading edits.
> {noformat}
> 2016-08-05 15:06:00,983 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation AddOp [length=0, inodeId=789272719, path=[path], replication=3, mtime=1470379597935, atime=1470379597935, blockSize=134217728, blocks=[], permissions=<user>:supergroup:rw-r--r--, aclEntries=null, clientName=DFSClient_NONMAPREDUCE_1495395702_1, clientMachine=10.210.119.136, overwrite=true, RpcClientId=a1512eeb-65e4-43dc-8aa8-d7a1af37ed30, RpcCallId=417, storagePolicyId=0, opCode=OP_ADD, txid=4212503758]
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileEncryptionInfo(FSDirectory.java:2914)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.createFileStatus(FSDirectory.java:2469)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:375)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
>         at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> {noformat}
> The NameNode crashes and cannot be restarted. After some research, we turned on debug logging for org.apache.hadoop.hdfs.StateChange, restarted the NN, and saw the following exception, which induced the NPE:
> {noformat}
> 16/08/07 18:51:15 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedAddFile: exception when add [path] to the file system
> org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException: The directory item limit of [path] is exceeded: limit=1048576 items=1049332
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2060)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2112)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:2081)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addINode(FSDirectory.java:1900)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:368)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:365)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:188)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:182)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:445)
>         at org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:426)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:182)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1205)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1762)
>         at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
>         at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1635)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1351)
>         at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>         at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
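> For reference, the debug output above comes from enabling the {{org.apache.hadoop.hdfs.StateChange}} logger; a minimal sketch, assuming a standard log4j.properties setup:
> {noformat}
> # log4j.properties: enable debug logging for HDFS state-change messages
> log4j.logger.org.apache.hadoop.hdfs.StateChange=DEBUG
> {noformat}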
> The exception is thrown, caught, and logged at debug level in {{FSDirectory#unprotectedAddFile}}. Afterwards, null is returned, which is then used and subsequently causes the NPE. HDFS-7567 reported a similar bug, but there it was deemed that the NPE could only occur if the edits are corrupt. However, here we see an example where the NPE occurs without corrupt edits.
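> To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical names and a tiny limit; not the actual Hadoop source) of how an exception swallowed at debug level turns into a downstream NPE:
> {noformat}
> import java.io.IOException;
>
> public class SwallowedExceptionDemo {
>   // Hypothetical stand-in for dfs.namenode.fs-limits.max-directory-items.
>   static final int MAX_DIR_ITEMS = 3;
>   static int items = 3; // the directory is already at the limit
>
>   // Mirrors the pattern described above: catch, log at debug level, return null.
>   static String unprotectedAddFile(String path) {
>     try {
>       if (items >= MAX_DIR_ITEMS) {
>         throw new IOException("The directory item limit of " + path
>             + " is exceeded: limit=" + MAX_DIR_ITEMS + " items=" + items);
>       }
>       items++;
>       return path;
>     } catch (IOException e) {
>       // Only visible with debug logging enabled; the caller just sees null.
>       System.err.println("DEBUG: exception when add " + path + ": " + e.getMessage());
>       return null;
>     }
>   }
>
>   public static void main(String[] args) {
>     String inode = unprotectedAddFile("/some/dir/file");
>     // The result is used unconditionally, as the edit-log loader does:
>     System.out.println(inode.length()); // NullPointerException here
>   }
> }
> {noformat}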
> It looks like the maximum number of items per directory was exceeded. This is similar to HDFS-6102 and HDFS-7482, but it happens while loading edits rather than while loading the fsimage.
> A possible workaround is to increase the value of {{dfs.namenode.fs-limits.max-directory-items}} to 6400000, but I am not sure whether it would cause any side effects.
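> A sketch of the corresponding {{hdfs-site.xml}} change (illustrative; the default limit is 1048576, as seen in the log above, and 6400000 is, to my knowledge, the largest value the NameNode accepts):
> {noformat}
> <property>
>   <name>dfs.namenode.fs-limits.max-directory-items</name>
>   <value>6400000</value>
> </property>
> {noformat}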


