hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10729) Improve log message for edit loading failures caused by FS limit checks.
Date Thu, 01 Sep 2016 01:35:21 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yongjun Zhang updated HDFS-10729:
---------------------------------
    Labels: supportability  (was: )

> Improve log message for edit loading failures caused by FS limit checks.
> ------------------------------------------------------------------------
>
>                 Key: HDFS-10729
>                 URL: https://issues.apache.org/jira/browse/HDFS-10729
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.2
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>              Labels: supportability
>             Fix For: 2.8.0, 3.0.0-alpha2
>
>         Attachments: HDFS-10729.001.patch
>
>
> We encountered a bug where Standby NameNode crashes due to an NPE when loading edits.
> {noformat}
> 2016-08-05 15:06:00,983 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader:
Encountered exception on operation AddOp [length=0, inodeId=789272719, path=[path], replication=3,
mtime=1470379597935, atime=1470379597935, blockSize=134217728, blocks=[], permissions=<user>:supergroup:rw-r--r--,
aclEntries=null, clientName=DFSClient_NONMAPREDUCE_1495395702_1, clientMachine=10.210.119.136,
overwrite=true, RpcClientId=a1512eeb-65e4-43dc-8aa8-d7a1af37ed30, RpcCallId=417, storagePolicyId=0,
opCode=OP_ADD, txid=4212503758]
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileEncryptionInfo(FSDirectory.java:2914)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.createFileStatus(FSDirectory.java:2469)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:375)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
>         at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> {noformat}
> The NameNode crashes and can not be restarted. After some research, we turned on debug
log of org.apache.hadoop.hdfs.StateChange, restart the NN, and we saw the following exception
which induced NPE:
> {noformat}
> 16/08/07 18:51:15 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedAddFile: exception
when add [path] to the file system
> org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException:
The directory item limit of [path] is exceeded: limit=1048576 items=1049332
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2060)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2112)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:2081)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addINode(FSDirectory.java:1900)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:368)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:365)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:188)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:182)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:445)
>         at org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:426)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:182)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1205)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1762)
>         at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
>         at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1635)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1351)
>         at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>         at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
> The exception is thrown, caught and logged at debug level in {{FSDirectory#unprotectedAddFile}}.
Afterwards, a null is returned, which is used and subsequently caused NPE. HDFS-7567 reported
a similar bug, but it was deemed the NPE could only be thrown if edit is corrupt. However,
here we see an example where NPE could be thrown without corrupt edits.
> Looks like the maximum number of items per directory is exceeded. This is similar to
HDFS-6102 and HDFS-7482, but this happens at loading edits, rather than loading fsimage. 
> A possible workaround is to increas the value of {{dfs.namenode.fs-limits.max-directory-items}}
to 6400000. But I am not sure if it would cause any side effects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message