hadoop-hdfs-issues mailing list archives

From "Arpit Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5140) Too many safemode monitor threads being created in the standby namenode
Date Wed, 28 Aug 2013 18:04:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752639#comment-13752639 ]

Arpit Gupta commented on HDFS-5140:
-----------------------------------

Here is the stack trace from the standby namenode:

{code}
2013-08-28 08:58:45,519 INFO  hdfs.StateChange (FSNamesystem.java:reportStatus(4677)) - STATE*
Safe mode extension entered.
The reported blocks 833 has reached the threshold 1.0000 of total blocks 833. The number of
live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically
in 29 seconds.
2013-08-28 08:58:45,524 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(203))
- Encountered exception on operation CloseOp [length=0, inodeId=0, path=/user/hrt_qa/ha-loadgenerator/100-threads/dir3/dir2/dir5/dir4/dir2/dir1/hostname63,
replication=3, mtime=1377680236411, atime=1377680236320, blockSize=134217728, blocks=[blk_1073940431_205511],
permissions=hrt_qa:hrt_qa:rw-r--r--, clientName=, clientMachine=, opCode=OP_CLOSE, txid=1141116]
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:640)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.checkMode(FSNamesystem.java:4521)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.incrementSafeBlockCount(FSNamesystem.java:4568)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.access$1900(FSNamesystem.java:4275)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.incrementSafeBlockCount(FSNamesystem.java:4854)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.completeBlock(BlockManager.java:596)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.completeBlock(BlockManager.java:608)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:621)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:696)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:198)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:111)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
2013-08-28 08:58:45,597 FATAL ha.EditLogTailer (EditLogTailer.java:doWork(328)) - Unknown
error encountered while tailing edits. Shutting down standby NN.
java.io.IOException: Failed to apply edit log operation CloseOp [length=0, inodeId=0, path=/user/hrt_qa/ha-loadgenerator/100-threads/dir3/dir2/dir5/dir4/dir2/dir1/hostname63,
replication=3, mtime=1377680236411, atime=1377680236320, blockSize=134217728, blocks=[blk_1073940431_205511],
permissions=hrt_qa:hrt_qa:rw-r--r--, clientName=, clientMachine=, opCode=OP_CLOSE, txid=1141116]:
error unable to create new native thread
        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:204)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:111)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
2013-08-28 08:58:45,636 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with
status 1
{code}
                
> Too many safemode monitor threads being created in the standby namenode
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5140
>                 URL: https://issues.apache.org/jira/browse/HDFS-5140
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.1.0-beta
>            Reporter: Arpit Gupta
>            Assignee: Jing Zhao
>            Priority: Blocker
>
> While running the namenode load generator with 100 threads for 10 minutes, the namenode was being failed over every 2 minutes.
> The standby namenode shut itself down because it ran out of memory and could not create another thread.
> When we searched for 'Safe mode extension entered' in the standby log, it appeared 55000+ times.
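
The stack trace shows Thread.start being reached from SafeModeInfo.checkMode via incrementSafeBlockCount, so one plausible reading is that a new monitor thread is started on every call while the safe mode extension is pending. The following is a minimal sketch of guarding that start so at most one monitor thread exists. The class SafeModeMonitorSketch, its fields, and the counter are hypothetical names for illustration only; this is not the FSNamesystem code and not the fix committed for this issue.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not the actual FSNamesystem code or the HDFS-5140 patch.
class SafeModeMonitorSketch {
    // Counts how many monitor threads were ever started (demonstration only).
    final AtomicInteger threadsStarted = new AtomicInteger();

    // The single monitor thread, guarded by the lock on "this".
    private Thread monitor;

    // Stands in for a method that is called once per completed block,
    // as checkMode() appears to be in the stack trace.
    synchronized void checkMode() {
        if (monitor != null) {
            return; // a monitor already exists; do not start another
        }
        monitor = new Thread(() -> {
            // A real monitor would poll until safe mode can be left.
        }, "SafeModeMonitorSketch");
        monitor.setDaemon(true);
        threadsStarted.incrementAndGet();
        monitor.start();
    }

    public static void main(String[] args) {
        SafeModeMonitorSketch sketch = new SafeModeMonitorSketch();
        // Simulate the 55000+ calls seen in the standby log.
        for (int i = 0; i < 55000; i++) {
            sketch.checkMode();
        }
        System.out.println(sketch.threadsStarted.get()); // prints 1
    }
}
```

Without the null check, each call would start its own thread, and enough calls would eventually exhaust the native threads the process is allowed to create.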

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
