hadoop-hdfs-issues mailing list archives

From "fujie (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-8011) standby nn can't started
Date Mon, 30 Mar 2015 09:49:53 GMT

     [ https://issues.apache.org/jira/browse/HDFS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujie updated HDFS-8011:
------------------------
    Description: 
We have seen the standby namenode crash with fatal errors on startup. Any solutions, workarounds,
or ideas would be helpful for us.
1. Here is the context:
	At the beginning we had 2 namenodes, with A as active and B as standby. For some reason, namenode
A died, so namenode B took over as active.
	When we tried to restart A a minute later, it would not come up. During this time a lot of files were
put to HDFS, and a lot of files were renamed.
	Namenode A crashed while "awaiting reported blocks in safemode" each time.

2. We see the following errors in the log:
	1)2015-03-30  ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception
on operation CloseOp [length=0, inodeId=0, path=/xxx/_temporary/xxx/part-r-00074.bz2, replication=3,
mtime=1427699913947, atime=1427699081161, blockSize=268435456, blocks=[blk_2103131025_1100889495739],
permissions=dm:dm:rw-r--r--, clientName=, clientMachine=, opCode=OP_CLOSE, txid=7632753612]
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.setGenerationStampAndVerifyReplicas(BlockInfoUnderConstruction.java:247)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.commitBlock(BlockInfoUnderConstruction.java:267)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:639)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:813)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:383)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:209)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
        
   2)2015-03-30  FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error
encountered while tailing edits. Shutting down standby NN.
java.io.IOException: Failed to apply edit log operation AddBlockOp [path=/xxx/_temporary/xxx/part-m-00121,
penultimateBlock=blk_2102331803_1100888911441, lastBlock=blk_2102661068_1100889009168, RpcClientId=,
RpcCallId=-2]: error null
        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
        


  was:
1. After the active NN died, the standby NN became active (using ZKFC).

2. We then started the new standby NN, and a FATAL error occurred.

The standby NN can't work.




> standby nn can't started
> ------------------------
>
>                 Key: HDFS-8011
>                 URL: https://issues.apache.org/jira/browse/HDFS-8011
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.3.0
>         Environment: CentOS 6.2, 64-bit
>            Reporter: fujie



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
