hadoop-hdfs-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wei Zeng <wz...@shendusou.com>
Subject FSNamesystem initialization failed saveLeases found path /.../0/wiley_intsci_0_0_0_0_0_3_2_0.yaml but no matching entry in namespace.
Date Fri, 23 Jul 2010 06:46:09 GMT
Hello

    It seems I should send this mail here, got no response from common-user
mail list yesterday.

     I have a hadoop(0.20.2) cluster running with 12 nodes . Today, when I
checking the log of secondaryname node , I found some exceptions like:

*2010-07-22 00:14:28,042 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
0 Total time for transactions(ms): 0Number of trans
actions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2010-07-22 00:14:28,051 INFO org.apache.hadoop.hdfs.server.common.Storage:
Recovering storage directory /data1/hadoop-0.20.0-cozy/dfs/namesecondary
from fail
ed checkpoint.
2010-07-22 00:14:28,784 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
fsimage size 34054802 bytes.
2010-07-22 00:14:28,803 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
edits size 1256592 bytes.
2010-07-22 00:14:28,878 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=cozy,cozy,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
2010-07-22 00:14:28,878 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-07-22 00:14:28,878 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2010-07-22 00:14:28,879 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 258478
2010-07-22 00:14:35,019 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 7
2010-07-22 00:14:35,206 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /data1/hadoop-0.20.0-cozy/dfs/namesecondary/current/edits of size
12565
92 edits # 12121 loaded in 0 seconds.
2010-07-22 00:14:35,895 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:
2010-07-22 00:14:35,896 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.io.IOException: saveLeases found path /mantis_file/antis_2010071
3/web_page/wiley_intsci/0/wiley_intsci_0_0_0_0_0_3_2_0.yaml but no matching
entry in namespace.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4735)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1029)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1050)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:592)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
        at java.lang.Thread.run(Thread.java:619)*

Then , I closed the whole cluster and restarted namenode. Hoped it can
remove this exception. however, the name node failed when initializing, the
log is:

*10/07/23 00:49:40 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = dev1/192.168.18.3
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/07/23 00:49:40 INFO metrics.RpcMetrics: Initializing RPC Metrics with
hostName=NameNode, port=9000
10/07/23 00:49:40 INFO namenode.NameNode: Namenode up at:
dev1.shendusou.com/192.168.18.3:9000
10/07/23 00:49:40 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=NameNode, sessionId=null
10/07/23 00:49:40 INFO metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
10/07/23 00:49:40 INFO namenode.FSNamesystem:
fsOwner=cozy,cozy,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
10/07/23 00:49:40 INFO namenode.FSNamesystem: supergroup=supergroup
10/07/23 00:49:40 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/07/23 00:49:40 INFO metrics.FSNamesystemMetrics: Initializing
FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
10/07/23 00:49:40 INFO namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
10/07/23 00:49:40 INFO common.Storage: Number of files = 258478
10/07/23 00:49:46 INFO common.Storage: Number of files under construction =
7
10/07/23 00:49:46 INFO common.Storage: Image file of size 34054802 loaded in
5 seconds.
10/07/23 00:49:47 INFO common.Storage: Edits file
/data1/hadoop-0.20.0-cozy/dfs/name/current/edits of size 1256592 edits #
12121 loaded in 0 seconds.
10/07/23 00:50:04 INFO namenode.FSNamesystem: Invalid opcode, reached end of
edit log Number of transactions found 958197
10/07/23 00:50:04 INFO common.Storage: Edits file
/data1/hadoop-0.20.0-cozy/dfs/name/current/edits.new of size 104297870 edits
# 958197 loaded in 16 seconds.
10/07/23 00:50:05 ERROR namenode.FSNamesystem: FSNamesystem initialization
failed.
java.io.IOException: saveLeases found path
/mantis_file/antis_20100713/web_page/wiley_intsci/0/wiley_intsci_0_0_0_0_0_3_2_0.yaml
but no matching entry in namespace.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4735)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1029)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1050)
        at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
10/07/23 00:50:05 INFO ipc.Server: Stopping server on 9000
10/07/23 00:50:05 ERROR namenode.NameNode: java.io.IOException: saveLeases
found path
/mantis_file/antis_20100713/web_page/wiley_intsci/0/wiley_intsci_0_0_0_0_0_3_2_0.yaml
but no matching entry in namespace.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4735)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1029)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1050)
        at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

10/07/23 00:50:05 INFO namenode.NameNode: SHUTDOWN_MSG: *


I found some JiRAs  about it  (HADOOP-3418, HADOOP-3724) ,however, These
issues are reported in 0.18 and 0.19  , seems all are fixed before 0.20.0,
does it come back in 0.20.2?

I tried to recover it from the checkpoint in secondary name node,  the
problem is , checkpoint is empty.

Any one can tell me how to recover the name node with current  fsimage,
edits and edits.new(or others on namenode) ?  lose some data is ok , lose
all is terrible

Is there a better way to recover the cluster besides formatting namenode?

Any idea is greatly appreciated!

-- 
wzeng

Mime
View raw message