Please don't cross-post; this belongs on the CDH lists only.

On Nov 13, 2012, at 5:18 AM, mg wrote:

Meanwhile we found that the seen_txid files are empty in 4 of 5 replicated namenode directories.

The edits_inprogress_... files are identical in all 5 dirs, and their tx id matches the one in the single non-empty seen_txid file.

The fsimage files are identical, too.

Otherwise, as far as the edits_ files are concerned, every pair of the 5 dirs differs.

Is it safe to copy the one non-empty seen_txid file into the other 4 nn directories?

Cheers,
Martin

On 13.11.2012 12:03, mg wrote:
Hi,

We just upgraded a cluster from CDH 4.0.1 to 4.1.2 on a number of nodes
running on Ubuntu 12.04 (Precise).

We first upgraded Cloudera Manager (now 4.1.0), then ran apt-get
dist-upgrade on all nodes, started CM, checked and updated the
configuration and attempted to start the cluster.

However, the HDFS NameNode fails to start with the exception appended
below.

There is sufficient space on all partitions. We do not bind against
wildcard addresses (at least not yet).

Any ideas? Stacktrace follows.

Cheers,
Martin

FATAL    org.apache.hadoop.hdfs.server.namenode.NameNode
Exception in namenode join
java.lang.NumberFormatException: null
    at java.lang.Long.parseLong(Long.java:375)
    at java.lang.Long.valueOf(Long.java:525)
    at org.apache.hadoop.hdfs.util.PersistentLongFile.readFile(PersistentLongFile.java:93)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.readTransactionIdFile(NNStorage.java:425)
    at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.inspectDirectory(FSImageTransactionalStorageInspector.java:71)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.inspectStorageDirs(NNStorage.java:1039)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.readAndInspectDirs(NNStorage.java:1093)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:598)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:534)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:424)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:398)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:432)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
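
The trace above looks consistent with the empty seen_txid files mentioned earlier in the thread: the failure is raised by Long.valueOf, called from PersistentLongFile.readFile via NNStorage.readTransactionIdFile, and "NumberFormatException: null" is what Long.valueOf reports when it is handed a null string, which is presumably what reading the first line of an empty file yields. A minimal read-only sketch for checking every NameNode directory before changing anything is below. It assumes the usual layout with the file at <dir>/current/seen_txid; the function names are made up for illustration, and the script only reports what it finds.

```python
from pathlib import Path


def read_seen_txid(storage_dir):
    """Return the transaction id stored in <storage_dir>/current/seen_txid.

    Returns None if the file is missing, unreadable, empty, or does not
    start with an integer. The current/seen_txid location is an assumption
    about the storage layout, not something taken from the thread.
    """
    path = Path(storage_dir) / "current" / "seen_txid"
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    if not text:
        return None
    try:
        return int(text.splitlines()[0].strip())
    except ValueError:
        return None


def summarize(storage_dirs):
    """Map each directory (as a string) to its seen_txid value or None."""
    return {str(d): read_seen_txid(d) for d in storage_dirs}


if __name__ == "__main__":
    import sys

    # Pass the NameNode storage directories as command-line arguments.
    for directory, txid in summarize(sys.argv[1:]).items():
        status = txid if txid is not None else "EMPTY OR UNREADABLE"
        print(f"{directory}: {status}")
```

Run it with the five directories from the NameNode storage configuration as arguments. Every directory reported as empty or unreadable is a candidate for the exception above.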

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/