From: Vishaal Jatav
Date: Wed, 18 May 2011 16:47:10 +0530
To: hdfs-user@hadoop.apache.org, Srinivasarao Vundavalli, Manjunath Sindagi
Subject: Null Pointer Exception while re-starting the Hadoop Cluster

Hi.

We are using a cluster of 2 computers (1 namenode and 2 secondary nodes) to store a large number of text files in HDFS. The cluster had been running for at least a couple of weeks when a power failure reset the server, so HDFS did not shut down cleanly. When I tried to restart the cluster, I got a NullPointerException with the following stack trace (from the logs):

2011-05-18 06:57:39,313 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=YYYYY
2011-05-18 06:57:39,321 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: master/172.XXX.XXX.XXX:YYYYY
2011-05-18 06:57:39,326 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-05-18 06:57:39,329 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-05-18 06:57:39,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=vishaal,vishaal
2011-05-18 06:57:39,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-05-18 06:57:39,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-05-18 06:57:39,459 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-05-18 06:57:39,461 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-05-18 06:57:39,521 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-05-18 06:57:39,531 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-05-18 06:57:39,531 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 97 loaded in 0 seconds.
2011-05-18 06:57:39,532 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/vishaal/hadoop-0.20.2/tmp/dfs/name/current/edits of size 0 edits # 0 loaded in 0 seconds.
2011-05-18 06:57:39,535 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1320)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1309)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:997)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

2011-05-18 06:57:39,537 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 172.XXX.XXX.XXX
************************************************************/
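For context, the trace suggests the failure happens while replaying the edits file: a setTimes record is applied to a path whose inode cannot be resolved in the restored namespace, so the lookup returns null and the subsequent dereference throws. Below is a minimal, hypothetical sketch of that failure mode (simplified names; this is not the actual Hadoop source, just an illustration of how an image/edits mismatch after an unclean shutdown can produce this NPE):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the namenode's in-memory directory tree.
class MiniDirectory {
    static class Inode {
        long mtime, atime;
    }

    private final Map<String, Inode> tree = new HashMap<>();

    void create(String path) {
        tree.put(path, new Inode());
    }

    // Mirrors the shape of unprotectedSetTimes: look up the inode for
    // the path, then update its times. If the edits log references a
    // path that is absent from the loaded image (e.g. after an unclean
    // shutdown left fsimage and edits inconsistent), the lookup yields
    // null and the field write throws a NullPointerException.
    void replaySetTimes(String path, long mtime, long atime) {
        Inode inode = tree.get(path); // null when the path is missing
        inode.mtime = mtime;          // NPE here when inode == null
        inode.atime = atime;
    }
}

public class ReplayDemo {
    public static void main(String[] args) {
        MiniDirectory dir = new MiniDirectory();
        dir.create("/data/file1");
        dir.replaySetTimes("/data/file1", 1L, 1L); // succeeds
        try {
            dir.replaySetTimes("/data/missing", 2L, 2L);
        } catch (NullPointerException e) {
            System.out.println("NPE replaying setTimes for a missing inode");
        }
    }
}
```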

Though this was just an experiment to test the reliability of HDFS storage, I would love to get it running again. This is, of course, hoping that the data can be recovered if it has been corrupted. A couple more questions:
  • Is this a common problem? Is there an available patch? (I couldn't find one after a lot of Googling.)
  • If the servers are prone to power failures, is HDFS still a good choice for storing this data?
  • When this occurs, does it mean that all the data is corrupt, or only some of it? Can the corrupted data be recovered?
I would appreciate a prompt reply, as this was an attempt to prove the concept of using a distributed file system, as opposed to a relational database, to store large amounts of text. (I hope you understand that I am in the line of fire.)

Thanks in advance.
Vishaal Jatav.
(vishaal[dot]iitb04[at]gmail[dot]com)