hadoop-common-user mailing list archives

From Jeremy Hansen <jer...@skidrow.la>
Subject IMAGE_AND_EDITS Failed
Date Wed, 07 Sep 2011 00:26:15 GMT

I noticed this today and, being fairly new to administering Hadoop, 
I'm not sure how to get out of this situation without 
data loss.

No checkpoint has happened since Sept 2nd.

-rw-r--r-- 1 hdfs hdfs        8889 Sep  2 14:09 edits
-rw-r--r-- 1 hdfs hdfs   195968056 Sep  2 14:09 fsimage
-rw-r--r-- 1 hdfs hdfs   195979439 Sep  2 14:09 fsimage.ckpt
-rw-r--r-- 1 hdfs hdfs           8 Sep  2 14:09 fstime
-rw-r--r-- 1 hdfs hdfs         100 Sep  2 14:09 VERSION

/mnt/data0/dfs/nn/image
-rw-r--r-- 1 hdfs hdfs    157 Sep  2 14:09 fsimage
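
A minimal sketch of how the staleness visible in the listings above could be checked automatically, rather than noticed by chance. It only reads file modification times in a NameNode storage directory. The function names, the directory argument, and the staleness threshold are all assumptions made for illustration; nothing here is part of Hadoop itself.

```python
import os
import time

# File names taken from the directory listing above.
CHECKPOINT_FILES = ("fsimage", "edits", "fstime")


def checkpoint_file_ages(name_dir, now=None):
    """Return {file name: age in hours} for checkpoint files found in name_dir.

    name_dir is a hypothetical argument: pass the directory that holds
    fsimage, edits and fstime on your NameNode. Missing files are skipped.
    """
    now = time.time() if now is None else now
    ages = {}
    for name in CHECKPOINT_FILES:
        path = os.path.join(name_dir, name)
        if os.path.exists(path):
            ages[name] = (now - os.path.getmtime(path)) / 3600.0
    return ages


def is_stale(ages, max_hours=6.0):
    """True if fsimage exists and is older than max_hours.

    max_hours is an arbitrary illustrative threshold; choose one that is
    comfortably larger than your configured checkpoint interval.
    """
    return "fsimage" in ages and ages["fsimage"] > max_hours
```

Calling `is_stale(checkpoint_file_ages(some_dir))` from a cron job or monitoring check would flag a checkpoint that has silently stopped, as happened here, where `some_dir` stands for the storage directory on the NameNode.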

I'm also seeing this in the NN logs:

2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.10.10.11
2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed.
java.lang.NullPointerException
         at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
         at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
         at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
         at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
         at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
         at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
         at org.mortbay.jetty.Server.handle(Server.java:326)
         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)

On the secondary name node:

2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
         at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
         at java.security.AccessController.doPrivileged(Native Method)
         at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
         at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1
         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)
         at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2165)
         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
         ... 10 more

Any help would be very much appreciated. I'm scared to shut down the NN. I've tried restarting
the 2NN.

Thank You
-jeremy
