hadoop-common-user mailing list archives

From Ravi Prakash <ravihad...@gmail.com>
Subject Re: IMAGE_AND_EDITS Failed
Date Wed, 07 Sep 2011 15:46:09 GMT
Hi Jeremy,

A couple of questions:

1. Which version of Hadoop are you using?
2. If you write something into HDFS, can you subsequently read it?
3. Are you sure your secondarynamenode configuration is correct? It seems
like your SNN is telling your NN to roll the edit log (switch journaling
from the current edits file to edits.new), but when it tries to download
the image file, it's not finding it.
4. I wish I could say I've never seen that stack trace in the logs. I
was seeing something similar (not the same, quite far from it actually):
https://issues.apache.org/jira/browse/HDFS-2011
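To check item 3 without going through the secondarynamenode, one could request the same URL the SNN fetches (`/getimage?getimage=1` on the NN's HTTP port, 50070 in the quoted logs) and look only at the status code. The JDK's `HttpURLConnection.getInputStream()` turns a 404 or 410 into a `FileNotFoundException` whose message is the URL, which matches what the SNN logged. This is a minimal sketch: `GetImageProbe` is a hypothetical helper, not part of Hadoop, and the port default is an assumption taken from the logs. It reads only the response headers, so on a 200 it abandons the fsimage transfer the NN starts streaming.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class GetImageProbe {

    // Builds the image-download URL in the form seen in the SNN log.
    // Host and port are parameters; 50070 is assumed from the quoted logs.
    static URL imageUrl(String host, int port) {
        try {
            return new URL("http", host, port, "/getimage?getimage=1");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad host/port: " + host + ":" + port, e);
        }
    }

    // Sends a GET and returns only the status code. Unlike getInputStream(),
    // getResponseCode() does not throw FileNotFoundException on 404/410.
    static int probe(URL url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try {
            return conn.getResponseCode();
        } finally {
            // Drop the connection; on a 200 the body would be the whole fsimage.
            conn.disconnect();
        }
    }

    // Interprets a status code in terms of what the SNN's download would see.
    static String describe(int status) {
        if (status == HttpURLConnection.HTTP_OK) {
            return "NN served the image";
        }
        if (status == HttpURLConnection.HTTP_NOT_FOUND || status == HttpURLConnection.HTTP_GONE) {
            return "SNN would get FileNotFoundException";
        }
        return "unexpected status " + status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java GetImageProbe <namenode-host> [port]");
            System.exit(2);
        }
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50070;
        URL url = imageUrl(args[0], port);
        int status = probe(url, 10000);
        System.out.println(url + " -> HTTP " + status + ": " + describe(status));
    }
}
```

Running it from the SNN host (for example `java GetImageProbe <namenode-host>`) also exercises any name-resolution or network difference between the two machines.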

If I were you, and I felt exceptionally brave (mind you, I've only worked
with test systems; no production sys-admin guts for me ;-) ), I would
probably do everything I could to get the secondarynamenode started
properly and make it checkpoint successfully.
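Once the SNN is back, one simple way to confirm that it is checkpointing again is the same evidence Jeremy used: the modification time of the NN's metadata files. This is a minimal sketch under two assumptions: that `fstime` in the NN's `current` directory is rewritten at every successful checkpoint (consistent with the identical Sep 2 timestamps in his listing), and that two hours, twice the 0.20 default `fs.checkpoint.period` of 3600 seconds, is a reasonable grace window. `CheckpointAge` is a hypothetical helper, and the directory is a command-line argument because the listing does not name it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

public class CheckpointAge {

    // Reads a file's last-modified time as an Instant.
    static Instant lastModified(Path file) throws IOException {
        return Files.getLastModifiedTime(file).toInstant();
    }

    // True when more than maxAge has passed since the last checkpoint.
    static boolean isStale(Instant lastCheckpoint, Instant now, Duration maxAge) {
        return Duration.between(lastCheckpoint, now).compareTo(maxAge) > 0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java CheckpointAge <nn-current-dir> [file] [maxAgeHours]");
            System.exit(2);
        }
        Path dir = Paths.get(args[0]);
        // Assumption: fstime is rewritten on each successful checkpoint.
        String name = args.length > 1 ? args[1] : "fstime";
        // Assumption: 2 hours = twice the default fs.checkpoint.period of 3600 s.
        Duration maxAge = Duration.ofHours(args.length > 2 ? Long.parseLong(args[2]) : 2);

        Instant modified = lastModified(dir.resolve(name));
        Instant now = Instant.now();
        long hours = Duration.between(modified, now).toHours();
        String verdict = isStale(modified, now, maxAge) ? "STALE: no recent checkpoint" : "ok";
        System.out.println(name + " modified " + modified + " (" + hours + "h ago): " + verdict);
    }
}
```

Keeping the age comparison separate from the file access makes the threshold logic easy to check against fixed timestamps.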

Methinks restarting the namenode will most likely result in data loss.

Hope this helps
Ravi.



On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen <jeremy@skidrow.la> wrote:

>
> I happened to notice this today, and being fairly new to administering
> Hadoop, I'm not sure how to get out of this situation without data
> loss.
>
> No checkpoint has happened since Sept 2nd.
>
> -rw-r--r-- 1 hdfs hdfs        8889 Sep  2 14:09 edits
> -rw-r--r-- 1 hdfs hdfs   195968056 Sep  2 14:09 fsimage
> -rw-r--r-- 1 hdfs hdfs   195979439 Sep  2 14:09 fsimage.ckpt
> -rw-r--r-- 1 hdfs hdfs           8 Sep  2 14:09 fstime
> -rw-r--r-- 1 hdfs hdfs         100 Sep  2 14:09 VERSION
>
> /mnt/data0/dfs/nn/image
> -rw-r--r-- 1 hdfs hdfs    157 Sep  2 14:09 fsimage
>
> I'm also seeing this in the NN logs:
>
> 2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.10.10.11
> 2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.lang.NullPointerException
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
>        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
>        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
>        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>        at org.mortbay.jetty.Server.handle(Server.java:326)
>        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>
> On the secondary name node:
>
> 2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
>        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
>        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1
>        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)
>        at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2165)
>        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
>        ... 10 more
>
> Any help would be very much appreciated. I'm scared to shut down the NN.
> I've tried restarting the 2NN.
>
> Thank You
> -jeremy
>
