hadoop-common-user mailing list archives

From Ravi Prakash <ravihad...@gmail.com>
Subject Re: IMAGE_AND_EDITS Failed
Date Wed, 07 Sep 2011 16:45:30 GMT
If your HDFS is still working, the fsimage file won't be getting updated, but
the edits file still should be. That's why I asked question 2.
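A quick way to check both halves of that is a probe write. This is only a sketch against the 0.20-era CLI; the name-directory path below is an assumption based on the listing later in this thread, so substitute your own dfs.name.dir:

```shell
# Assumed location of the NN's current/ directory -- adjust for your cluster.
NN_NAME_DIR=/mnt/data0/dfs/nn/current

# Write a probe file into HDFS, then read it back (question 2).
echo "probe $(date +%s)" > /tmp/hdfs-probe.txt
hadoop fs -put /tmp/hdfs-probe.txt /tmp/hdfs-probe.txt
hadoop fs -cat /tmp/hdfs-probe.txt

# If the write succeeded, the edits file's mtime should now be current,
# even though fsimage stays frozen until the next successful checkpoint.
ls -l "$NN_NAME_DIR/edits"
```

You can also ask the 2NN to attempt a checkpoint on demand with `hadoop secondarynamenode -checkpoint force`, which should reproduce the FileNotFoundException immediately if the misconfiguration is still there.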

On Wed, Sep 7, 2011 at 11:39 AM, Jeremy Hansen <jeremy@skidrow.la> wrote:

> The problem is that fsimage and edits are no longer being updated, so…if I
> restart, how could it replay those?
>
> -jeremy
>
>
> On Sep 7, 2011, at 8:48 AM, Ravi Prakash wrote:
>
> > Actually I take that back. Restarting the NN might not result in loss of
> > data. It will probably just take longer to start up because it would read
> > the fsimage, then apply the fsedits (rather than the SNN doing it).
> >
> > On Wed, Sep 7, 2011 at 10:46 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:
> >
> >> Hi Jeremy,
> >>
> >> Couple of questions:
> >>
> >> 1. Which version of Hadoop are you using?
> >> 2. If you write something into HDFS, can you subsequently read it?
> >> 3. Are you sure your secondarynamenode configuration is correct? It seems
> >> like your SNN is telling your NN to roll the edit log (move the journaling
> >> directory from current to .new), but when it tries to download the image
> >> file, it's not finding it.
> >> 4. I wish I could say I haven't ever seen that stack trace in the logs. I
> >> was seeing something similar (not the same, quite far from it actually) (
> >> https://issues.apache.org/jira/browse/HDFS-2011 ).
> >>
> >> If I were you, and I felt exceptionally brave (mind you, I've worked with
> >> only test systems, no production sys-admin guts for me ;-) ), I would
> >> probably do everything I can to get the secondarynamenode started
> >> properly and make it checkpoint properly.
> >>
> >> Methinks restarting the namenode will most likely result in loss of data.
> >>
> >> Hope this helps
> >> Ravi.
> >>
> >>
> >>
> >>
> >> On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen <jeremy@skidrow.la> wrote:
> >>
> >>>
> >>> I happened to notice this today and, being fairly new to administering
> >>> Hadoop, I'm not exactly sure how to pull out of this situation without
> >>> data loss.
> >>>
> >>> The checkpoint hasn't happened since Sept 2nd.
> >>>
> >>> -rw-r--r-- 1 hdfs hdfs        8889 Sep  2 14:09 edits
> >>> -rw-r--r-- 1 hdfs hdfs   195968056 Sep  2 14:09 fsimage
> >>> -rw-r--r-- 1 hdfs hdfs   195979439 Sep  2 14:09 fsimage.ckpt
> >>> -rw-r--r-- 1 hdfs hdfs           8 Sep  2 14:09 fstime
> >>> -rw-r--r-- 1 hdfs hdfs         100 Sep  2 14:09 VERSION
> >>>
> >>> /mnt/data0/dfs/nn/image
> >>> -rw-r--r-- 1 hdfs hdfs    157 Sep  2 14:09 fsimage
> >>>
> >>> I'm also seeing this in the NN logs:
> >>>
> >>> 2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> >>> Roll Edit Log from 10.10.10.11
> >>> 2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage:
> >>> java.io.IOException: GetImage failed. java.lang.NullPointerException
> >>>       at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
> >>>       at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
> >>>       at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
> >>>       at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
> >>>       at java.security.AccessController.doPrivileged(Native Method)
> >>>       at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>       at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
> >>>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> >>>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >>>       at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> >>>       at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> >>>       at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
> >>>       at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >>>       at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >>>       at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >>>       at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >>>       at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >>>       at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >>>       at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >>>       at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >>>       at org.mortbay.jetty.Server.handle(Server.java:326)
> >>>       at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >>>       at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >>>       at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> >>>       at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> >>>       at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >>> On the secondary name node:
> >>>
> >>> 2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
> >>> java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
> >>>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >>>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >>>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >>>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> >>>       at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
> >>>       at java.security.AccessController.doPrivileged(Native Method)
> >>>       at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
> >>>       at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
> >>>       at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
> >>>       at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
> >>>       at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
> >>>       at java.security.AccessController.doPrivileged(Native Method)
> >>>       at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>       at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
> >>>       at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
> >>>       at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
> >>>       at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
> >>>       at java.lang.Thread.run(Thread.java:619)
> >>> Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1
> >>>       at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)
> >>>       at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2165)
> >>>       at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
> >>>       ... 10 more
> >>>
> >>> Any help would be very much appreciated. I'm scared to shut down the
> >>> NN. I've tried restarting the 2NN.
> >>>
> >>> Thank You
> >>> -jeremy
> >>>
> >>
> >>
