hadoop-common-user mailing list archives

From Ravi Prakash <ravihad...@gmail.com>
Subject Re: IMAGE_AND_EDITS Failed
Date Wed, 07 Sep 2011 17:21:36 GMT
Can you hexdump the edits file, write something to HDFS, hexdump again and
then compare the two hexdumps? Are you sure you're looking at the correct
fsedits file? How many storage directories do you have configured?
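
The comparison suggested above can be scripted. This is a rough sketch, not a tested admin tool: the edits path and the probe command are illustrative (substitute your own dfs.name.dir and a harmless namespace operation), and only hexdump and diff are assumed to be installed.

```shell
# Snapshot a file's hexdump, run a command, snapshot again, and diff.
compare_states() {
    # $1 = file to watch, $2 = command to run between the two snapshots
    hexdump -C "$1" > /tmp/state.before.hex
    eval "$2"
    hexdump -C "$1" > /tmp/state.after.hex
    # Non-empty diff output means the file gained new bytes; an empty
    # diff means no new transactions were logged to it.
    diff /tmp/state.before.hex /tmp/state.after.hex
}

# Hypothetical usage against the NN edits log (adjust paths to your setup):
# compare_states /mnt/data0/dfs/nn/current/edits \
#     "hadoop fs -touchz /tmp/checkpoint-probe"
```

If the diff stays empty after a write, the NN is journaling somewhere else (another configured storage directory) or not journaling at all.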


On Wed, Sep 7, 2011 at 11:57 AM, Jeremy Hansen <jeremy@skidrow.la> wrote:

> Things still work in HDFS but the edits file is not being updated.
> Timestamp is Sept 2nd.
>
> -jeremy
>
> On Sep 7, 2011, at 9:45 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:
>
> > If your HDFS is still working, the fsimage file won't be getting updated
> > but the edits file still should. That's why I asked question 2.
> >
> > On Wed, Sep 7, 2011 at 11:39 AM, Jeremy Hansen <jeremy@skidrow.la> wrote:
> >
> >> The problem is that fsimage and edits are no longer being updated, so…
> >> if I restart, how could it replay those?
> >>
> >> -jeremy
> >>
> >>
> >> On Sep 7, 2011, at 8:48 AM, Ravi Prakash wrote:
> >>
> >>> Actually I take that back. Restarting the NN might not result in loss
> >>> of data. It will probably just take longer to start up because it would
> >>> read the fsimage, then apply the fsedits (rather than the SNN doing it).
> >>>
> >>> On Wed, Sep 7, 2011 at 10:46 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:
> >>>
> >>>> Hi Jeremy,
> >>>>
> >>>> Couple of questions:
> >>>>
> >>>> 1. Which version of Hadoop are you using?
> >>>> 2. If you write something into HDFS, can you subsequently read it?
> >>>> 3. Are you sure your secondarynamenode configuration is correct? It
> >>>> seems like your SNN is telling your NN to roll the edit log (move the
> >>>> journaling directory from current to .new), but when it tries to
> >>>> download the image file, it's not finding it.
> >>>> 4. I wish I could say I haven't ever seen that stack trace in the
> >>>> logs. I was seeing something similar (not the same, quite far from it
> >>>> actually) (https://issues.apache.org/jira/browse/HDFS-2011).
> >>>>
> >>>> If I were you, and I felt exceptionally brave (mind you, I've worked
> >>>> with only test systems, no production sys-admin guts for me ;-) ), I
> >>>> would probably do everything I can to get the secondarynamenode
> >>>> started properly and make it checkpoint properly.
> >>>>
> >>>> Methinks restarting the namenode will most likely result in loss of
> >>>> data.
> >>>>
> >>>> Hope this helps
> >>>> Ravi.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen <jeremy@skidrow.la> wrote:
> >>>>
> >>>>>
> >>>>> I happened to notice this today and being fairly new to administering
> >>>>> hadoop, I'm not exactly sure how to pull out of this situation
> >>>>> without data loss.
> >>>>>
> >>>>> The checkpoint hasn't happened since Sept 2nd.
> >>>>>
> >>>>> -rw-r--r-- 1 hdfs hdfs        8889 Sep  2 14:09 edits
> >>>>> -rw-r--r-- 1 hdfs hdfs   195968056 Sep  2 14:09 fsimage
> >>>>> -rw-r--r-- 1 hdfs hdfs   195979439 Sep  2 14:09 fsimage.ckpt
> >>>>> -rw-r--r-- 1 hdfs hdfs           8 Sep  2 14:09 fstime
> >>>>> -rw-r--r-- 1 hdfs hdfs         100 Sep  2 14:09 VERSION
> >>>>>
> >>>>> /mnt/data0/dfs/nn/image
> >>>>> -rw-r--r-- 1 hdfs hdfs    157 Sep  2 14:09 fsimage
> >>>>>
> >>>>> I'm also seeing this in the NN logs:
> >>>>>
> >>>>> 2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> >>>>> Roll Edit Log from 10.10.10.11
> >>>>> 2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage:
> >>>>> java.io.IOException: GetImage failed. java.lang.NullPointerException
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
> >>>>>      at java.security.AccessController.doPrivileged(Native Method)
> >>>>>      at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>>      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
> >>>>>      at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> >>>>>      at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >>>>>      at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> >>>>>      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> >>>>>      at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
> >>>>>      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >>>>>      at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >>>>>      at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >>>>>      at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >>>>>      at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >>>>>      at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >>>>>      at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >>>>>      at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >>>>>      at org.mortbay.jetty.Server.handle(Server.java:326)
> >>>>>      at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >>>>>      at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >>>>>      at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> >>>>>      at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> >>>>>      at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >>>>> On the secondary name node:
> >>>>>
> >>>>> 2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
> >>>>> java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
> >>>>>      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >>>>>      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >>>>>      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >>>>>      at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> >>>>>      at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
> >>>>>      at java.security.AccessController.doPrivileged(Native Method)
> >>>>>      at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
> >>>>>      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
> >>>>>      at java.security.AccessController.doPrivileged(Native Method)
> >>>>>      at javax.security.auth.Subject.doAs(Subject.java:396)
> >>>>>      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
> >>>>>      at java.lang.Thread.run(Thread.java:619)
> >>>>> Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1
> >>>>>      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)
> >>>>>      at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2165)
> >>>>>      at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
> >>>>>      ... 10 more
> >>>>>
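
The FileNotFoundException above can be reproduced outside the SNN by fetching the image by hand, the same way TransferFsImage does, which separates "the NN can't serve the image" (the servlet NPE in the NN log) from an SNN-side configuration problem. A hedged sketch, assuming curl is available; the default host/port below are this thread's example values, so substitute your own dfs.http.address:

```shell
# Probe the NN's getimage servlet directly (the /getimage?getimage=1 URL
# is taken from the stack traces above).
NN_HTTP="${NN_HTTP:-ftrr-nam6000.las1.fanops.net:50070}"

if curl -sf --max-time 10 "http://${NN_HTTP}/getimage?getimage=1" -o /tmp/fsimage.probe; then
    echo "getimage OK: $(wc -c < /tmp/fsimage.probe) bytes"
else
    echo "getimage failed"
fi
```

A failure here with the NN otherwise up would point at the NN-side NullPointerException rather than at SNN configuration.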
> >>>>> Any help would be very much appreciated. I'm scared to shut down
> >>>>> the NN. I've tried restarting the 2NN.
> >>>>>
> >>>>> Thank You
> >>>>> -jeremy
> >>>>>
> >>>>
> >>>>
> >>
> >>
>
