hadoop-common-user mailing list archives

From Ravi Prakash <ravihad...@gmail.com>
Subject Re: IMAGE_AND_EDITS Failed
Date Wed, 07 Sep 2011 15:48:42 GMT
Actually I take that back. Restarting the NN might not result in loss of
data. It will probably just take longer to start up because it would read
the fsimage, then apply the fsedits (rather than the SNN doing it).
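To make the intuition concrete: the fsimage is a snapshot of the namespace and the edits file is a log of operations applied since that snapshot, so a restart just replays the log over the snapshot. A toy sketch of that replay (illustrative Python only, not Hadoop's actual code):

```python
def load_namespace(fsimage, edits):
    """Toy model of NameNode startup: begin from the checkpointed
    snapshot (fsimage), then replay every logged operation (edits)
    in order. The longer checkpointing has been broken, the bigger
    the edits log and the slower this replay."""
    namespace = dict(fsimage)  # snapshot: path -> metadata
    for op, path, meta in edits:
        if op == "create":
            namespace[path] = meta
        elif op == "delete":
            namespace.pop(path, None)
    return namespace
```

This is exactly why a healthy SNN matters: it does this merge periodically so the NN never has to replay a huge log at startup.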

On Wed, Sep 7, 2011 at 10:46 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:

> Hi Jeremy,
>
> Couple of questions:
>
> 1. Which version of Hadoop are you using?
> 2. If you write something into HDFS, can you subsequently read it?
> 3. Are you sure your secondarynamenode configuration is correct? It seems
> like your SNN is telling your NN to roll the edit log (close the current
> edits file and start writing to edits.new), but when it then tries to
> download the image file, it's not finding it.
> 4. I wish I could say I haven't ever seen that stack trace in the logs. I
> was seeing something similar (not the same, quite far from it actually) (
> https://issues.apache.org/jira/browse/HDFS-2011 ).
>
> If I were you, and I felt exceptionally brave (mind you, I've worked with
> only test systems, no production sys-admin guts for me ;-) ), I would
> probably do everything I can to get the secondarynamenode started properly
> and make it checkpoint properly.
>
> Methinks restarting the namenode will most likely result in loss of data.
>
> Hope this helps
> Ravi.
>
>
>
>
> On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen <jeremy@skidrow.la> wrote:
>
>>
>> I happened to notice this today and being fairly new to administering
>> hadoop, I'm not exactly sure how to pull out of this situation without data
>> loss.
>>
>> The checkpoint hasn't happened since Sept 2nd.
>>
>> -rw-r--r-- 1 hdfs hdfs        8889 Sep  2 14:09 edits
>> -rw-r--r-- 1 hdfs hdfs   195968056 Sep  2 14:09 fsimage
>> -rw-r--r-- 1 hdfs hdfs   195979439 Sep  2 14:09 fsimage.ckpt
>> -rw-r--r-- 1 hdfs hdfs           8 Sep  2 14:09 fstime
>> -rw-r--r-- 1 hdfs hdfs         100 Sep  2 14:09 VERSION
>>
>> /mnt/data0/dfs/nn/image
>> -rw-r--r-- 1 hdfs hdfs    157 Sep  2 14:09 fsimage
>>
>> I'm also seeing this in the NN logs:
>>
>> 2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> Roll Edit Log from 10.10.10.11
>> 2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage:
>> java.io.IOException: GetImage failed. java.lang.NullPointerException
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
>>        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
>>        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
>>        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>        at org.mortbay.jetty.Server.handle(Server.java:326)
>>        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>
>> On the secondary name node:
>>
>> 2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
>> java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
>>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>        at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
>>        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
>>        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
>>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
>>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
>>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
>>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
>>        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
>>        at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1
>>        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)
>>        at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2165)
>>        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
>>        ... 10 more
>>
>> Any help would be very much appreciated. I'm scared to shut down the NN.
>> I've tried restarting the 2NN.
>>
>> Thank You
>> -jeremy
>>
>
>
