hadoop-common-user mailing list archives

From Jeremy Hansen <jer...@skidrow.la>
Subject Re: IMAGE_AND_EDITS Failed
Date Wed, 07 Sep 2011 16:57:35 GMT
Things still work in HDFS, but the edits file is not being updated. The timestamp is Sept 2nd.

-jeremy

On Sep 7, 2011, at 9:45 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:

> If your HDFS is still working, the fsimage file won't be getting updated but
> the edits file still should. That's why I asked question 2.
> 
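> A quick way to check (just a sketch; /tmp/edits-test is a throwaway path, and
> I'm assuming those files live under your dfs.name.dir's current/ directory,
> e.g. /mnt/data0/dfs/nn/current going by your listing):
> 
>     hadoop fs -put /etc/hosts /tmp/edits-test   # any small write will do
>     hadoop fs -cat /tmp/edits-test              # confirms the read path too
>     ls -l /mnt/data0/dfs/nn/current/edits       # mtime should advance if the edit log is live
> 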
> On Wed, Sep 7, 2011 at 11:39 AM, Jeremy Hansen <jeremy@skidrow.la> wrote:
> 
>> The problem is that fsimage and edits are no longer being updated, so…if I
>> restart, how could it replay those?
>> 
>> -jeremy
>> 
>> 
>> On Sep 7, 2011, at 8:48 AM, Ravi Prakash wrote:
>> 
>>> Actually I take that back. Restarting the NN might not result in loss of
>>> data. It will probably just take longer to start up because it would read
>>> the fsimage, then apply the fsedits (rather than the SNN doing it).
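>>> 
>>> Either way, if you do end up restarting it, I'd copy the whole name
>>> directory somewhere safe first. A minimal sketch, assuming dfs.name.dir is
>>> /mnt/data0/dfs/nn as your listing suggests (backup path is just an example):
>>> 
>>>     tar czf /var/tmp/nn-meta-backup-$(date +%Y%m%d).tar.gz /mnt/data0/dfs/nn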
>>> 
>>> On Wed, Sep 7, 2011 at 10:46 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:
>>> 
>>>> Hi Jeremy,
>>>> 
>>>> Couple of questions:
>>>> 
>>>> 1. Which version of Hadoop are you using?
>>>> 2. If you write something into HDFS, can you subsequently read it?
>>>> 3. Are you sure your secondarynamenode configuration is correct? It seems
>>>> like your SNN is telling your NN to roll the edit log (move the journaling
>>>> directory from current to .new), but when it tries to download the image
>>>> file, it's not finding it. (There's a quick way to test that directly; see
>>>> the sketch after this list.)
>>>> 4. I wish I could say I haven't ever seen that stack trace in the logs. I
>>>> was seeing something similar (not the same, quite far from it actually)
>>>> (https://issues.apache.org/jira/browse/HDFS-2011).
>>>> 
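>>>> On point 3, you can poke at the transfer servlet directly from the SNN
>>>> host (a sketch; substitute your real NN hostname for <nn-host>, and use
>>>> whatever port dfs.http.address points at — 50070 by default, and the one
>>>> in your logs):
>>>> 
>>>>     curl -v 'http://<nn-host>:50070/getimage?getimage=1' > /tmp/fsimage.probe
>>>> 
>>>> If the NN side is broken, I'd expect the same NullPointerException to show
>>>> up in its log and an error page to come back instead of image bytes.
>>>> 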
>>>> If I were you, and I felt exceptionally brave (mind you, I've worked with
>>>> only test systems, no production sys-admin guts for me ;-) ), I would
>>>> probably do everything I can to get the secondarynamenode started properly
>>>> and make it checkpoint properly.
>>>> 
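>>>> Depending on your version (hence question 1), you may be able to force a
>>>> checkpoint by hand instead of waiting for fs.checkpoint.period to elapse.
>>>> A sketch, run as the hdfs user on the SNN host with the SNN daemon stopped:
>>>> 
>>>>     hadoop secondarynamenode -checkpoint force
>>>> 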
>>>> Methinks restarting the namenode will most likely result in loss of data.
>>>> 
>>>> Hope this helps
>>>> Ravi.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen <jeremy@skidrow.la> wrote:
>>>> 
>>>>> 
>>>>> I happened to notice this today, and being fairly new to administering
>>>>> Hadoop, I'm not exactly sure how to pull out of this situation without
>>>>> data loss.
>>>>> 
>>>>> The checkpoint hasn't happened since Sept 2nd.
>>>>> 
>>>>> -rw-r--r-- 1 hdfs hdfs        8889 Sep  2 14:09 edits
>>>>> -rw-r--r-- 1 hdfs hdfs   195968056 Sep  2 14:09 fsimage
>>>>> -rw-r--r-- 1 hdfs hdfs   195979439 Sep  2 14:09 fsimage.ckpt
>>>>> -rw-r--r-- 1 hdfs hdfs           8 Sep  2 14:09 fstime
>>>>> -rw-r--r-- 1 hdfs hdfs         100 Sep  2 14:09 VERSION
>>>>> 
>>>>> /mnt/data0/dfs/nn/image
>>>>> -rw-r--r-- 1 hdfs hdfs    157 Sep  2 14:09 fsimage
>>>>> 
>>>>> I'm also seeing this in the NN logs:
>>>>> 
>>>>> 2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.10.10.11
>>>>> 2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.lang.NullPointerException
>>>>>      at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
>>>>>      at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
>>>>>      at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
>>>>>      at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
>>>>>      at java.security.AccessController.doPrivileged(Native Method)
>>>>>      at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>>>      at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
>>>>>      at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>>>>      at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>>>>      at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>>>>      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>>>>      at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
>>>>>      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>>>      at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>>>>      at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>>>      at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>>>      at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>>>>      at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>>>>      at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>>>>      at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>>>      at org.mortbay.jetty.Server.handle(Server.java:326)
>>>>>      at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>>>      at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>>>>      at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>>>>      at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>>>      at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>>>> 
>>>>> On the secondary name node:
>>>>> 
>>>>> 2011-09-06 16:51:53,538 ERROR
>> org.apache.hadoop.hdfs.server.**namenode.SecondaryNameNode:
>>>>> java.io.FileNotFoundException: http://ftrr-nam6000.**
>>>>> chestermcgee.com:50070/**getimage?getimage=1<
>> http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1>
>>>>>      at
>> sun.reflect.**NativeConstructorAccessorImpl.**newInstance0(Native
>>>>> Method)
>>>>>      at sun.reflect.**NativeConstructorAccessorImpl.**newInstance(**
>>>>> NativeConstructorAccessorImpl.**java:39)
>>>>>      at
>> sun.reflect.**DelegatingConstructorAccessorI**mpl.newInstance(*
>>>>> *DelegatingConstructorAccessorI**mpl.java:27)
>>>>>      at
>> java.lang.reflect.Constructor.**newInstance(Constructor.java:**
>>>>> 513)
>>>>>      at sun.net.www.protocol.http.**HttpURLConnection$6.run(**
>>>>> HttpURLConnection.java:1360)
>>>>>      at java.security.**AccessController.doPrivileged(**Native Method)
>>>>>      at sun.net.www.protocol.http.**HttpURLConnection.**
>>>>> getChainedException(**HttpURLConnection.java:1354)
>>>>>      at
>> sun.net.www.protocol.http.**HttpURLConnection.**getInputStream(
>>>>> **HttpURLConnection.java:1008)
>>>>>      at org.apache.hadoop.hdfs.server.**namenode.TransferFsImage.**
>>>>> getFileClient(TransferFsImage.**java:183)
>>>>>      at
>> org.apache.hadoop.hdfs.server.**namenode.SecondaryNameNode$3.**
>>>>> run(SecondaryNameNode.java:**348)
>>>>>      at
>> org.apache.hadoop.hdfs.server.**namenode.SecondaryNameNode$3.**
>>>>> run(SecondaryNameNode.java:**337)
>>>>>      at java.security.**AccessController.doPrivileged(**Native Method)
>>>>>      at javax.security.auth.Subject.**doAs(Subject.java:396)
>>>>>      at org.apache.hadoop.security.**UserGroupInformation.doAs(**
>>>>> UserGroupInformation.java:**1115)
>>>>>      at org.apache.hadoop.hdfs.server.**namenode.SecondaryNameNode.**
>>>>> downloadCheckpointFiles(**SecondaryNameNode.java:337)
>>>>>      at org.apache.hadoop.hdfs.server.**namenode.SecondaryNameNode.**
>>>>> doCheckpoint(**SecondaryNameNode.java:422)
>>>>>      at org.apache.hadoop.hdfs.server.**namenode.SecondaryNameNode.**
>>>>> doWork(SecondaryNameNode.java:**313)
>>>>>      at org.apache.hadoop.hdfs.server.**namenode.SecondaryNameNode.**
>>>>> run(SecondaryNameNode.java:**276)
>>>>>      at java.lang.Thread.run(Thread.**java:619)
>>>>> Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.**
>>>>> fanops.net:50070/getimage?**getimage=1<
>> http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1>
>>>>>      at
>> sun.net.www.protocol.http.**HttpURLConnection.**getInputStream(
>>>>> **HttpURLConnection.java:1303)
>>>>>      at
>> sun.net.www.protocol.http.**HttpURLConnection.**getHeaderField(
>>>>> **HttpURLConnection.java:2165)
>>>>>      at org.apache.hadoop.hdfs.server.**namenode.TransferFsImage.**
>>>>> getFileClient(TransferFsImage.**java:175)
>>>>>      ... 10 more
>>>>> 
>>>>> Any help would be very much appreciated. I'm scared to shut down the NN.
>>>>> I've tried restarting the 2NN.
>>>>> 
>>>>> Thank You
>>>>> -jeremy
>>>>> 