hadoop-hdfs-user mailing list archives

From Gabriel Rosendorf <grosend...@e3smartenergy.com>
Subject Re: Checkpoint Woes
Date Thu, 02 Feb 2012 14:11:03 GMT
I thought that I had, but perhaps not.  I assume that I would configure this on the master,
using the dfs.namenode.backup.address parameter.  Is that correct?
Thanks,
Gabe

On Feb 1, 2012, at 4:27 PM, Jakob Homan wrote:

>> Posted URL master:50070putimage=1&port=50090&machine=0.0.0.0&token=-31:1318804155:0:1328129935000:1328129628242
> Have you defined your secondary namenode address?  The 2NN is telling
> the NN to pull the merged image from http://0.0.0.0:50090.
> 
> 
> On Wed, Feb 1, 2012 at 1:23 PM, Gabriel Rosendorf
> <grosendorf@e3smartenergy.com> wrote:
>> No firewall.
>> 
>> Here's hdfs-site.xml from 2NN: http://pastie.org/3298304
>> And from NN: http://pastie.org/3298309
>> 
>> On Feb 1, 2012, at 4:18 PM, Harsh J wrote:
>> 
>>> Have you ensured there is no firewall between the two hosts? Can you
>>> also pastebin your hdfs-site.xml?
>>> 
>>> On Thu, Feb 2, 2012 at 2:45 AM, Gabriel Rosendorf
>>> <grosendorf@e3smartenergy.com> wrote:
>>>> So I'm at a loss.  Checkpointing is failing, and both hosts (NN and 2NN) are
>>>> reachable via HTTP.
>>>> Any ideas would be greatly appreciated!
>>>> 
>>>> From NameNode:
>>>> 
>>>> 2012-02-01 21:02:53,460 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK*
>>>> NameSystem.processReport: from 10.178.231.219:50010, blocks: 0, processing
>>>> time: 0 msecs
>>>> 2012-02-01 21:03:56,581 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
>>>> 10.178.224.109
>>>> 2012-02-01 21:03:56,685 WARN org.mortbay.log: /getimage:
>>>> java.io.IOException: GetImage failed. java.net.ConnectException: Connection
>>>> refused
>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
>>>> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:211)
>>>> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>>> at java.net.Socket.connect(Socket.java:529)
>>>> at java.net.Socket.connect(Socket.java:478)
>>>> at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
>>>> at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
>>>> at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
>>>> at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
>>>> at sun.net.www.http.HttpClient.New(HttpClient.java:306)
>>>> at sun.net.www.http.HttpClient.New(HttpClient.java:323)
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:160)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:88)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:85)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:85)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
>>>> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>>> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>>> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>>> at
>>>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>>> at
>>>> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
>>>> at
>>>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>>> at
>>>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>>> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>>> at
>>>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>>> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>> at org.mortbay.jetty.Server.handle(Server.java:326)
>>>> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>> at
>>>> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>>> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>>> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>>> at
>>>> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>>>> at
>>>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>>>> 
>>>> 2012-02-01 21:08:56,691 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
>>>> 10.178.224.109
>>>> 2012-02-01 21:08:56,804 WARN org.mortbay.log: /getimage:
>>>> java.io.IOException: GetImage failed. java.net.ConnectException: Connection
>>>> refused
>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
>>>> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:211)
>>>> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>>> at java.net.Socket.connect(Socket.java:529)
>>>> at java.net.Socket.connect(Socket.java:478)
>>>> at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
>>>> at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
>>>> at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
>>>> at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
>>>> at sun.net.www.http.HttpClient.New(HttpClient.java:306)
>>>> at sun.net.www.http.HttpClient.New(HttpClient.java:323)
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:160)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:88)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:85)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:85)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
>>>> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>>> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>>> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>>> at
>>>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>>> at
>>>> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
>>>> at
>>>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>>> at
>>>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>>> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>>> at
>>>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>>> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>> at org.mortbay.jetty.Server.handle(Server.java:326)
>>>> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>> at
>>>> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>>> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>>> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>>> at
>>>> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>>>> at
>>>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>>>> 
>>>> 
>>>> From 2NN:
>>>> 
>>>> 2012-02-01 21:03:58,634 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
>>>> 0 Total time for transactions(ms): 0Number of transactions batched in Syncs:
>>>> 0 Number of syncs: 0 SyncTimes(ms): 0
>>>> 2012-02-01 21:03:58,652 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
>>>> fsimage size 112 bytes.
>>>> 2012-02-01 21:03:58,654 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
>>>> edits size 4 bytes.
>>>> 2012-02-01 21:03:58,655 INFO org.apache.hadoop.hdfs.util.GSet: VM type
>>>> = 64-bit
>>>> 2012-02-01 21:03:58,655 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory
>>>> = 17.77875 MB
>>>> 2012-02-01 21:03:58,655 INFO org.apache.hadoop.hdfs.util.GSet: capacity
>>>>  = 2^21 = 2097152 entries
>>>> 2012-02-01 21:03:58,655 INFO org.apache.hadoop.hdfs.util.GSet:
>>>> recommended=2097152, actual=2097152
>>>> 2012-02-01 21:03:58,658 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hduser
>>>> 2012-02-01 21:03:58,658 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>>> 2012-02-01 21:03:58,658 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isPermissionEnabled=true
>>>> 2012-02-01 21:03:58,658 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> dfs.block.invalidate.limit=100
>>>> 2012-02-01 21:03:58,658 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>>> accessTokenLifetime=0 min(s)
>>>> 2012-02-01 21:03:58,658 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring
>>>> more than 10 times
>>>> 2012-02-01 21:03:58,700 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Number of files = 1
>>>> 2012-02-01 21:03:58,700 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Number of files under construction = 0
>>>> 2012-02-01 21:03:58,718 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Edits file /app/hadoop/tmp/dfs/namesecondary/current/edits of size 4 edits #
>>>> 0 loaded in 0 seconds.
>>>> 2012-02-01 21:03:58,719 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
>>>> 0 Total time for transactions(ms): 0Number of transactions batched in Syncs:
>>>> 0 Number of syncs: 0 SyncTimes(ms): 0
>>>> 2012-02-01 21:03:58,723 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Image file of size 112 saved in 0 seconds.
>>>> 2012-02-01 21:03:58,733 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Image file of size 112 saved in 0 seconds.
>>>> 2012-02-01 21:03:58,739 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
>>>> master:50070putimage=1&port=50090&machine=0.0.0.0&token=-31:1318804155:0:1328129935000:1328129628242
>>>> 2012-02-01 21:03:58,749 ERROR
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
>>>> doCheckpoint:
>>>> 2012-02-01 21:03:58,749 ERROR
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
>>>> java.io.FileNotFoundException:
>>>> http://master:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-31:1318804155:0:1328129935000:1328129628242
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:160)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:377)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:418)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:275)
>>>> at java.lang.Thread.run(Thread.java:662)
>>>> 
>>>> 2012-02-01 21:08:58,754 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
>>>> 0 Total time for transactions(ms): 0Number of transactions batched in Syncs:
>>>> 0 Number of syncs: 0 SyncTimes(ms): 0
>>>> 2012-02-01 21:08:58,765 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
>>>> fsimage size 112 bytes.
>>>> 2012-02-01 21:08:58,772 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
>>>> edits size 4 bytes.
>>>> 2012-02-01 21:08:58,772 INFO org.apache.hadoop.hdfs.util.GSet: VM type
>>>> = 64-bit
>>>> 2012-02-01 21:08:58,772 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory
>>>> = 17.77875 MB
>>>> 2012-02-01 21:08:58,772 INFO org.apache.hadoop.hdfs.util.GSet: capacity
>>>>  = 2^21 = 2097152 entries
>>>> 2012-02-01 21:08:58,772 INFO org.apache.hadoop.hdfs.util.GSet:
>>>> recommended=2097152, actual=2097152
>>>> 2012-02-01 21:08:58,775 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hduser
>>>> 2012-02-01 21:08:58,775 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>>> 2012-02-01 21:08:58,775 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isPermissionEnabled=true
>>>> 2012-02-01 21:08:58,776 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> dfs.block.invalidate.limit=100
>>>> 2012-02-01 21:08:58,776 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>>> accessTokenLifetime=0 min(s)
>>>> 2012-02-01 21:08:58,776 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring
>>>> more than 10 times
>>>> 2012-02-01 21:08:58,776 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Number of files = 1
>>>> 2012-02-01 21:08:58,777 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Number of files under construction = 0
>>>> 2012-02-01 21:08:58,777 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Edits file /app/hadoop/tmp/dfs/namesecondary/current/edits of size 4 edits #
>>>> 0 loaded in 0 seconds.
>>>> 2012-02-01 21:08:58,777 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
>>>> 0 Total time for transactions(ms): 0Number of transactions batched in Syncs:
>>>> 0 Number of syncs: 0 SyncTimes(ms): 0
>>>> 2012-02-01 21:08:58,781 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Image file of size 112 saved in 0 seconds.
>>>> 2012-02-01 21:08:58,789 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>>> Image file of size 112 saved in 0 seconds.
>>>> 2012-02-01 21:08:58,838 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
>>>> master:50070putimage=1&port=50090&machine=0.0.0.0&token=-31:1318804155:0:1328129935000:1328129628242
>>>> 2012-02-01 21:08:58,872 ERROR
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
>>>> doCheckpoint:
>>>> 2012-02-01 21:08:58,872 ERROR
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
>>>> java.io.FileNotFoundException:
>>>> http://master:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-31:1318804155:0:1328129935000:1328129628242
>>>> at
>>>> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:160)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:377)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:418)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:275)
>>>> at java.lang.Thread.run(Thread.java:662)
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Harsh J
>>> Customer Ops. Engineer
>>> Cloudera | http://tiny.cloudera.com/about
>> 
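
The diagnosis in this thread (the 2NN asking the NN to fetch the merged image
from 0.0.0.0:50090) can be checked directly from the URL in the 2NN log. Below
is a minimal sketch that pulls the callback address out of that URL and flags
a wildcard host. The function names callback_address and is_wildcard are made
up for illustration and are not part of Hadoop. As far as I know, on releases
of this era the advertised address comes from dfs.secondary.http.address in
the 2NN's own hdfs-site.xml rather than from a setting on the master, but
treat that property name as an assumption and verify it against the
hdfs-default.xml shipped with your version.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied verbatim from the 2NN log quoted in this thread.
POSTED_URL = (
    "http://master:50070/getimage?putimage=1&port=50090&machine=0.0.0.0"
    "&token=-31:1318804155:0:1328129935000:1328129628242"
)


def callback_address(url):
    """Return the host:port the NN is being asked to pull the image from.

    Hypothetical helper: it only reads the 'machine' and 'port' query
    parameters that appear in the logged URL.
    """
    params = parse_qs(urlsplit(url).query)
    return "{}:{}".format(params["machine"][0], params["port"][0])


def is_wildcard(address):
    """True if the host part is the 0.0.0.0 wildcard bind address."""
    host = address.rsplit(":", 1)[0]
    return host == "0.0.0.0"


if __name__ == "__main__":
    addr = callback_address(POSTED_URL)
    if is_wildcard(addr):
        print(addr, "is a wildcard address; the NN cannot reach the 2NN through it")
    else:
        print(addr, "looks like a concrete address")
```

If the check reports a wildcard, the NN ends up connecting to port 50090 on
itself, which would be consistent with the "Connection refused" in the NN log
above.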

