hadoop-common-user mailing list archives

From "GOEKE, MATTHEW (AG/1000)" <matthew.go...@monsanto.com>
Subject RE: Why I cannot see live nodes in a LAN-based cluster setup?
Date Tue, 28 Jun 2011 03:56:34 GMT
At this point, if that is the correct IP, I would check whether you can ssh from the DN
to the NN to make sure it can actually reach the other box. If ssh connects successfully,
then it's just a matter of figuring out why that port is having issues (netstat
is your friend in this case). If you see the NN listening on 54310, then just power cycle
the box and try again.
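
As a complement to ssh and netstat, here is a minimal sketch of a reachability check you could run on the DN. It only tests whether a TCP connection to the NN's IPC port can be opened; it does not speak the Hadoop RPC protocol. The function name can_connect and the 3-second default timeout are assumptions made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Name resolution failures, refused connections and timeouts all
    raise OSError (or a subclass), so each of them is reported as False.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the setup in this thread, the call on the DN would be can_connect("clock.ucsd.edu", 54310). A False result while netstat on the NN shows 54310 listening would suggest looking at a firewall or at which address the NN is bound to, though this check alone cannot tell those cases apart.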

Matt

-----Original Message-----
From: Jingwei Lu [mailto:jlu@ucsd.edu] 
Sent: Monday, June 27, 2011 5:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Why I cannot see live nodes in a LAN-based cluster setup?

Hi Matt and Jeff:

Thanks a lot for your instructions. I corrected the mistakes in conf files
of DN, and now the log on DN becomes:

2011-06-27 15:32:36,025 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 0 time(s).
2011-06-27 15:32:37,028 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 1 time(s).
2011-06-27 15:32:38,031 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 2 time(s).
2011-06-27 15:32:39,034 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 3 time(s).
2011-06-27 15:32:40,037 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 4 time(s).
2011-06-27 15:32:41,040 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 5 time(s).
2011-06-27 15:32:42,043 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 6 time(s).
2011-06-27 15:32:43,046 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 7 time(s).
2011-06-27 15:32:44,049 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 8 time(s).
2011-06-27 15:32:45,052 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 9 time(s).
2011-06-27 15:32:45,053 INFO org.apache.hadoop.ipc.RPC: Server at
clock.ucsd.edu/132.239.95.91:54310 not available yet, Zzzzz...

It seems the DN is trying to connect to the NN but always fails...



Best Regards
Yours Sincerely

Jingwei Lu



On Mon, Jun 27, 2011 at 2:22 PM, GOEKE, MATTHEW (AG/1000) <
matthew.goeke@monsanto.com> wrote:

> As a follow-up to what Jeff posted: go ahead and ignore the message you got
> on the NN for now.
>
> If you look at the address that the DN log shows it is 127.0.0.1 and the
> ip:port it is trying to connect to for the NN is 127.0.0.1:54310 ---> it
> is trying to bind to itself as if it was still in single machine mode. Make
> sure that you have correctly pushed the URI for the NN into the config files
> on both machines and then bounce DFS.
>
> Matt
>
> -----Original Message-----
> From: Jeff.Schmitz@shell.com [mailto:Jeff.Schmitz@shell.com]
> Sent: Monday, June 27, 2011 4:08 PM
> To: common-user@hadoop.apache.org
> Subject: RE: Why I cannot see live nodes in a LAN-based cluster setup?
>
> http://www.mentby.com/tim-robertson/error-register-getprotocolversion.html
>
>
>
> -----Original Message-----
> From: Jingwei Lu [mailto:jlu@ucsd.edu]
> Sent: Monday, June 27, 2011 3:58 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Why I cannot see live nodes in a LAN-based cluster setup?
>
> Hi,
>
> I just manually modified the masters and slaves files on both machines.
>
> I found something wrong in the log files, as shown below:
>
> -- Master :
> namenode.log:
>
> ****************************************
> 2011-06-27 13:44:47,055 INFO org.mortbay.log: jetty-6.1.14
> 2011-06-27 13:44:47,394 INFO org.mortbay.log: Started
> SelectChannelConnector@0.0.0.0:50070
> 2011-06-27 13:44:47,395 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
> 0.0.0.0:50070
> 2011-06-27 13:44:47,395 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2011-06-27 13:44:47,395 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 54310: starting
> 2011-06-27 13:44:47,396 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 54310: starting
> 2011-06-27 13:44:47,397 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1 on 54310: starting
> 2011-06-27 13:44:47,397 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 54310: starting
> 2011-06-27 13:44:47,397 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 54310: starting
> 2011-06-27 13:44:47,402 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 4 on 54310: starting
> 2011-06-27 13:44:47,404 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 54310: starting
> 2011-06-27 13:44:47,406 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 54310: starting
> 2011-06-27 13:44:47,406 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 7 on 54310: starting
> 2011-06-27 13:44:47,406 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 54310: starting
> 2011-06-27 13:44:47,408 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 54310: starting
> 2011-06-27 13:44:47,500 INFO org.apache.hadoop.ipc.Server: Error register
> getProtocolVersion
> java.lang.IllegalArgumentException: Duplicate
> metricsName:getProtocolVersion
> at
> org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
> at
>
> org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:89)
> at
>
> org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:99)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 2011-06-27 13:45:02,572 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.registerDatanode: node registration from 127.0.0.1:50010storage
> DS-87816363-127.0.0.1-50010-1309207502566
> ****************************************
>
>
> -- slave:
> datanode.log:
>
> ****************************************
> 2011-06-27 13:45:00,335 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = hdl.ucsd.edu/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> ************************************************************/
> 2011-06-27 13:45:02,476 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 0 time(s).
> 2011-06-27 13:45:03,549 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 1 time(s).
> 2011-06-27 13:45:04,552 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 2 time(s).
> 2011-06-27 13:45:05,609 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 3 time(s).
> 2011-06-27 13:45:06,640 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 4 time(s).
> 2011-06-27 13:45:07,643 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 5 time(s).
> 2011-06-27 13:45:08,646 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 6 time(s).
> 2011-06-27 13:45:09,661 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 7 time(s).
> 2011-06-27 13:45:10,664 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 8 time(s).
> 2011-06-27 13:45:11,678 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 9 time(s).
> 2011-06-27 13:45:11,679 INFO org.apache.hadoop.ipc.RPC: Server at
> hdl.ucsd.edu/127.0.0.1:54310 not available yet, Zzzzz...
> ****************************************
>
> (Just a guess: is this due to some port problem?)
>
> Any comments will be greatly appreciated!
>
> Best Regards
> Yours Sincerely
>
> Jingwei Lu
>
>
>
> On Mon, Jun 27, 2011 at 1:28 PM, GOEKE, MATTHEW (AG/1000) <
> matthew.goeke@monsanto.com> wrote:
>
> > Did you make sure to define the datanode/tasktracker in the slaves file
> > in your conf directory and push that to both machines? Also, have you
> > checked the logs on either to see if there are any errors?
> >
> > Matt
> >
> > -----Original Message-----
> > From: Jingwei Lu [mailto:jlu@ucsd.edu]
> > Sent: Monday, June 27, 2011 3:24 PM
> > To: HADOOP MLIST
> > Subject: Why I cannot see live nodes in a LAN-based cluster setup?
> >
> > Hi Everyone:
> >
> > I am quite new to Hadoop. I am attempting to set up Hadoop locally on
> > two machines connected by a LAN. Both of them pass the single-node test.
> > However, the two-node cluster setup fails in both of the cases below:
> >
> > 1) set one as dedicated namenode and the other as dedicated datanode
> > 2) set one as both name- and data-node, and the other as just datanode
> >
> > I launch *start-dfs.sh* on the namenode. Since I have all the *ssh*
> > issues cleared, I can always observe the daemon starting up on every
> > datanode. However, the web page at *http://(URI of namenode):50070*
> > shows only 0 live nodes for (1) and 1 live node for (2), which matches
> > the output of the command *hadoop dfsadmin -report*.
> >
> > Generally it appears that from the namenode you cannot observe the remote
> > datanode alive, let alone a normal across-node MapReduce execution.
> >
> > Could anyone give some hints or instructions at this point? I would
> > really appreciate it!
> >
> > Thanks.
> >
> > Best Regards
> > Yours Sincerely
> >
> > Jingwei Lu