hadoop-common-user mailing list archives

From Dennis Kubes <nutch-...@dragonflymc.com>
Subject Re: Namenode cannot accept connection from datanode
Date Mon, 14 May 2007 17:31:27 GMT
We have run into this problem before.  If you have a static address for
the machine, make sure your hosts file maps the namenode's host name to
that static address rather than to 127.0.0.1.  It should look something
like this, with the values replaced by your own:

127.0.0.1               localhost.localdomain localhost
192.x.x.x               yourhost.yourdomain.com yourhost
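
For what it's worth, a quick way to check the resolution is a throwaway
Java snippet along these lines (the class name is made up and the host
name is only a placeholder; use whatever host name appears in
fs.default.name and your slaves file):

// ResolveCheck.java - hypothetical helper, not part of Hadoop.
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "yourhost.yourdomain.com";
        InetAddress addr = InetAddress.getByName(host);
        System.out.println(host + " resolves to " + addr.getHostAddress());
        if (addr.isLoopbackAddress()) {
            // This is the bad case from this thread: the name maps to
            // 127.0.0.1, so the namenode ends up reachable only locally.
            System.out.println("WARNING: " + host
                    + " maps to the loopback address; fix the hosts file.");
        }
    }
}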

Dennis Kubes

Cedric Ho wrote:
> Oh, and I also tried using 192.168.1.179 as a datanode itself, and
> only this datanode connects to the namenode on the same host
> successfully.
> 
> On 5/14/07, Cedric Ho <cedric.ho@gmail.com> wrote:
>> I performed more testing on this. While the namenode is running, I
>> cannot connect to 192.168.1.179:9000 from other machines, but I can
>> connect to it locally. It seems that the server socket binds only to
>> 127.0.0.1:9000 and not to 192.168.1.179:9000.
>>
>> I've also confirmed that there's no firewall, connection blocking, etc.
>> on this machine. In fact, I've written a small Java program that opens
>> a ServerSocket on port 9000, started by the same user on the same
>> machine, and I am able to connect to it from all other machines.
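
A bare-bones sketch of that kind of test program, just for reference
(class name and port are only illustrative; a ServerSocket created this
way binds to all local interfaces, which is why it is reachable from the
other machines):

// PortTest.java - minimal stand-alone listener, not part of Hadoop.
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class PortTest {
    public static void main(String[] args) throws IOException {
        // No bind address is given, so this listens on 0.0.0.0:9000,
        // i.e. on every interface of the machine.
        ServerSocket server = new ServerSocket(9000);
        System.out.println("Listening on " + server.getLocalSocketAddress());
        while (true) {
            Socket client = server.accept();
            System.out.println("Connection from "
                    + client.getRemoteSocketAddress());
            client.close();
        }
    }
}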
>>
>> So is there a setting that would cause the namenode to bind port 9000
>> only on the local (loopback) interface?
>>
>> Cedric
>>
>>
>> On 5/12/07, Michael Bieniosek <michael@powerset.com> wrote:
>> > I would try to debug this as a network problem - when the namenode is
>> > running, can you connect to 192.168.1.179:9000 from the machine the
>> > datanode is on?
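
A throwaway connect test of roughly this shape can answer that question
from the datanode host (the class name is made up; the address and port
are just the ones from this thread):

// ConnectTest.java - client-side check, run from the datanode machine.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectTest {
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "192.168.1.179";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        Socket s = new Socket();
        try {
            // Use a short timeout so a blocked route fails fast.
            s.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("Connected to " + host + ":" + port);
        } catch (IOException e) {
            System.out.println("Could not connect to " + host + ":" + port
                    + ": " + e);
        } finally {
            try { s.close(); } catch (IOException ignored) {}
        }
    }
}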
>> >
>> > While the namenode does use a lot of RAM as the cluster size increases,
>> > an overloaded namenode will typically start panicking in its log
>> > messages.  This doesn't happen in your namenode logs - it doesn't appear
>> > that any datanodes connected at all.
>> >
>> > -Michael
>> >
>> > On 5/10/07 7:39 PM, "Cedric Ho" <cedric.ho@gmail.com> wrote:
>> >
>> > > Hi all,
>> > >
>> > > We were trying to set up Hadoop in our Linux environment. When we
>> > > tried to use a slow machine as the namenode (a Pentium III with
>> > > 512MB RAM), it seemed unable to accept connections from the other
>> > > datanodes. (I can still reach its status page over HTTP on port
>> > > 50070, however.)
>> > >
>> > > But it works fine on a faster machine (Pentium 4, 3GHz with 3GB RAM).
>> > > The settings etc. are exactly the same.
>> > >
>> > > The problem seems to be on the namenode side. Is it because the
>> > > machine is slow?
>> > >
>> > > The version we use is 0.12.3
>> > >
>> > > Any help is appreciated.
>> > >
>> > >
>> > > Below is the log from the problematic namenode.
>> > >
>> > > 2007-05-09 18:18:46,998 INFO org.apache.hadoop.dfs.StateChange: STATE*
>> > > Network topology has 0 racks and 0 datanodes
>> > > 2007-05-09 18:18:47,000 INFO org.apache.hadoop.dfs.StateChange: STATE*
>> > > UnderReplicatedBlocks has 0 blocks
>> > > 2007-05-09 18:18:47,432 INFO org.mortbay.util.Credential: Checking
>> > > Resource aliases
>> > > 2007-05-09 18:18:48,051 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
>> > > 2007-05-09 18:18:50,524 INFO org.mortbay.util.Container: Started
>> > > org.mortbay.jetty.servlet.WebApplicationHandler@587c94
>> > > 2007-05-09 18:18:51,064 INFO org.mortbay.util.Container: Started
>> > > WebApplicationContext[/,/]
>> > > 2007-05-09 18:18:51,065 INFO org.mortbay.util.Container: Started
>> > > HttpContext[/logs,/logs]
>> > > 2007-05-09 18:18:51,065 INFO org.mortbay.util.Container: Started
>> > > HttpContext[/static,/static]
>> > > 2007-05-09 18:18:51,147 INFO org.mortbay.http.SocketListener: Started
>> > > SocketListener on 0.0.0.0:50070
>> > > 2007-05-09 18:18:51,148 INFO org.mortbay.util.Container: Started
>> > > org.mortbay.jetty.Server@e53108
>> > > 2007-05-09 18:18:51,223 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > listener on 9000: starting
>> > > 2007-05-09 18:18:51,226 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 0 on 9000: starting
>> > > 2007-05-09 18:18:51,227 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 1 on 9000: starting
>> > > 2007-05-09 18:18:51,228 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 2 on 9000: starting
>> > > 2007-05-09 18:18:51,229 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 3 on 9000: starting
>> > > 2007-05-09 18:18:51,391 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 4 on 9000: starting
>> > > 2007-05-09 18:18:51,392 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 5 on 9000: starting
>> > > 2007-05-09 18:18:51,393 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 6 on 9000: starting
>> > > 2007-05-09 18:18:51,394 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 7 on 9000: starting
>> > > 2007-05-09 18:18:51,395 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 8 on 9000: starting
>> > > 2007-05-09 18:18:51,397 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > > handler 9 on 9000: starting
>> > >
>> > >
>> > > And these are from the datanode
>> > >
>> > > 2007-05-09 18:35:13,263 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 1 time(s).
>> > > 2007-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 2 time(s).
>> > > 2007-05-09 18:35:15,270 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 3 time(s).
>> > > 2007-05-09 18:35:16,274 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 4 time(s).
>> > > 2007-05-09 18:35:17,279 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 5 time(s).
>> > > 2007-05-09 18:35:18,283 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 6 time(s).
>> > > 2007-05-09 18:35:19,288 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 7 time(s).
>> > > 2007-05-09 18:35:20,293 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 8 time(s).
>> > > 2007-05-09 18:35:21,295 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 9 time(s).
>> > > 2007-05-09 18:35:22,298 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 10 time(s).
>> > > 2007-05-09 18:35:23,304 INFO org.apache.hadoop.ipc.RPC: Server at
>> > > hadoop01.ourcompany.com/192.168.1.179:9000 not available yet, Zzzzz...
>> > > 2007-05-09 18:35:24,308 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 1 time(s).
>> > > 2007-05-09 18:35:25,317 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 2 time(s).
>> > > 2007-05-09 18:35:26,322 INFO org.apache.hadoop.ipc.Client: Retrying
>> > > connect to server: hadoop01.ourcompany.com/192.168.1.179:9000.
>> > > Already tried 3 time(s).
>> > >
>> > >
>> > > Thanks,
>> > > Cedric
>> >
>> >
>>
> 
> 
