hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/Troubleshooting" by DrakeMcSmooth
Date Thu, 05 Jun 2008 20:42:46 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by DrakeMcSmooth:
http://wiki.apache.org/hadoop/Hbase/Troubleshooting

------------------------------------------------------------------------------
+ == Problem: Master initializes, but Region Servers do not ==
- == Problem: Master node initializes, but the datanodes of slave nodes do not ==
-  * Master node activates ''DataNode'' and ''TaskTracker'' on itself and the slave nodes, but ''dfshealth'' only shows 1 Live Node, the Master node.
-  * Slave node's tasktracker log contains repeated instances of the following block:
+  * Master's log contains repeated instances of the following block:
-   ~-2007-11-27 11:09:39,293 INFO org.apache.hadoop.ipc.RPC: Server at masternode/192.168.222.23:54311 not available yet, Zzzzz...[[BR]]
-   2007-11-27 11:09:40,299 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 1 time(s).[[BR]]
+   ~-INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.0.1:60020. Already tried 1 time(s).[[BR]]
-   2007-11-27 11:09:41,303 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 2 time(s).[[BR]]
+   INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /127.0.0.1:60020. Already tried 2 time(s).[[BR]]
+   ...
+   INFO org.apache.hadoop.ipc.RPC: Server at /127.0.0.1:60020 not available yet, Zzzzz...-~
+  * Region Servers' logs contain repeated instances of the following block:
-   2007-11-27 11:09:42,309 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 3 time(s).[[BR]]
-   2007-11-27 11:09:43,314 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 4 time(s).[[BR]]
-   2007-11-27 11:09:44,319 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 5 time(s).[[BR]]
-   2007-11-27 11:09:45,324 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 6 time(s).[[BR]]
-   2007-11-27 11:09:46,329 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 7 time(s).[[BR]]
-   2007-11-27 11:09:47,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 8 time(s).[[BR]]
-   2007-11-27 11:09:48,336 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 9 time(s).[[BR]]
+   ~-INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:60000. Already tried 9 time(s).
-   2007-11-27 11:09:49,342 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 10 time(s).[[BR]]
+   INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:60000. Already tried 10 time(s).
-   2007-11-27 11:09:50,347 INFO org.apache.hadoop.ipc.RPC: Server at masternode/192.168.100.50:54311 not available yet, Zzzzz...-~
+   INFO org.apache.hadoop.ipc.RPC: Server at masternode/192.168.100.50:60000 not available yet, Zzzzz...-~
- 
+  * Note that the Master believes the Region Servers have the IP address 127.0.0.1, which is localhost and therefore resolves to the Master's own loopback interface rather than to any Region Server.
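
To spot this symptom without reading the whole log, one option is to scan for retry messages whose target is a loopback address. The following is only a minimal sketch: the function name `loopback_retries` and the regular expression are illustrative assumptions based on the message format quoted above, not part of HBase or Hadoop.

```python
import ipaddress
import re

# Matches the target in messages such as
#   "... Retrying connect to server: masternode/192.168.100.50:60000. Already tried 9 time(s)."
# The host name before the slash may be empty, as in "/127.0.0.1:60020".
RETRY_RE = re.compile(
    r"Retrying connect to server: (?P<host>[^/\s]*)/(?P<ip>[\d.]+):(?P<port>\d+)"
)


def loopback_retries(log_lines):
    """Return (ip, port) pairs for retry messages that target a loopback address."""
    hits = []
    for line in log_lines:
        match = RETRY_RE.search(line)
        if match is None:
            continue  # not a retry message
        try:
            addr = ipaddress.ip_address(match.group("ip"))
        except ValueError:
            continue  # digits and dots that do not form a valid address
        if addr.is_loopback:
            hits.append((match.group("ip"), int(match.group("port"))))
    return hits
```

Any pair returned for the Master's log suggests that a Region Server reported a loopback address for itself, which is the situation described in the note above.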
  === Causes ===
-  * That port on the master node is not accessible from other nodes on the network
+  * The Region Servers are erroneously informing the Master that their IP addresses are 127.0.0.1.
  === Resolution ===
-  * Modify <code>/etc/hosts</code> on the master node, from
+  * Modify <code>/etc/hosts</code> on the region servers, from
    {{{
  # Do not remove the following line, or various programs
  # that require network functionality will fail.
- 127.0.0.1		masternode localhost.localdomain localhost
+ 127.0.0.1		fully.qualified.regionservername regionservername  localhost.localdomain localhost
  ::1		localhost6.localdomain6 localhost6
  }}}
  
@@ -34, +29 @@

  127.0.0.1		localhost.localdomain localhost
  ::1		localhost6.localdomain6 localhost6
  }}}
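
The same check can be applied to a hosts file before starting the cluster. The sketch below is an assumption-laden illustration, not an HBase tool: the function name `names_bound_to_loopback` and the `LOCAL_NAMES` set are hypothetical, and the set contains only the localhost aliases that appear in the hosts files shown above.

```python
import ipaddress

# Names expected on a loopback line (taken from the hosts files shown above).
LOCAL_NAMES = {
    "localhost",
    "localhost.localdomain",
    "localhost6",
    "localhost6.localdomain6",
}


def names_bound_to_loopback(hosts_text):
    """Return host names, other than the usual localhost aliases,
    that a hosts file maps to a loopback address."""
    offenders = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if addr.is_loopback:
            offenders.extend(name for name in fields[1:] if name not in LOCAL_NAMES)
    return offenders
```

On a region server, passing in the contents of `/etc/hosts` (for example via `open("/etc/hosts").read()`) should return an empty list once the file has been corrected; any returned name is a candidate for the misreported 127.0.0.1 address.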
- 
-  * As a result '''netstat''' should return the following
-   ~-$ netstat -an | grep LISTEN
-   tcp  0      0 0.0.0.0:756                 0.0.0.0:*    LISTEN[[BR]]
-   tcp  0      0 127.0.0.1:631               0.0.0.0:*    LISTEN[[BR]]
-   '''tcp  0      0 ::ffff:192.168.100.50:54310 :::*         LISTEN'''[[BR]]
-   tcp  0      0 :::50090                    :::*         LISTEN[[BR]]
-   tcp  0      0 :::50070                    :::*         LISTEN-~
- 
  
  == Problem: HRegionServers have lease issues on starting Hbase ==
   * HRegionServers connect initially, then drop off due to '''LeaseExpiredException'''
