hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/Troubleshooting" by DrakeMcSmooth
Date Thu, 05 Jun 2008 20:31:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by DrakeMcSmooth:
http://wiki.apache.org/hadoop/Hbase/Troubleshooting

New page:
== Problem: Master node initializes, but the datanodes of slave nodes do not ==
 * The master node starts ''DataNode'' and ''TaskTracker'' on itself and on the slave nodes, but ''dfshealth'' shows only 1 Live Node: the master node.
 * Slave node's tasktracker log contains repeated instances of the following block:
  ~-2007-11-27 11:09:39,293 INFO org.apache.hadoop.ipc.RPC: Server at masternode/192.168.222.23:54311 not available yet, Zzzzz...[[BR]]
  2007-11-27 11:09:40,299 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 1 time(s).[[BR]]
  2007-11-27 11:09:41,303 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 2 time(s).[[BR]]
  2007-11-27 11:09:42,309 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 3 time(s).[[BR]]
  2007-11-27 11:09:43,314 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 4 time(s).[[BR]]
  2007-11-27 11:09:44,319 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 5 time(s).[[BR]]
  2007-11-27 11:09:45,324 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 6 time(s).[[BR]]
  2007-11-27 11:09:46,329 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 7 time(s).[[BR]]
  2007-11-27 11:09:47,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 8 time(s).[[BR]]
  2007-11-27 11:09:48,336 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 9 time(s).[[BR]]
  2007-11-27 11:09:49,342 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: masternode/192.168.100.50:54311. Already tried 10 time(s).[[BR]]
  2007-11-27 11:09:50,347 INFO org.apache.hadoop.ipc.RPC: Server at masternode/192.168.100.50:54311 not available yet, Zzzzz...-~

=== Causes ===
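 * The port the slaves are retrying (54311 in the log above) is not reachable on the master node from other nodes on the network. The likely reason is that the master's host name is mapped to 127.0.0.1 in '''/etc/hosts''', so the server listens only on the loopback interface.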
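
One way to confirm this cause is to attempt a plain TCP connection from a slave node to the master's port. The following is a minimal sketch and not part of Hadoop; the function name `can_connect` is hypothetical, and the host name and port in the usage note are simply the ones that appear in the log above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run on a slave node, `can_connect("masternode", 54311)` returning `False` while the same call succeeds on the master itself would be consistent with the cause described above.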
=== Resolution ===
 * Change '''/etc/hosts''' on the master node from:
  {{{
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1		masternode localhost.localdomain localhost
::1		localhost6.localdomain6 localhost6
}}}

 * to the following, which removes the master node's name from the 127.0.0.1 (localhost) line:
{{{
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1		localhost.localdomain localhost
::1		localhost6.localdomain6 localhost6
}}}
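
The same check can be scripted. The sketch below is an illustration only: the function names are hypothetical, it takes the hosts file contents as a string, and it treats only the literal addresses `127.0.0.1` and `::1` as loopback, which is a simplification.

```python
LOOPBACK_ADDRESSES = {"127.0.0.1", "::1"}  # simplification: not all of 127.0.0.0/8


def names_on_loopback(hosts_text):
    """Return the set of host names that a hosts file maps to a loopback address."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        address, *aliases = entry.split()
        if address in LOOPBACK_ADDRESSES:
            names.update(aliases)
    return names


def is_misconfigured(hosts_text, hostname):
    """True if hostname is listed on a loopback line, as in the first file above."""
    return hostname in names_on_loopback(hosts_text)
```

With the first file above, `is_misconfigured(text, "masternode")` is true; with the second it is false.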

 * As a result, '''netstat''' should show the port bound to the master's network address (the bold line) rather than to 127.0.0.1:
  ~-$ netstat -an | grep LISTEN[[BR]]
  tcp  0      0 0.0.0.0:756                 0.0.0.0:*    LISTEN[[BR]]
  tcp  0      0 127.0.0.1:631               0.0.0.0:*    LISTEN[[BR]]
  '''tcp  0      0 ::ffff:192.168.100.50:54310 :::*         LISTEN'''[[BR]]
  tcp  0      0 :::50090                    :::*         LISTEN[[BR]]
  tcp  0      0 :::50070                    :::*         LISTEN-~
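
The netstat check can also be automated. This is a rough sketch under stated assumptions: it expects plain `netstat -an` text in the six-column layout shown above (without the wiki markup), and the function names are hypothetical.

```python
import ipaddress


def listen_addresses(netstat_text, port):
    """Return the local addresses of LISTEN sockets on the given port."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected layout: proto recv-q send-q local-address foreign-address state
        if len(fields) < 6 or fields[5] != "LISTEN":
            continue
        # The port is whatever follows the last colon of the local address.
        address, _, port_text = fields[3].rpartition(":")
        if port_text == str(port):
            found.append(address)
    return found


def bound_to_loopback(address):
    """True if the address is loopback-only (127.0.0.0/8 or ::1)."""
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:192.168.100.50.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return ip.is_loopback
```

If every address returned by `listen_addresses(output, 54310)` makes `bound_to_loopback` true, slave nodes cannot reach that port.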


== Problem: HRegionServers have lease issues on starting Hbase ==
 * HRegionServers connect initially, then drop off due to '''LeaseExpiredException'''
 * The following can be seen in '''hbase-{user}-master-masternode.log''', for example, '''hbase-hadoop-master-masternode.log'''
  ~-2007-12-04 14:35:06,690 INFO org.apache.hadoop.hbase.HMaster: received start message from: 127.0.0.1:60020[[BR]]
  2007-12-04 14:35:06,696 INFO org.apache.hadoop.hbase.HMaster: received start message from: 192.168.100.50:60020[[BR]]
  2007-12-04 14:35:06,697 INFO org.apache.hadoop.hbase.HMaster: received start message from: 127.0.0.1:60020[[BR]]
  2007-12-04 14:35:06,711 INFO org.apache.hadoop.hbase.HMaster: received start message from: 127.0.0.1:60020-~

=== Causes ===
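 * The master node cannot correctly resolve the IP addresses of the HRegionServers (slave nodes); in the log above, several start messages arrive from 127.0.0.1:60020 rather than from a network address.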
=== Resolution ===
 * Remove the master node (the node on which HMaster runs) from '''hbase/conf/regionservers'''
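
As an illustration of that edit, here is a minimal sketch that filters the master out of the regionservers list. It assumes the file holds one host name per line; the function name and the slave host names in the usage note are hypothetical.

```python
def without_master(regionservers_text, master_names):
    """Return the regionservers entries with any master host name removed."""
    kept = []
    for line in regionservers_text.splitlines():
        host = line.strip()
        # Skip blank lines and any entry that names the master.
        if host and host not in master_names:
            kept.append(host)
    return kept
```

For example, `without_master("masternode\nslave1\nslave2\n", {"masternode"})` keeps only `slave1` and `slave2`; the result can then be written back to the file, one name per line.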
