hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "VirtualCluster" by SteveLoughran
Date Wed, 24 Jun 2009 10:16:30 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/VirtualCluster

The comment on the change is:
more troublespots. 

------------------------------------------------------------------------------
   i. The public keys of all machines (both VMs and physical machines) are distributed to every "~/.ssh/authorized_keys"
file.
   i. conf/hadoop-site.xml file is similar for all the machines.
   i. /etc/hosts file must contain the IP address and hostname of every machine (VMs and physical machines).
-  i. The local hostname entry in /etc/hosts must not point to 127.0.0.1 or any other loopback
address (some laptop-friendly Unix distributions do this). It should be to the assigned IP
address.
+  i. The local hostname entry in /etc/hosts must not point to 127.0.0.1 or any other loopback
address (some laptop-friendly Linux distributions do this). It should be to the assigned IP
address.
   i. conf/slaves must contain the hostnames of all slaves, including VMs and physical machines.
   i. conf/masters must contain only master's hostname.
   i. both conf/masters and conf/slaves files must be similar in all the participating machines.
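
The /etc/hosts rule above can be checked mechanically. The following is only a sketch, not part of the wiki page: the function names `loopback_entries` and `check_this_machine` are hypothetical, and it assumes the usual hosts-file layout of an address followed by one or more names, with `#` starting a comment.

```python
import ipaddress
import socket


def loopback_entries(hosts_text, hostname):
    """Return the loopback addresses that hosts_text maps hostname to.

    hosts_text is the content of a hosts file; an empty result means
    the hostname is not bound to any loopback address in that file.
    """
    bad = []
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if hostname not in names:
            continue
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            # Not a plain IP address (for example a scoped IPv6 entry); skip it.
            continue
        if ip.is_loopback:
            bad.append(address)
    return bad


def check_this_machine(path="/etc/hosts"):
    """Hypothetical helper: run the check for the local hostname."""
    with open(path) as handle:
        return loopback_entries(handle.read(), socket.gethostname())
```

A non-empty list from `check_this_machine()` would suggest that the local hostname still resolves to a loopback address and should be repointed to the assigned IP address.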
@@ -28, +28 @@

  Here are things that can cause trouble.
   1. Multiple virtual network adapters. It is simpler with one network adapter/node
   1. Machines changing hostname/IPAddress on a reboot. For a long-lived virtual cluster you
need stable machine names.
+  1. Machines whose hostname doesn't match the hostname the network assigns it. It thinks
it is "granton", the network thinks it is "dhcp-169-45", that being the name everything else
talks to it by.
+  1. Machines that think they have the same hostname. You get this if you clone VMs and don't
rename them.
   1. Pauses of an entire VM for 5-10s or longer. This happens when the virtual host is overloaded
and your VM has been swapped out. Host fewer VMs, or have them ask for less memory.
+  1. Weird clock drift, where the clock can even run backwards. Again, don't overload your machines.
+  1. All redundant virtual servers (e.g. Namenode and secondary NN) being hosted on the same
physical machine. At that point, you don't have redundancy or failover any more.
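
Two of the hostname problems listed above (cloned VMs sharing a name, and a machine whose own idea of its name differs from the network's) can be spotted by comparing two tables. This is a minimal sketch under assumptions: the function name `find_hostname_problems` is hypothetical, and it assumes you have already collected, for each IP address, the hostname the machine reports for itself and the hostname the network assigns to it. How those tables are gathered is left open.

```python
from collections import defaultdict


def find_hostname_problems(reported, network):
    """Compare self-reported hostnames with network-assigned ones.

    reported: dict of IP address -> hostname the machine believes it has.
    network:  dict of IP address -> hostname the network assigns to it.
    Returns (duplicates, mismatches).
    """
    # Group addresses by the hostname each machine claims.
    by_name = defaultdict(list)
    for address, name in reported.items():
        by_name[name].append(address)

    # Any hostname claimed by more than one address is a duplicate.
    duplicates = {
        name: sorted(addresses)
        for name, addresses in by_name.items()
        if len(addresses) > 1
    }

    # A mismatch is an address whose two hostnames disagree.
    mismatches = {
        address: (name, network[address])
        for address, name in reported.items()
        if address in network and network[address] != name
    }
    return duplicates, mismatches


if __name__ == "__main__":
    # Illustrative data only; the addresses and the "vm-template" name are made up.
    reported = {"10.0.0.1": "granton", "10.0.0.2": "vm-template", "10.0.0.3": "vm-template"}
    network = {"10.0.0.1": "dhcp-169-45", "10.0.0.2": "vm-template", "10.0.0.3": "vm-3"}
    duplicates, mismatches = find_hostname_problems(reported, network)
    print("duplicate hostnames:", duplicates)
    print("hostname mismatches:", mismatches)
```

Either result being non-empty points at a machine that needs renaming before it joins the cluster.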
  
