hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Virtual Hadoop" by LukeLu
Date Fri, 07 Jun 2013 03:49:49 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Virtual Hadoop" page has been changed by LukeLu:
https://wiki.apache.org/hadoop/Virtual%20Hadoop?action=diff&rev1=12&rev2=13

  
  Ignoring low-level networking/clock issues, what does this mean? (Only valid for some cloud
vendors, it may be different for other cloud vendors or you own your virtualized infrastructure.)
  
-  1. When you request a VM, it's performance may vary from previous requests (when lack of
isolation feature/policy). This can be due to CPU differences, or the other workloads.
+  1. When you request a VM, it's performance may vary from previous requests (when missing
isolation feature/policy). This can be due to CPU differences, or the other workloads.
   1. There is no point writing topology scripts, if cloud vendor doesn't expose physical
topology to you in some way. OTOH, [http://serengeti.cloudfoundry.com/ Project Serengeti]
configures the topology script automatically for vSphere.
-  1. All network ports must be closed by way of firewall and routing information, apart from
those ports critical for Hadoop -which must then run with security on.
+  1. All network ports must be closed by way of firewall and routing information, apart from
those ports critical for Hadoop, which must then run with security on.
   1. All data you wish to keep must be kept on permanent storage: mounted block stores, remote
filesystems or external databases. This goes for both input and output.
   1. People or programs need to track machine failures and react to them by releasing those
machines and requesting new ones.
   1. If the cluster is idle. some machines can be decommissioned.
@@ -104, +104 @@

     * HDFS is as reliable and efficient as in physical.
     * Virtualization can provide much higher hardware utilization by consolidating multiple
Hadoop clusters and other workload on the same physical cluster
     * Higher performance for some workload (including terasort) than physical for typical
2 CPU socket Hadoop nodes due to better NUMA and disk scheduling
-    * Per tenant VLAN via SDN for better security than typical shared physical Hadoop cluster

+    * Per tenant VLAN (VXLAN) for better security than typical shared physical Hadoop cluster

   * Given the choice between a virtual Hadoop and no Hadoop, virtual Hadoop is compelling.
   * Using Apache Hadoop as your MapReduce infrastructure gives you Cloud vendor independence,
and the option of moving to a permanent physical deployment later.
   * It is the only way to execute the tools that work with Hadoop and the layers above it
in a Cloud environment.
@@ -129, +129 @@

  
  == Summary ==
  
- You can bring up Hadoop in virtualized infrastructures. Sometimes it even makes sense for
public cloud, for development and production. For production use, be aware that the differences
between physical and virtual infrastructures could pose additional gotchas to your data integrity
and security without proper planning and provisioning. 
+ You can bring up Hadoop in virtualized infrastructures with many benefits. Sometimes it
even makes sense for public cloud, for development and production. For production use, be
aware that the differences between physical and virtual infrastructures could pose additional
gotchas to your data integrity and security without proper planning and provisioning. 
  

Mime
View raw message