hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Virtual Hadoop" by RobertMarshall
Date Fri, 20 Jul 2012 00:12:25 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Virtual Hadoop" page has been changed by RobertMarshall:
http://wiki.apache.org/hadoop/Virtual%20Hadoop?action=diff&rev1=6&rev2=7

  
  A recurrent question on the user mailing lists is "can Hadoop be deployed in virtual infrastructures",
or "can you run Hadoop 'in the Cloud'", where cloud means "separate storage and compute services,
public or private".
  
- The answer is "yes, but you need to be aware of the limitations"
+ These are actually two separate, but interrelated, questions, since every cloud infrastructure
depends upon virtualization to manage and present an aggregation of infrastructure components
that can be quickly configured to meet a user's need. Cloud and virtualization need to be
examined separately, but in all cases the answer is "Yes you can virtualize, and yes, you
can deploy to the cloud, but you need to know the consequences and plan accordingly".
+ 
+ First, some definitions and background:
+ 
+ Virtualization is the process of converting from a purely physical implementation to one
using a hypervisor (examples include VMware's ESXi and the Xen hypervisor) which abstract
the underlying physical hardware and provide an idealized, or virtual, implementation upon
which some higher-level services and/or implementations can be designed and built. Once a
physical cluster is virtualized, then higher level services, such as cloning a data node,
or providing high-availability to a specific node, or providing user controlled provisioning,
can be built.
+ 
+ A Private Cloud is a collection of virtualized physical hardware that has added services
such as catalogs of software or defined platforms that a customer can control. A private cloud
differs from a public cloud in that it is generally owned and or managed by the same company
or group as the customer. As an example, if I am responsible for a 100-node physical cluster,
and I need to share it between Sales and Marketing that wish to perform advanced analytics
with Hadoop and Engineering that wants to perform modelling of a new production plant, with
each getting 50% of the capacity, I could virtualize the physical architecture and allow the
pool of capacity to be shared between the competing groups, perhaps on a shared capacity or
a swap-in/swap-out basis.
+ 
+ A Public Cloud is like a Private Cloud but owned and/or managed by an outside entity, for
example Amazon Elastic Web Services. Private Clouds can provide cost benefits, either because
you only pay for your use, or you can have other's pay for their use, but at the loss of control
of ownership of resources and/or data. Not being able to prove control of custody of some
types of data might be a legal liability for certain types of data or industries (PCI, HIPAA).
+ 
  
  == Strengths of VM-hosted Hadoop ==
  

Mime
View raw message