cloudstack-commits mailing list archives

From weiz...@apache.org
Subject [30/50] [abbrv] git commit: updated refs/heads/disk_io_throttling to 8b8a0d3
Date Mon, 10 Jun 2013 17:27:24 GMT
meng's proposal for GSOC2013


Project: http://git-wip-us.apache.org/repos/asf/cloudstack/repo
Commit: http://git-wip-us.apache.org/repos/asf/cloudstack/commit/6e245422
Tree: http://git-wip-us.apache.org/repos/asf/cloudstack/tree/6e245422
Diff: http://git-wip-us.apache.org/repos/asf/cloudstack/diff/6e245422

Branch: refs/heads/disk_io_throttling
Commit: 6e2454228313c3373bfa96ef623a4b6ceb88d0ee
Parents: c8d607e
Author: kyrameng <menghan@ufl.edu>
Authored: Sun Jun 9 19:09:20 2013 -0400
Committer: Sebastien Goasguen <runseb@gmail.com>
Committed: Mon Jun 10 03:00:30 2013 -0400

----------------------------------------------------------------------
 docs/en-US/CloudStack_GSoC_Guide.xml |   2 +-
 docs/en-US/gsoc-meng.xml             | 235 ++++++++++++++++++++++++++++++
 2 files changed, 236 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cloudstack/blob/6e245422/docs/en-US/CloudStack_GSoC_Guide.xml
----------------------------------------------------------------------
diff --git a/docs/en-US/CloudStack_GSoC_Guide.xml b/docs/en-US/CloudStack_GSoC_Guide.xml
index 243a0ca..1f43593 100644
--- a/docs/en-US/CloudStack_GSoC_Guide.xml
+++ b/docs/en-US/CloudStack_GSoC_Guide.xml
@@ -49,6 +49,6 @@
     <xi:include href="gsoc-tuna.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
     <xi:include href="gsoc-imduffy15.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
     <xi:include href="gsoc-dharmesh.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
-
+    <xi:include href="gsoc-meng.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
 </book>
 

http://git-wip-us.apache.org/repos/asf/cloudstack/blob/6e245422/docs/en-US/gsoc-meng.xml
----------------------------------------------------------------------
diff --git a/docs/en-US/gsoc-meng.xml b/docs/en-US/gsoc-meng.xml
new file mode 100644
index 0000000..1de259d
--- /dev/null
+++ b/docs/en-US/gsoc-meng.xml
@@ -0,0 +1,235 @@
+<?xml version='1.0' encoding='utf-8' ?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
+<!ENTITY % BOOK_ENTITIES SYSTEM "CloudStack_GSoC_Guide.ent">
+%BOOK_ENTITIES;
+]>
+
+<!-- Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+ 
+   http://www.apache.org/licenses/LICENSE-2.0
+ 
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+<chapter id="gsoc-meng">
+        <title>Meng's 2013 GSoC Proposal</title>
+        <para>This chapter describes Meng's 2013 Google Summer of Code project within the &PRODUCT; ASF project. It is a copy of the submitted proposal.</para>
+	<section id="Project-Description">
+		<title>Project Description</title>
+			Getting a hadoop cluster going can be challenging and painful due to the tedious configuration phase and the diverse idiosyncrasies of each cloud provider. Apache Whirr<ulink url="http://whirr.apache.org/"><citetitle>[1]</citetitle></ulink> and Provisionr are libraries for running cloud services in an automatic or semi-automatic fashion. They take advantage of a cloud-neutral library called jclouds<ulink url="http://www.jclouds.org/documentation/gettingstarted/what-is-jclouds/"><citetitle>[2]</citetitle></ulink> to create one-click, auto-configuring hadoop clusters on multiple clouds. Since jclouds supports the CloudStack API, most of the services provided by Whirr and Provisionr should work out of the box on CloudStack. My first task is to test that assumption, make sure everything is well documented, and correct any issues with the latest versions of CloudStack (4.0 and 4.1).
+		
+<para>
+The biggest challenge for hadoop provisioning is automatically configuring each instance
at launch time based on what it is supposed to do, a process known as contextualization<ulink
url="http://dl.acm.org/citation.cfm?id=1488934"><citetitle>[3]</citetitle></ulink><ulink
url="http://www.nimbusproject.org/docs/current/clouds/clusters2.html "><citetitle>[4]</citetitle></ulink>.
It causes last minute changes inside an instance to adapt to a cluster environment. Many automated
cloud services are enabled by contextualization. For example in one-click hadoop clusters,
contextualization basically amounts to generating and distributing ssh key pairs among instances,
telling an instance where the master node is and what other slave nodes it should be aware
of, etc. On EC2 contextualization is done via passing information through the EC2_USER_DATA
entry<ulink url="http://aws.amazon.com/amazon-linux-ami/ "><citetitle>[5]</citetitle></ulink><ulink
url="https://svn.apache.org/repos/asf/whirr/bra
 nches/contrib-python/src/py/hadoop/cloud/data/hadoop-ec2-init-remote.sh"><citetitle>[6]</citetitle></ulink>.
Whirr and Provisionr embrace this feature to provision hadoop instances on EC2. My second
task is to test and extend Whirr and Provisionr’s one-click solution on EC2 to CloudStack
and also improve CloudStack’s support for Whirr and Provisionr to enable hadoop provisioning
on CloudStack based clouds.
+</para>
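+<para>
+As a minimal sketch of how this context could be passed on CloudStack (the offering, template and zone ids are placeholders and the API signature step is left out), the per-instance information could be rendered into a user-data script and base64-encoded for the deployVirtualMachine call:
+</para>
+<programlisting><![CDATA[
+import base64
+import urllib.parse
+
+def build_user_data(master_ip, role):
+    # Small shell script executed at boot (e.g. by CloudInit) on the new instance.
+    return "\n".join([
+        "#!/bin/sh",
+        "echo '%s  hadoop-master' >> /etc/hosts" % master_ip,
+        "echo '%s' > /etc/hadoop-role" % role,
+    ])
+
+def deploy_params(master_ip, role):
+    # CloudStack expects the userdata parameter of deployVirtualMachine to be
+    # base64 encoded; the ids below are placeholders and request signing is omitted.
+    user_data = base64.b64encode(build_user_data(master_ip, role).encode()).decode()
+    return urllib.parse.urlencode({
+        "command": "deployVirtualMachine",
+        "serviceofferingid": "SERVICE_OFFERING_ID",
+        "templateid": "TEMPLATE_ID",
+        "zoneid": "ZONE_ID",
+        "userdata": user_data,
+    })
+
+print(deploy_params("10.1.1.10", "hadoop-datanode+hadoop-tasktracker"))
+]]></programlisting>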
+<para>
+My third task is to add a Query API that is compatible with Amazon Elastic MapReduce (EMR) to CloudStack. Through this API, all hadoop provisioning functionality will be exposed and users can reuse cloud clients that are written for EMR to create and manage hadoop clusters on CloudStack based clouds.
+</para>
+	</section>
+
+	<section id="Project-Details">
+		<title>Project Details</title>
+		<para>
+			Whirr defines four roles for the hadoop provisioning service: Namenode, JobTracker, Datanode and TaskTracker. With the help of CloudInit<ulink url="https://help.ubuntu.com/community/CloudInit"><citetitle>[7]</citetitle></ulink> (a popular package for cloud instance initialization), each VM instance is configured based on its role and a compressed file that is passed in the EC2_USER_DATA entry. Since CloudStack also supports EC2_USER_DATA, I think the most feasible way to have hadoop provisioning on CloudStack is to extend Whirr’s EC2 solution to the CloudStack platform and to make the necessary adjustments based on CloudStack’s own characteristics.
+		
+		<para>
+		Whirr and Provisionr deal with two critical issues in their role configuration scripts
(configure-hadoop-role_list): SSH key authentication and hostname configuration.
+		</para>
+		<orderedlist>
+			<listitem><para>
+			SSH key authentication. SSH key based authentication is needed so that the master node can log in to the slave nodes to start/stop hadoop daemons; each node also needs to log in to itself to start its own hadoop daemons. Traditionally this is done by generating a key pair on the master node and distributing the public key to all slave nodes, which requires human intervention. Whirr works around this problem on EC2 by using a common key pair for all nodes in a hadoop cluster, so every node is able to log in to every other node. The key pair is provided by the user and obtained by CloudInit inside an instance from the metadata service. As far as I know, CloudStack does not support user-provided ssh key authentication. Although CloudStack has the createSSHKeyPair API<ulink url="http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.0.2/html/Installation_Guide/using-sshkeys.html"><citetitle>[8]</citetitle></ulink> to generate SSH keys and users can create an instance template that supports SSH keys, there is no easy way to have a unified SSH key on all cluster instances. Besides, Whirr prefers minimal image management, so having a customized template doesn’t quite fit here. (A sketch of this shared key approach follows this list.)
+			</para></listitem>
+			<listitem><para>
+			Hostname configuration. The hostname of each instance has to be properly set and injected into the hadoop config files (core-site.xml, hdfs-site.xml, mapred-site.xml). For an EC2 instance, the hostname is derived from a combination of its public IP and an EC2-specific prefix/suffix (e.g. an instance with IP 54.224.206.71 will have its hostname set to ec2-54-224-206-71.compute-1.amazonaws.com). This hostname amounts to the Fully Qualified Domain Name that uniquely identifies the node on the network. As for CloudStack, if the user does not specify a name, the hostname that identifies a VM on a network will be a unique UUID generated by CloudStack<ulink url="https://cwiki.apache.org/CLOUDSTACK/allow-user-provided-hostname-internal-vm-name-on-hypervisor-instead-of-cloud-platform-auto-generated-name-for-guest-vms.html"><citetitle>[9]</citetitle></ulink>.
+			</para></listitem>
+			</orderedlist>
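+			<para>
+			As a rough sketch of how both issues could be handled through user data (assuming the provisioning tool generates one shared key pair for the whole cluster and picks a fully qualified name for each node), the script handed to every instance might look like this:
+			</para>
+			<programlisting><![CDATA[
+def contextualize(shared_private_key, shared_public_key, fqdn):
+    # Illustrative user-data script: install a cluster-wide SSH key pair so that
+    # every node can log in to every other node, and pin the hostname that the
+    # hadoop configuration files will later refer to.
+    return "\n".join([
+        "#!/bin/sh",
+        "mkdir -p /root/.ssh",
+        "cat > /root/.ssh/id_rsa <<'EOF'",
+        shared_private_key,
+        "EOF",
+        "chmod 600 /root/.ssh/id_rsa",
+        "echo '%s' >> /root/.ssh/authorized_keys" % shared_public_key,
+        "hostname %s" % fqdn,
+        "echo %s > /etc/hostname" % fqdn,
+    ])
+]]></programlisting>
+			<para>
+			The same fully qualified name would then be injected into core-site.xml, hdfs-site.xml and mapred-site.xml by the role configuration scripts.
+			</para>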
+			<para>
+			These two are the main issues that need support improvement on the CloudStack side. Other
things like preparing disks, installing hadoop tarballs and starting hadoop daemons can be
easily done as they are relatively role/instance-independent and static. Runurl can be used
to simplify user-data scripts.
+
+
+
+			</para>
+			<para>
+			After we achieve hadoop provisioning on CloudStack using Whirr, we can go further and add a Query API to CloudStack to expose this functionality. I will write an API that is compatible with Amazon Elastic MapReduce (EMR)<ulink url="http://docs.aws.amazon.com/ElasticMapReduce/latest/API/Welcome.html"><citetitle>[10]</citetitle></ulink> so that users can reuse clients that are written for EMR to submit jobs to existing hadoop clusters, poll job status, terminate a hadoop instance and do other things on CloudStack based clouds. There are eight actions<ulink url="http://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_Operations.html"><citetitle>[11]</citetitle></ulink> currently supported in the EMR API. I will try to implement as many as I can during the GSoC period. The following statements give some examples of the API that I will write.
+			</para>
+			<programlisting><![CDATA[
+https://elasticmapreduce.cloudstack.com?Action=RunJobFlow&Name=MyJobFlowName&Instances.MasterInstanceType=m1.small&Instances.SlaveInstanceType=m1.small&Instances.InstanceCount=4
+]]></programlisting>
+<para>
+This will launch a new hadoop cluster with four instances using specified instance types
and add a job flow to it.
+</para>
+<programlisting><![CDATA[
+https://elasticmapreduce.cloudstack.com?Action=AddJobFlowSteps&JobFlowId=j-3UN6WX5RRO2AG&Steps.member.1.Name=MyStep2&Steps.member.1.HadoopJarStep.Jar=MyJar
+]]></programlisting>
+<para>
+This will add a step to the existing job flow with ID j-3UN6WX5RRO2AG. This step will run
the specified jar file.
+</para>
+<programlisting><![CDATA[
+https://elasticmapreduce.cloudstack.com?Action=DescribeJobFlows&JobFlowIds.member.1=j-3UN6WX5RRO2AG
+]]></programlisting>
+<para>
+This will return the status of the given job flow.
+</para>
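+<para>
+As a client-side sketch against the example endpoint above (the AWS-style request signing that real EMR clients add is left out), launching a job flow and reading back its id could look like this:
+</para>
+<programlisting><![CDATA[
+import urllib.parse
+import urllib.request
+import xml.etree.ElementTree as ET
+
+ENDPOINT = "https://elasticmapreduce.cloudstack.com"  # example endpoint used above
+
+def run_job_flow(name, instance_count, instance_type="m1.small"):
+    # EMR-style clients send a flat query string; the authentication parameters
+    # (AWSAccessKeyId, Signature, ...) are omitted in this sketch.
+    params = urllib.parse.urlencode({
+        "Action": "RunJobFlow",
+        "Name": name,
+        "Instances.MasterInstanceType": instance_type,
+        "Instances.SlaveInstanceType": instance_type,
+        "Instances.InstanceCount": instance_count,
+    })
+    with urllib.request.urlopen("%s?%s" % (ENDPOINT, params)) as resp:
+        root = ET.fromstring(resp.read())
+    # An EMR-compatible response would carry the id of the newly created job flow.
+    return next((el.text for el in root.iter() if el.tag.endswith("JobFlowId")), None)
+]]></programlisting>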
+	</section>
+
+	<section id="Roadmap">
+		<title>Roadmap</title>
+		
+		<para><emphasis role="bold">Jun. 17 ∼ Jun. 30</emphasis> </para>
+		<orderedlist>
+		<listitem><para>
+		Learn CloudStack and Apache Whirr/Provisionr APIs; Deploy a CloudStack cluster.
+		</para></listitem>
+		
+		<listitem><para>
+		Identify how EC2_USER_DATA is passed and executed on each CloudStack instance.
+		</para></listitem>
+		<listitem><para>
+		Figure out how the files passed in EC2_USER_DATA are acted upon by CloudInit.
+		</para></listitem>
+		<listitem><para>
+		Identify files in /etc/init/ that are used or modified by Whirr and Provisionr for hadoop
related configuration.
+		</para></listitem>
+		<listitem><para>
+		Deploy a hadoop cluster on CloudStack via Whirr/Provisionr. This is to test what is missing in CloudStack or Whirr/Provisionr in terms of their support for each other.
+		</para></listitem>
+		</orderedlist>
+		<para><emphasis role="bold">Jul. 1 ∼ Aug. 1</emphasis> </para>
+		<orderedlist>
+		<listitem><para>
+		Write scripts to configure the VM hostname on CloudStack with the help of CloudInit.
+		</para></listitem>
+		<listitem><para>
+		Write scripts to distribute SSH keys among CloudStack instances. Add the capability of using a user-provided ssh key for authentication to CloudStack.
+		</para></listitem>
+		<listitem><para>
+		Take care of the other things left for hadoop provisioning, such as mounting disks, installing
hadoop tarballs, etc.
+		</para></listitem>
+		<listitem><para>
+		Compose the files that need to be passed in EC2_USER_DATA to each CloudStack instance. Test these files and write patches to make sure that Whirr/Provisionr can successfully deploy one-click hadoop clusters on CloudStack.
+		</para></listitem>
+		</orderedlist>
+		<para><emphasis role="bold">Aug. 3 ∼ Sep. 8</emphasis> </para>
+		<orderedlist>
+		<listitem><para>
+		Design and build an Elastic MapReduce API for CloudStack that takes control of hadoop cluster creation and management.
+		</para></listitem>
+		<listitem><para>
+		Implement the eight actions defined in the EMR API. This task might take a while.
+		</para></listitem>
+		
+		</orderedlist>
+		<para><emphasis role="bold">Sep. 10 ∼ Sep. 23</emphasis> </para>
+		<orderedlist>
+		<listitem><para>
+		Code cleanup and documentation wrap-up.
+		</para></listitem>
+		
+		</orderedlist>
+		
+		
+	</section>
+
+	<section id="Deliverables-meng">
+		<title>Deliverables</title>
+		<orderedlist>
+		<listitem><para>
+		
+ Whirr has limited support for CloudStack. Check what’s missing and make sure all steps
are properly documented on the Whirr and CloudStack websites.
+		</para></listitem>
+		<listitem><para>
+		Contribute code to CloudStack and send patches to Whirr/Provisionr if necessary to enable hadoop provisioning on CloudStack via Whirr/Provisionr.
+		</para></listitem>
+		<listitem><para>
+		Build an EMR-compatible API for CloudStack.
+		</para></listitem>
+		</orderedlist>
+		</section>
+			<section id="Nice-to-have">
+		<title>Nice to have</title>
+		<para>In addition to the required deliverables, it’s nice to have the following:</para>
+		<orderedlist>
+		<listitem><para>
+		
+ The capability to add and remove hadoop nodes dynamically to enable elastic hadoop clusters
on CloudStack.
+
+		</para></listitem>
+		<listitem><para>
+		Review the existing tools that offer one-click provisioning and make sure that they support CloudStack based clouds.
+		</para></listitem>
+		</orderedlist>
+	</section>
+
+			<section id="References">
+		<title>References</title>
+		
+		<orderedlist>
+		<listitem><para>
+		
+ http://whirr.apache.org/
+		</para></listitem>
+		<listitem><para>
+		http://www.jclouds.org/documentation/gettingstarted/what-is-jclouds/
+		</para></listitem>
+		<listitem><para>
+		Katarzyna Keahey, Tim Freeman, Contextualization: Providing One-Click Virtual Clusters
+		</para></listitem>
+		<listitem><para>
+		http://www.nimbusproject.org/docs/current/clouds/clusters2.html
+		</para></listitem>
+		<listitem><para>
+		http://aws.amazon.com/amazon-linux-ami/
+		</para></listitem>
+		<listitem><para>
+		https://svn.apache.org/repos/asf/whirr/branches/contrib-python/src/py/hadoop/cloud/data/hadoop-ec2-init-remote.sh
+		</para></listitem>
+		<listitem><para>
+		https://help.ubuntu.com/community/CloudInit
+		</para></listitem>
+		<listitem><para>
+		http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.0.2/html/Installation_Guide/using-sshkeys.html
+		</para></listitem>
+		<listitem><para>
+		https://cwiki.apache.org/CLOUDSTACK/allow-user-provided-hostname-internal-vm-name-on-hypervisor-instead-of-cloud-platform-auto-generated-name-for-guest-vms.html
+		</para></listitem>
+		<listitem><para>
+http://docs.aws.amazon.com/ElasticMapReduce/latest/API/Welcome.html
+		</para></listitem>
+		<listitem><para>
+		http://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_Operations.html
+		</para></listitem>
+		<listitem><para>
+		http://buildacloud.org/blog/235-puppet-and-cloudstack.html
+		</para></listitem>
+		<listitem><para>
+http://chriskleban-internet.blogspot.com/2012/03/build-cloud-cloudstack-instance.html
+		</para></listitem>
+		<listitem><para>
+		http://gehrcke.de/2009/06/aws-about-api/
+		</para></listitem>
+		<listitem><para>
+		Apache_CloudStack-4.0.0-incubating-API_Developers_Guide-en-US.pdf
+		</para></listitem>
+		
+		</orderedlist>
+	</section>
+	
+</chapter>

