cloudstack-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From seb...@apache.org
Subject git commit: updated refs/heads/master to 2238e2b
Date Mon, 29 Jul 2013 15:26:34 GMT
Updated Branches:
  refs/heads/master 4054a8e2a -> 2238e2bbc


[GSOC] Meng's mid-term report

Signed-off-by: Sebastien Goasguen <runseb@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/cloudstack/repo
Commit: http://git-wip-us.apache.org/repos/asf/cloudstack/commit/2238e2bb
Tree: http://git-wip-us.apache.org/repos/asf/cloudstack/tree/2238e2bb
Diff: http://git-wip-us.apache.org/repos/asf/cloudstack/diff/2238e2bb

Branch: refs/heads/master
Commit: 2238e2bbc5863ce926341b2d91fd4bc8bca1c873
Parents: 4054a8e
Author: Meng Han <menghan@ufl.edu>
Authored: Mon Jul 29 11:26:17 2013 -0400
Committer: Sebastien Goasguen <runseb@gmail.com>
Committed: Mon Jul 29 11:26:17 2013 -0400

----------------------------------------------------------------------
 docs/en-US/gsoc-midsummer-meng.xml           | 196 +++++++++++++++++++++-
 docs/en-US/images/clusterDefinition.png      | Bin 0 -> 52607 bytes
 docs/en-US/images/launchHadoopClusterApi.png | Bin 0 -> 13427 bytes
 docs/en-US/images/launchHadoopClusterCmd.png | Bin 0 -> 83972 bytes
 docs/en-US/images/whirrDependency.png        | Bin 0 -> 10794 bytes
 docs/en-US/images/whirrOutput.png            | Bin 0 -> 61831 bytes
 6 files changed, 192 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cloudstack/blob/2238e2bb/docs/en-US/gsoc-midsummer-meng.xml
----------------------------------------------------------------------
diff --git a/docs/en-US/gsoc-midsummer-meng.xml b/docs/en-US/gsoc-midsummer-meng.xml
index 1ab07cb..ee24cf4 100644
--- a/docs/en-US/gsoc-midsummer-meng.xml
+++ b/docs/en-US/gsoc-midsummer-meng.xml
@@ -11,9 +11,9 @@
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at
- 
+
    http://www.apache.org/licenses/LICENSE-2.0
- 
+
  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -23,6 +23,194 @@
 -->
 
 <section id="gsoc-midsummer-meng">
-    <title>Mid-Summer Progress Updates</title>
-    <para>This section describes ...</para>
+    <title>Mid-Summer Progress Updates for Meng - "Hadoop Provisioning on Cloudstack
Via Whirr"</title>
+	<para></para>
+    <para>In this section I describe my progress with the project titled "Hadoop Provisioning
on CloudStack Via Whirr"</para>
+    <section id="introduction-meng">
+        <title>Introduction</title>
+        <para>
+            It has been five weeks since the GSOC 2013 is kick-started. During the last five
weeks I have been constantly learning from the CloudStack Community in aspects of both knowledge
and personality. The whole community is very accommodating and willing to help newbies. I
am making progress steadily with the community's help. This is my first experience working
with such a large and cool code base, definitely a challenging and wonderful experience for
me. Though I am a little slipped behind my schedule, I am making my best effort and hoping
to complete what I set out in my proposal by the end of this summer.
+        </para>
+        <para>
+            
+        </para>
+    </section>
+<section id="cloudstack-installation-meng">
+        <title>CloudStack Installation</title>
+        <para>
+           I spent two weeks or so on the CloudStack Installation. In the beginning, I am
using the Ubuntu systems. Given that I am not familiar with maven and a little scared by various
kinds of errors and exceptions during system deployment, I failed to deploy CloudStack through
building from the source. With Ian's advice, I switched to CentOS and began to use rpm packages
for installation, things went much smoother. By the end of the second week, I submitted my
first patch -- CloudStack_4.1_Quick_Install_Guide.
+        </para>
+       
+    </section>
+    <section id="whirrr-installation">
+        <title>Deploying a Hadoop Cluster on CloudStack via Whirr</title>
+        
+         <para>   Provided that CloudStack is in place and I can register templates
and add instances, I went ahead to use Whirr to deploy a hadoop cluster on CloudStack. The
cluster definition file is as follows:</para>
+ <mediaobject>
+            <imageobject>
+                <imagedata fileref="./images/clusterDefinition.png"/>
+            </imageobject>
+           
+        </mediaobject>
+
+<para> <emphasis role="bold">whirr.cluster-name:     </emphasis>the name
of your hadoop cluster.</para>
+<para> <emphasis role="bold">whirr.store-cluster-in-etc-hosts:     </emphasis>store
all cluster IPs and hostnames in /etc/hosts on each node.</para>
+<para> <emphasis role="bold">         whirr.instance-templates:     </emphasis>
this specifies your cluster layout. One node acts as the jobtracker and namenode (the hadoop
master). Another two slaves nodes act as both datanode and tasktracker.</para>
+<para> <emphasis role="bold">   image-id:           </emphasis> This tells
CloudStack which template to use to start the cluster. </para>
+<para> <emphasis role="bold">         hardware-id:     </emphasis> This
is the type of hardware to use for the cluster instances. 
+</para>
+<para> <emphasis role="bold">         private/public-key-file:     </emphasis>:the
key-pair used to login to each instance. Only RSA SSH keys are supported at this moment. Jclouds
will move this key pair to the set of instances on startup. </para>
+<para> <emphasis role="bold">         whirr.cluster-user:     </emphasis>this
is the name of the cluster admin user.</para>
+<para> <emphasis role="bold">         whirr.bootstrap-user:     </emphasis>this
tells Jclouds which user name and password to use to login to each instance for bootstrapping
and customizing each instance. You must specify this property if the image you choose has
a hardwired username/password.(e.g. the default template CentOS 5.5(64-bit) no GUI (KVM) comes
with Cloudstack has a hardcoded credential: root:password), otherwise you don't need to specify
this property.</para>
+<para> <emphasis role="bold">         whirr.env.repo:     </emphasis>this
tells Whirr which repository to use to download packages.</para>
+<para> <emphasis role="bold">         whirr.hadoop.install-function/whirr.hadoop.configure-function
    </emphasis>  :it's self-explanatory.</para>
+
+        
+      
+        <para>
+            Output of this deployment is as follows:
+        </para>
+ <mediaobject>
+            <imageobject>
+                <imagedata fileref="./images/whirrOutput.png"/>
+            </imageobject>
+          
+        </mediaobject>
+       
+        <para>
+             Other details can be found at  <ulink url="http://kyrameng.blogspot.com/2013/07/how-to-use-whirr-to-start-hadoop.html"><citetitle>this
post</citetitle></ulink> in my blog. In addition I have a Whirr trouble shooting
<ulink url="http://kyrameng.blogspot.com/2013/07/whirr-trouble-shooting.html"><citetitle>post</citetitle></ulink>
there  if you are interested.
+        </para>
+    </section>
+    <section id="emr-plugin-implementation">
+        <title>Elastic Map Reduce(EMR) Plugin Implementation</title>
+        <para>
+           Given that I have completed the deployment of a hadoop cluster on CloudStack using
Whirr through the above steps, I began to dive into the EMR plugin development. My first API
is launchHadoopCluster, it's implementation is quite straight forward, by invoking an external
Whirr command in the command line on the management server and piggybacking the Whirr output
in responses.This api has a structure like below:    </para>
+	<mediaobject>
+            <imageobject>
+                <imagedata fileref="./images/launchHadoopClusterApi.png"/>
+            </imageobject>       
+        </mediaobject>
+<para>The following is the source code of launchHadoopClusterCmd.java.</para>
+<mediaobject>
+            <imageobject>
+                <imagedata fileref="./images/launchHadoopClusterCmd.png"/>
+            </imageobject>
+          
+        </mediaobject>
+ <para>You can invoke this api through the following command in CloudMonkey:</para>
+ <programlisting>> launchHadoopCluster config=myhadoop.properties</programlisting>
+<para>  </para>
+<para>This is sort of the launchHadoopCluster 0.0, other details can be found in this
<ulink url="http://kyrameng.blogspot.com/2013/07/cloudstack-emr-api-developement-series.html"><citetitle>post</citetitle></ulink>
.</para>
+<para>
+My undergoing working is modifying this api so that it calls Whirr libraries instead of invoking
Whirr externally in the command line.</para>
+<para>First add Whirr as a dependency of this plugin so that maven will download Whirr
automatically when you compile this plugin.</para>
+<mediaobject>
+            <imageobject>
+                <imagedata fileref="./images/whirrDependency.png"/>
+            </imageobject>
+          
+        </mediaobject>
+
+<para>I am planning to replace the Runtime.getRuntime().exec() above  with the following
code snippet.</para>
+<programlisting language="Java">
+ LaunchClusterCommand command = new LaunchClusterCommand();
+ command.run(System.in, System.out, System.err, Arrays.asList(args));
+</programlisting>
+<para></para>
+<para>Eventually when a hadoop cluster is launched. We can use Yarn to submit hadoop
jobs. 
+Yarn exposes the following API for job submission.</para>
+<programlisting>ApplicationId submitApplication(ApplicationSubmissionContext appContext)
throws org.apache.hadoop.yarn.exceptions.YarnRemoteException</programlisting>
+<para>In Yarn, an application is either a single job in the classical sense of Map-Reduce
or a DAG of jobs. In other words an application can have many jobs. This fits well with the
concepts in EMR design. The term job flow  in EMR is equivalent to the application concept
in Yarn. Correspondingly, a job flow step in EMR is equal to a job in Yarn. In addition Yarn
exposes the following API to query the state of an application.</para>
+<programlisting>ApplicationReport getApplicationReport(ApplicationId appId) throws
org.apache.hadoop.yarn.exceptions.YarnRemoteException</programlisting>
+<para>The above API can be used to implement the DescribeJobFlows API in EMR. </para>
+
+    
+              
+      
+    </section>
+  <section id="learning-jclouds">
+  <title>Learning Jclouds</title>
+<para>As Whirr relies on Jclouds for clouds provisioning, it's important for me to
understand what Jclouds features support Whirr and how Whirr interacts with Jclouds. I figured
out the following problems:</para>
+<itemizedlist>
+<listitem><para>How does Whirr create user credentials on each node? </para>
+<para>
+Using the runScript feature provide by Jclouds, Whirr can execute a script at node bootup,
one of the options in the script is to override the login credentials with the ones that provide
in the cluster properties file. The following line from Whirr demonstrates this idea.
+<programlisting language="Java">final RunScriptOptions options = overrideLoginCredentials(LoginCredentials.builder().user(clusterSpec.getClusterUser()).privateKey(clusterSpec.getPrivateKey()).build());
+</programlisting>
+</para><para> </para>
+</listitem>
+<listitem><para>How does Whirr start up instances in the beginning? </para>
+<para>The computeService APIs provided by jclouds allow Whirr to create a set of nodes
in a group(specified by the cluster name),and operate them as a logical unit without worrying
about the implementation details of the cloud. </para>
+<programlisting language="Java">Set&lt;NodeMetadata&gt; nodes = (Set&lt;NodeMetadata&gt;)computeService.createNodesInGroup(clusterName,
num, template);
+</programlisting><para> </para><para>The above command returns all
the nodes the API was able to launch into in a running state with port 22 open.</para></listitem>
+<listitem><para>How does Whirr differentiate nodes by roles and configure them
separately? </para>
+<para>Jclouds commands ending in Matching are called predicate commands. They allow
Whirr to decide which subset of nodes these commands will affect. For example, the following
command in Whirr will run a script with specified options on nodes who match the given condition.</para>
+<programlisting language="Java"> 
+Predicate&lt;NodeMetadata&gt; condition;
+condition = Predicates.and(runningInGroup(spec.getClusterName()), condition);
+ComputeServiceContext context = getCompute().apply(spec);
+context.getComputeService().runScriptOnNodesMatching(condition,statement, options);
+</programlisting>
+<para>The following is an example how a node playing the role of jobtracker in a hadoop
cluster is configured to open certain ports using the predicate commands.</para>
+<programlisting language="Java">
+    Instance jobtracker = cluster.getInstanceMatching(role(ROLE)); //  ROLE="hadoop-jobtracker"
+    event.getFirewallManager().addRules(
+        Rule.create()
+          .destination(jobtracker)
+          .ports(HadoopCluster.JOBTRACKER_WEB_UI_PORT),
+        Rule.create()
+          .source(HadoopCluster.getNamenodePublicAddress(cluster).getHostAddress())
+          .destination(jobtracker)
+          .ports(HadoopCluster.JOBTRACKER_PORT)
+    );
+
+</programlisting>
+<para> </para>
+<para>With the help of such predicated commands, Whirr can run different bootstrap
and init scripts on nodes with distinct roles.</para>
+
+</listitem>
+
+
+</itemizedlist>
+
+</section>
+    <section id="Lessons">
+        <title>Great Lessons Learned</title>
+        <para>
+            I am much appreciated with the opportunity to work with CloudStack and learn
from the lovable community. I can see myself constantly evolving from this invaluable experience
both technologically and psychologically. There were hard times that I were stuck on certain
problems for days and good times that made me want to scream seeing problem cleared. This
project is a great challenge for me. I am making progress steadily though not smoothly. That's
where I learned the following great lessons:
+	 
+    
+        </para>
+        <itemizedlist>
+            <listitem>
+                <para>When you work in an open source community, do things in the open
source way. There was a time when I locked myself up because I am stuck on problems and I
am not confident enough to ask them on the mailing list. The more I restricted myself from
the community the less progress I made. Also the lack of communication from my side also prevents
me from learning from other people and get guidance from my mentor.</para>
+            </listitem>
+            <listitem>
+                <para>CloudStack is evolving at a fast pace. There are many APIs being
added ,many patches being submitted every day. That's why the community use the word "SNAPSHOT"
for each version. At this moment I am learning to deal with fast code changing and upgrading.
A large portion of my time is devoted to system installation and deployment. I am getting
used to treat system exceptions and errors as a common case. That's another reason why communication
with the community is critical. </para>
+
+            </listitem>
+
+            <listitem>
+                <para>In addition to the project itself, I am strengthening my technical
suite at the same time. </para>
+
+<itemizedlist>
+<listitem><para>I learned to use some useful software tools: maven, git, publican,
etc.</para></listitem>
+            <listitem>
+                <para>
+Reading the source code of Whirr make me learn more high level java programming skills, e.g.
using generics, wildcard, service loader, the Executor model, Future object, etc .</para>
+            </listitem>
+            <listitem>
+                <para>I am exposed to Jclouds, a useful cloud neutral library to manipulate
different cloud infrastructures.</para>
+            </listitem>
+         <listitem><para>I gained deeper understanding of cloud web services
and learned the usage of several cloud clients, e.g. Jclouds CLI, CloudMonkey,etc.</para></listitem>
+        </itemizedlist>
+    
+
+            </listitem>
+          
+ 
+        </itemizedlist>
+             
+        <para>I am grateful that Google Summer Of Code exists, it gives us students
a sense of how fast real-world software development works and provides us hand-on experience
of coding in large open source projects. More importantly it's a self-challenging process
that strengthens our minds along the way.</para>
+    </section>
 </section>

http://git-wip-us.apache.org/repos/asf/cloudstack/blob/2238e2bb/docs/en-US/images/clusterDefinition.png
----------------------------------------------------------------------
diff --git a/docs/en-US/images/clusterDefinition.png b/docs/en-US/images/clusterDefinition.png
new file mode 100644
index 0000000..6170f9f
Binary files /dev/null and b/docs/en-US/images/clusterDefinition.png differ

http://git-wip-us.apache.org/repos/asf/cloudstack/blob/2238e2bb/docs/en-US/images/launchHadoopClusterApi.png
----------------------------------------------------------------------
diff --git a/docs/en-US/images/launchHadoopClusterApi.png b/docs/en-US/images/launchHadoopClusterApi.png
new file mode 100644
index 0000000..6f94c74
Binary files /dev/null and b/docs/en-US/images/launchHadoopClusterApi.png differ

http://git-wip-us.apache.org/repos/asf/cloudstack/blob/2238e2bb/docs/en-US/images/launchHadoopClusterCmd.png
----------------------------------------------------------------------
diff --git a/docs/en-US/images/launchHadoopClusterCmd.png b/docs/en-US/images/launchHadoopClusterCmd.png
new file mode 100644
index 0000000..66a0c75
Binary files /dev/null and b/docs/en-US/images/launchHadoopClusterCmd.png differ

http://git-wip-us.apache.org/repos/asf/cloudstack/blob/2238e2bb/docs/en-US/images/whirrDependency.png
----------------------------------------------------------------------
diff --git a/docs/en-US/images/whirrDependency.png b/docs/en-US/images/whirrDependency.png
new file mode 100644
index 0000000..acdec78
Binary files /dev/null and b/docs/en-US/images/whirrDependency.png differ

http://git-wip-us.apache.org/repos/asf/cloudstack/blob/2238e2bb/docs/en-US/images/whirrOutput.png
----------------------------------------------------------------------
diff --git a/docs/en-US/images/whirrOutput.png b/docs/en-US/images/whirrOutput.png
new file mode 100644
index 0000000..7c3b512
Binary files /dev/null and b/docs/en-US/images/whirrOutput.png differ


Mime
View raw message