From: omalley@apache.org
To: ambari-commits@incubator.apache.org
Reply-To: ambari-dev@incubator.apache.org
Subject: svn commit: r1180792 - /incubator/ambari/trunk/src/site/apt/index.apt
Date: Mon, 10 Oct 2011 06:41:36 -0000
Message-Id: <20111010064136.2804123889D7@eris.apache.org>

Author: omalley
Date: Mon Oct 10 06:41:35 2011
New Revision: 1180792

URL: http://svn.apache.org/viewvc?rev=1180792&view=rev
Log:
AMBARI-13. Another iteration of the website (omalley)

Modified:
    incubator/ambari/trunk/src/site/apt/index.apt

Modified: incubator/ambari/trunk/src/site/apt/index.apt
URL: http://svn.apache.org/viewvc/incubator/ambari/trunk/src/site/apt/index.apt?rev=1180792&r1=1180791&r2=1180792&view=diff
==============================================================================
--- incubator/ambari/trunk/src/site/apt/index.apt (original)
+++ incubator/ambari/trunk/src/site/apt/index.apt Mon Oct 10 06:41:35 2011
@@ -15,25 +15,25 @@ ~~
 Introduction

-  Ambari is a monitoring, administration and lifecycle management project
-  for Apache Hadoop clusters. Hadoop clusters require many inter-related
-  components that must be installed, configured, and managed across the
-  entire cluster. The stack of components that are currently supported by
-  Ambari includes:
+  Apache Ambari™ is a monitoring, administration and lifecycle
+  management project for Apache Hadoop™ clusters. Hadoop clusters
+  require many inter-related components that must be installed,
+  configured, and managed across the entire cluster. The set of
+  components that are currently supported by Ambari includes:

-  * {{{http://hbase.apache.org} Apache HBase}}
+  * {{{http://hbase.apache.org} Apache HBase™}}

-  * {{{http://incubator.apache.org/hcatalog} Apache HCatalog}}
+  * {{{http://incubator.apache.org/hcatalog} Apache HCatalog™}}

-  * {{{http://hadoop.apache.org/hdfs} Apache Hadoop HDFS}}
+  * {{{http://hadoop.apache.org/hdfs} Apache Hadoop HDFS™}}

-  * {{{http://hive.apache.org} Apache Hive}}
+  * {{{http://hive.apache.org} Apache Hive™}}

-  * {{{http://hadoop.apache.org/mapreduce} Apache Hadoop MapReduce}}
+  * {{{http://hadoop.apache.org/mapreduce} Apache Hadoop MapReduce™}}

-  * {{{http://pig.apache.org} Apache Pig}}
+  * {{{http://pig.apache.org} Apache Pig™}}

-  * {{{http://zookeeper.apache.org} Apache Zookeeper}}
+  * {{{http://zookeeper.apache.org} Apache ZooKeeper™}}

   []

@@ -46,7 +46,7 @@ Introduction

   * Assign roles to particular nodes or let Ambari pick a mapping for them.

-  * Override the stack's default versions of components or configure
+  * Override the default versions of components or configure
     particular values.

   * Upgrade a cluster

@@ -67,108 +67,152 @@ Introduction

   []

+  Ambari provides a REST, command line, and graphical interface. The
+  command line and graphical interface are implemented using the REST
+  interface, and all three have the same functionality. The graphical
+  interface is browser-based, using JSON and JavaScript.
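  For illustration, a read through the REST interface might look like the
  sketch below. The endpoint path and verb are hypothetical (the page does
  not define the URL scheme); the fields mirror the cluster example later
  in the page.

------
GET /clusters/alpha

{
  "description": "alpha cluster",
  "stack": "kryptonite",
  "nodes": ["node000-999", "gateway0-1"],
  "goal": "active"
}
------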

   Ambari requires that the base operating system has been deployed and
   managed via existing tools, such as Chef or Puppet. Ambari is solely focused
-  on simplifying configuring and managing the Hadoop stack.
+  on simplifying configuring and managing the Hadoop stack. Ambari does,
+  however, support adding third party software packages to be deployed as
+  part of the Hadoop cluster.

 Key concepts

-  * <<Nodes>> are machines in the datacenter that will run Hadoop and
-    be managed by Ambari.
+  * <<Nodes>> are machines in the datacenter that are managed by Ambari to
+    run Hadoop clusters.

   * <<Components>> are the individual software products that are
     installed to create a complete Hadoop cluster. Some components
-    include servers, such as HDFS and some are passive, such as Pig.
-
-  * <<Components>> consist of roles, which differentiate how each node
-    is supporting the component. The roles for HDFS include NameNode,
-    Secondary NameNode, slave, and gateway. Some roles, such as the
-    NameNode are active and run servers, while other roles such gateway
-    are passive. All roles of a component will have the same software,
-    but may have different configurations.
-
-  * <<Blueprints>> define the software and configuration for a
-    cluster, but not the specific nodes. Blueprints can derive from each
-    and only need to specify the part that differ from their
-    parent. Thus, although blueprints can specify the version for each
-    component, most will not.
-
-  * <<Stacks>> are a tested combination of specific component versions
-    that are tested and distributed by a vendor. These stacks form the
-    basis for the blueprints and include the suggested versions of each
-    component and the default configuration.
-
-  * A <<cluster>> uses a blueprint and a set of nodes to form a
-    cluster. Clusters can define a specific mapping of roles to sets of
-    nodes or let Ambari assign the roles. Clusters can either be active,
+    are active and include servers, such as HDFS, and some are passive
+    libraries, such as Pig. The servers of active components provide a
+    <<service>>.
+
+  * Components consist of <<roles>> that represent the different
+    configurations required by the component. Components have a client
+    role and a role for each server. HDFS roles, for example, are
+    'client,' 'namenode,' 'secondary namenode,' and 'datanode.' The
+    client role installs the client software and configuration, while
+    each server role installs the appropriate software and configuration.
+
+  * <<Stacks>> define the software and configuration for a
+    cluster. Stacks can inherit from each other and only need to specify
+    the parts that differ from their parent. Thus, although stacks
+    can specify the version for each component, most will not.
+
+  * A <<cluster>> uses a stack and a set of nodes to form a
+    cluster. When a cluster is defined, the user may specify the nodes
+    for each role or let Ambari automatically assign the roles based on
+    the nodes' characteristics. A cluster's state can be either active,
     inactive, or retired. Active clusters will be started, inactive
-    clusters have reserved nodes, but will not be started. Retired
+    clusters have reserved nodes but will be stopped. Retired
     clusters will keep their definition, but their nodes are released.

-Blueprints
+Configuration
+
+  Ambari abstracts cluster configuration into groups of string
+  key/value pairs. This abstraction lets us manage and manipulate the
+  configurations in a consistent and component-agnostic way. The
+  groups are named for the file that they end up in, and the groups
+  are defined by the set of components. For Hadoop, the groups are:
+
+  * hadoop/hadoop-env
+
+  * hadoop/capacity-scheduler
+
+  * hadoop/core-site
+
+  * hadoop/hdfs-site
+
+  * hadoop/log4j.properties
+
+  * hadoop/mapred-queue-acl
+
+  * hadoop/mapred-site
+
+  * hadoop/metrics2.properties
+
+  * hadoop/task-controller

-  Blueprints form the basis of defining what software needs to be
-  installed and run and the configuration for that software. Rather than
-  have the administrator define the entire blueprint from scratch,
-  blueprints inherit most of their properties from their parent. This
+* Configuration example
+
+  Although users will typically define configurations via the web UI,
+  it is useful to examine a sample JSON expression that would define a
+  configuration in the REST API.
+
+------
+{
+  "hadoop/hadoop-env": {
+    "HADOOP_CONF_DIR": "/etc/hadoop",
+    "HADOOP_NAMENODE_OPTS": "-Dsecurity.audit.logger=INFO,DRFAS",
+    "HADOOP_CLIENT_OPTS": "-Xmx128m"
+  },
+  "hadoop/core-site": {
+    "fs.default.name" : "hdfs://${namenode}:8020/",
+    "hadoop.tmp.dir" : "/grid/0/hadoop/tmp",
+    "hadoop.security.authentication" : "kerberos"
+  },
+  "hadoop/hdfs-site": {
+    "hdfs.user": "hdfs"
+  }
+}
+------
+
+Stacks
+
+  Stacks form the basis of defining what software needs to be
+  installed and run and the configuration for that software. Rather
+  than have the administrator define the entire stack from scratch,
+  stacks inherit most of their properties from their parent. This
   allows the administrator to take a default stack and only modify
   the properties that need to be changed without dealing with a lot
   of boilerplate.

-  Blueprints include a list of repositories that contain the rpms or
+  Stacks include a list of repositories that contain the rpms or
   tarballs. The repositories will be searched in the given order and
   if the required component versions are not found, the next one will
-  be searched. If the required file isn't found, the parent's
-  blueprint will be searched and so on.
+  be searched. If the required file isn't found, the parent stack's
+  repository list will be searched and so on.

-  Blueprints define the version of each component that they need. Most
+  Stacks define the version of each component that they need. Most
   of the versions will come from the stack, but the operator can
   override the version as needed.

-  The blueprint define the configuration parameters that need to be
-  specified. The configuration is broken down by file (eg. hadoop-env
-  versus core-site) and then a list of key/value pairs that represent
-  each configuration item. To keep the blueprints generic, the
-  configuration values may refer to the nodes that hold a particular
-  role. Thus, <<fs.default.name>> may be configured to
-  <<hdfs://${namenode}:8020/>> and the name of the namenode will be
-  filled in during the configuration.
-
-  Finally (and unfortunately), a few configuration settings need to
-  set for particular roles. Examples of this include using different
-  JVM options for the NameNode and setting the https security option
-  for the NameNode.
-
-* Blueprint example
-
-  Although users will typically define blueprints and clusters over
-  the web UI, it is useful to examine a sample JSON expression that
-  would define a blueprint for the REST api.
+  The stack defines the configuration parameters to be used by this
+  stack. To keep the stacks generic, the configuration values may
+  refer to the nodes that hold a particular role. Thus,
+  <<fs.default.name>> may be configured to
+  <<hdfs://${namenode}:8020/>> and the name of the namenode will be
+  filled in during the configuration. A few configuration settings
+  need to be set exclusively for particular roles. For example, the
+  NameNode needs to enable the https security option.
+
+* Stack example
+
+  Here's an example JSON expression for defining a stack.

 ------
 {
   "parent": "site", /* declare parent as site, r42 */
   "parent-revision": "42",
-  "repositories": [ /* declare where to get components */
-    {
-      "location": "http://repos.hortonworks.com/yum",
-      "type": "yum"
-    },
-    {
-      "location": "http://incubator.apache.org/ambari/stack",
-      "type": "tar"
-    },
-  ],
+  "repositories": {
+    "yum": ["http://incubator.apache.org/ambari/stack/yum"],
+    "tar": ["http://incubator.apache.org/ambari/stack/tar"]
+  },
   "configuration": { /* define the general configuration */
-    "hadoop-env": {
+    "hadoop/hadoop-env": {
       "HADOOP_CONF_DIR": "/etc/hadoop",
       "HADOOP_NAMENODE_OPTS": "-Dsecurity.audit.logger=INFO,DRFAS",
       "HADOOP_CLIENT_OPTS": "-Xmx128m"
     },
-    "core-site": {
+    "hadoop/core-site": {
       "fs.default.name" : "hdfs://${namenode}:8020/",
       "hadoop.tmp.dir" : "/grid/0/hadoop/tmp",
-      "!hadoop.security.authentication" : "kerberos",
+      "hadoop.security.authentication" : "kerberos"
+    },
+    "hadoop/hdfs-site": {
+      "hdfs.user": "hdfs"
     }
   },
   "components": {
@@ -177,30 +221,40 @@ Blueprints
       "arch": "i386"
     },
     "hdfs": {
-      "user": "hdfs" /* define the user to run the servers */
-    },
-    "mapreduce": {
-      "user": "mapred"
+      "roles": {
+        "namenode": { /* override one value on the namenode */
+          "hadoop/hdfs-site": {
+            "dfs.https.enable": "true"
+          }
+        }
+      }
     },
     "pig": {
       "version": "0.9.0"
     }
-  },
-  "roles": { /* override one value on the namenode */
-    "namenode": {
-      "configuration": {
-        "hdfs-site": {
-          "dfs.https.enable": "true"
-        }
-      }
-    }
   }
 }
 ------
+
+Component Definitions
+
+  We are designing the Ambari infrastructure with a generic interface
+  for defining components. The current version of Ambari doesn't
+  publicize the interface, but the intention is to open it up to
+  support third party components. Ambari will search the configured
+  repositories for the component definition and use that definition to
+  install, manage, run, and remove the component. To have consistency
+  in the architecture, the standard Hadoop services will also be
+  plugged in to Ambari using the same mechanism.
+
+  The component definitions are written as a text file that provides
+  the commands to perform each kind of action, such as install, start,
+  stop, or remove. There will be a well-defined environment that the
+  commands run in to provide consistency between platforms.
+
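  The page does not give the definition format, so the sketch below is
  only a guess at the kind of file it describes, expressed in the same
  JSON style as the other examples: one command per lifecycle action.
  The component name and every command shown are invented for
  illustration.

------
{
  "component": "hbase",
  "actions": {
    "install": "yum install -y hbase",
    "start": "hbase-daemon.sh start master",
    "stop": "hbase-daemon.sh stop master",
    "remove": "yum remove -y hbase"
  }
}
------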
 Clusters

-  Defining a cluster, involves picking a blueprint and assigning nodes to the
+  Defining a cluster involves picking a stack and assigning nodes to the
   cluster. Clusters have a goal state, which can be one of three values:

@@ -215,8 +269,8 @@ Clusters

   []

-  Clusters also have a list of active services that should be running. This
-  overrides the blueprint and provides a mechanism for the administrator to
+  Clusters also have a list of active components that should be running. This
+  overrides the stack and provides a mechanism for the administrator to
   shutdown a service temporarily.

 * Cluster example

 ------
 {
   "description": "alpha cluster",
-  "blueprint": "kryptonite",
+  "stack": "kryptonite",
   "nodes": ["node000-999", "gateway0-1"],
   "goal": "active",
   "services": ["hdfs", "mapreduce"],
@@ -236,7 +290,42 @@ Clusters
   }
 }
 ------
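  The list of active components above suggests that temporarily shutting
  down a service could be a single update through the REST interface. As
  in the earlier sketch, the endpoint and verb are hypothetical; the
  fields come from the cluster example.

------
PUT /clusters/alpha

{
  "services": ["hdfs"] /* stop mapreduce until further notice */
}
------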
- 
+
+Stack Deployment
+
+  Ambari will deploy the software for its clusters from either
+  OS-specific packages (rpms and debs) or tarballs. Rpms have the
+  advantage of putting the software in a user-convenient location,
+  such as <</usr/bin>>, but they are specific to an OS and don't
+  support having multiple versions installed at once, while tarballs
+  require rebuilding the entire deployment to change one component
+  version.
+
+  The layout on the nodes looks like:
+
+------
+${ambari}/clusters/${cluster}-${role}/stack/
+                                     /logs/
+                                     /data/disk-${0 to N}/
+                                     /pkgs/
+------
+
+  The software and configuration for the role are installed in
+  <<stack>>. The logs for the managed cluster are put into
+  <<logs>>. The cluster's data is in <<data>> with symlinks to
+  each of the disks that machine should use. Finally, the component
+  tarballs are placed in the <<pkgs>> directory to be installed by
+  the component.
+
+Ambari Installation
+
+  Ambari will be packaged as both OS-specific packages (rpms and debs)
+  and tarballs, which need to be installed on each node. The user
+  chooses one node as the Ambari controller, which is the point of
+  interaction for both the web UI and the REST interface. If the user
+  doesn't already have a ZooKeeper service for Ambari to use, Ambari
+  will run one internally for its own use.
+
 Monitoring

   Monitoring the current state of the cluster is an important part of
@@ -248,19 +337,20 @@ Monitoring

 High-level Design

-  Ambari is managed by the Ambari Controller – a central server, which
+  Ambari is managed by the Ambari <<Controller>> – a central server, which
   provides the user interface and directs the agent on each node.
   The agent is responsible for installing, configuring, running and
   cleaning up components of the Hadoop stack on the local node. Each
   agent will contact the controller when it has finished its work or N
-  seconds have passed.
+  seconds have passed. The controller stores all of the information about
+  the clusters and stacks in ZooKeeper, which is highly available and
+  redundant.
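  The heartbeat format is not specified on this page; the sketch below
  only illustrates the kind of report an agent might send when it checks
  in with the controller. All field names are invented.

------
{
  "node": "node047",
  "cluster": "alpha",
  "role": "datanode",
  "actions-completed": ["install", "start"],
  "seconds-since-last-report": 30
}
------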

   Ambari abstracts out the configuration and software stack in the
-  cluster as blueprint. Every stack release provides a default
-  blueprint. If a site has multiple clusters, they can define a "site"
-  blueprint that provides the site-wide defaults and have the cluster
-  blueprints derive from it. Ambari will keep the revision history of
-  blueprints to enable operators to diagnose problems and track changes.
+  cluster as a stack. Every stack release provides a default
+  stack. If a site has multiple clusters, they can define a "site"
+  stack that provides the site-wide defaults and have the cluster
+  stacks derive from it. Ambari will keep the revision history of
+  stacks to enable operators to diagnose problems and track changes.

 Roadmap

@@ -288,12 +378,3 @@ Roadmap
   We plan to integrate an SNMP interface for integration with other
   cluster management tools.

-  We are designing the Ambari infrastructure to have an abstract
-  interface for managing components. The current version of Ambari
-  doesn't publicize the interface, but the intention is to open it up
-  to support 3rd party components. Once the plugin is made available,
-  Ambari can be used to deploy these services as any other standard
-  Hadoop service. To have consistency in the architecture, the
-  standard Hadoop services will also be plugged in to Ambari using the same
-  mechanism.
-