hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1301) resource management provisioning for Hadoop
Date Mon, 17 Dec 2007 13:29:43 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1301:

    Fix Version/s: 0.16.0
         Priority: Major  (was: Minor)

The following files serve as the documentation for this patch:

README: Gives an overview of HOD
getting_started.txt: Gives instructions on how to try out HOD
config.txt: A more detailed description of how to configure HOD.

Hemanth, could you please convert them to Apache Forrest-based documentation? We could then
put them up on the website.

> resource management provisioning for Hadoop
> -------------------------------------------
>                 Key: HADOOP-1301
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1301
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Pete Wyckoff
>            Assignee: Hemanth Yamijala
>             Fix For: 0.16.0
>         Attachments: hod-hadoop.patch, hod-hadoop.v2.patch, hod-open-4.tar.gz, hod.0.2.2.tar.gz
> The Hadoop On Demand (HOD) project addresses the provisioning and managing of MapReduce
> instances on cluster resources. With HOD, the MapReduce user interacts with the cluster solely
> through a self-service interface and the JT, TT info ports. The user never needs to log into
> the cluster or even have an account on it. HOD allocates nodes, provisions
> MapReduce (and optionally HDFS) on the cluster and, when the user is done with MapReduce jobs,
> cleanly shuts down MapReduce and de-allocates the nodes (i.e., returns them to the
> pool of available resources in the cluster).
> Using HOD, a cluster can be shared among different users in a fair and efficient manner.
> HOD is not a replacement or re-implementation of a traditional resource manager. HOD is
> implemented using the resource manager paradigm and is currently envisioned to support Torque
> and Condor out of the box. It also supports "static" resources, i.e., a dedicated set of
> resources that does not use a resource manager.
> HOD is also self-provisioning and, thus, can be used on systems such as EC2 or a campus
> cluster not already running MapReduce software or a resource manager. Figure 1 depicts a cluster
> using HOD. As the figure shows, the user never logs into the cluster itself. The user's jobs
> run as the 'hod' user (a configurable unix id).
> The user interacts with MapReduce and the cluster using the hod shell, hodsh. Once in
> hodsh, the user can allocate/de-allocate nodes and automatically run the JT, TTs, NN and DNs
> on those nodes without knowing which nodes are running which daemons and without logging into
> any of those boxes. HOD transparently masks failures by allocating nodes to replace failed
> nodes. Once the user has allocated nodes, she can run /bin/MapReduce my1.jar and then
> /bin/MapReduce my2.jar ... from within the hod shell, which automatically generates the
> configuration file for the MapReduce script. When done, the user exits the shell.
> The hod shell has an automatic timeout so that users cannot hog resources they aren't
> using. The timeout applies only when there is no MapReduce job running. In addition, hod also
> has the option of tracking and enforcing user/group resource limits.
> Optionally, HOD can run dedicated log and directory services in the cluster. The log
> services are a central repository for collecting and retrieving Hadoop logs for any given
> job. The directory service provides an easy way to inspect what's running in the cluster and
> gives end users an HTML interface for getting to their JT and TT info ports.
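
The idle-timeout rule in the description (the hod shell times out only when no MapReduce
job is running) can be illustrated with a small sketch. This is not HOD's code: the Session
class, its field names, and the choice to restart the idle clock when a job finishes are all
assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical model of one hod shell session (illustrative only)."""
    idle_timeout: float          # seconds of idleness allowed before release
    last_activity: float = 0.0   # time of the last job start or job completion
    running_jobs: int = 0        # MapReduce jobs currently in flight

    def job_started(self, now: float) -> None:
        self.running_jobs += 1
        self.last_activity = now

    def job_finished(self, now: float) -> None:
        self.running_jobs -= 1
        # Assumption: the idle clock restarts when a job ends.
        self.last_activity = now

    def should_release(self, now: float) -> bool:
        # The timeout never fires while a job is running.
        if self.running_jobs > 0:
            return False
        return now - self.last_activity >= self.idle_timeout


s = Session(idle_timeout=600)
s.job_started(now=0)
print(s.should_release(now=5000))   # False: a job is still running
s.job_finished(now=5000)
print(s.should_release(now=5300))   # False: idle for only 300 s
print(s.should_release(now=5600))   # True: idle for 600 s with no job
```

A long-running job keeps the allocation alive indefinitely under this rule; only an idle
session gives its nodes back.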

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
