hadoop-common-dev mailing list archives

From "Mahadev konar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-719) Integration of Hadoop with batch schedulers
Date Tue, 14 Nov 2006 22:17:41 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-719?page=comments#action_12449827 ] 
Mahadev konar commented on HADOOP-719:

-- Another key requirement for HOD is distributing the Hadoop jars onto the cluster. Most
batch schedulers are not effective at transferring files, so HOD would require a service
for transferring the Hadoop jars.

> Integration of Hadoop with batch schedulers
> -------------------------------------------
>                 Key: HADOOP-719
>                 URL: http://issues.apache.org/jira/browse/HADOOP-719
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/streaming
>            Reporter: Mahadev konar
>         Assigned To: Mahadev konar
> Hadoop On Demand (HOD) is an integration of Hadoop with batch schedulers like Condor, Torque, Sun
Grid, etc. HOD is a system that provisions a Hadoop instance using
a shared batch scheduler. HOD will find a requested number of nodes and start up Hadoop daemons
on them. Users' map-reduce jobs can then run on the Hadoop instance. After the job is done,
HOD gives the nodes back to the shared batch scheduler. A group of users will use HOD to
acquire Hadoop instances of varying sizes, and the batch scheduler will schedule requests so
that important jobs get higher priority and more resources and finish fast. Here is a list
of requirements for HOD and batch schedulers:
> Key Requirements:
> --- Should allocate the specified minimum number of nodes for a job 
>    Many batch jobs can finish in time only when enough resources are allocated. Therefore, the
batch scheduler should allocate the requested number of nodes to a job when the job starts.
This is a simple form of what is known as gang scheduling.
>   Often the minimum number of nodes is not available right away, especially if the job asked for
a large number. The batch scheduler should support advance reservation for important jobs
so that the wait time can be determined. In advance reservation, a reservation is created
at the earliest future point when the occupied nodes become available. When nodes are currently
idle but booked by future reservations, the batch scheduler may give them to other jobs to
increase system utilization, but only when doing so does not delay existing reservations.
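
The reservation and backfill rules described above can be sketched roughly as follows. This is only an illustration, not how any particular batch scheduler implements it: the names `Running`, `earliest_start` and `can_backfill` are made up for this example, time is a plain integer, and the backfill test is the conservative one (the filler job must fit in the idle nodes and finish before the reservation begins).

```python
from dataclasses import dataclass


@dataclass
class Running:
    """A running job (hypothetical model): nodes held and expected end time."""
    nodes: int
    end: int


def earliest_start(total_nodes, running, needed, now=0):
    """Earliest time at which `needed` nodes are free at once (gang allocation)."""
    if needed > total_nodes:
        raise ValueError("request exceeds cluster size")
    free = total_nodes - sum(r.nodes for r in running)
    if free >= needed:
        return now
    # Release nodes in order of job completion until the request fits.
    for r in sorted(running, key=lambda r: r.end):
        free += r.nodes
        if free >= needed:
            return max(now, r.end)
    raise AssertionError("unreachable: all nodes free but request still unmet")


def can_backfill(idle_nodes, job_nodes, job_runtime, now, reservation_start):
    """Conservative backfill: fit in idle nodes and finish before the reservation."""
    return job_nodes <= idle_nodes and now + job_runtime <= reservation_start
```

For example, on a 10-node cluster with one job holding 6 nodes until t=50 and another holding 4 nodes until t=20, a request for 8 nodes would be reserved at t=50. If 4 nodes were idle in the meantime, a 2-node job lasting 30 time units could use them without delaying that reservation, while one lasting 60 could not.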
> --- Run short urgent jobs without imposing too much loss on long jobs. In particular, should
not kill the job tracker of a long job. 
>   Some jobs, mostly short ones, are time sensitive and need urgent treatment. Often, a
large portion of the cluster nodes will be occupied by long-running jobs. The batch scheduler should
be able to preempt long jobs and run urgent jobs. The urgent jobs will then finish quickly, and
long jobs can regain the nodes afterward. 
> When preemption happens, HOD should minimize the loss to long jobs. In particular, it should
not kill the job tracker of a long job.
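
One way to read the preemption requirement is as a rule for choosing which nodes of a long job to give up. The sketch below is an assumption-laden illustration only: `pick_nodes_to_preempt` is a hypothetical helper, node names are plain strings, and it assumes the long job's job tracker runs on a single known node.

```python
def pick_nodes_to_preempt(long_job_nodes, job_tracker_node, needed):
    """Choose `needed` nodes to reclaim from a long job, never the job tracker's node."""
    # The job tracker node is excluded up front so it can never be selected.
    candidates = [n for n in long_job_nodes if n != job_tracker_node]
    if needed > len(candidates):
        raise ValueError("cannot free that many nodes without killing the job tracker")
    return candidates[:needed]
```

With a long job on nodes `n1` to `n4` and its job tracker on `n1`, asking for two nodes would return `n2` and `n3`. The long job keeps its job tracker and can regain nodes once the urgent job finishes.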
> --- Be able to dial up, at run time, the share of resources for more important projects.
>   Viewed at a high level, a given cluster is shared by multiple projects. A project consists
of a number of jobs submitted by a group of users. The batch scheduler should allow important projects
to have more resources. This should be tunable at run time, as which projects are deemed more important
may change over time.
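
A run-time tunable share could be modeled as a per-project weight that an administrator changes while the cluster is running. The following is a minimal sketch under that assumption: `node_shares` and the project names are hypothetical, weights are positive integers, and real schedulers typically express this through their own fair-share or priority settings.

```python
def node_shares(total_nodes, weights):
    """Split `total_nodes` among projects in proportion to integer weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Proportional share, rounded down.
    shares = {p: total_nodes * w // total_weight for p, w in weights.items()}
    # Hand any leftover nodes to the highest-weighted projects, one each.
    leftover = total_nodes - sum(shares.values())
    for p in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[p] += 1
    return shares


weights = {"search": 1, "ads": 1}
before = node_shares(100, weights)   # equal weights: 50 nodes each
weights["search"] = 3                # dial up the more important project
after = node_shares(100, weights)    # search now gets 75 nodes, ads 25
```

Changing a weight and recomputing is the whole "dial": no job needs to be resubmitted, and the new split takes effect the next time the scheduler hands out nodes.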
> --- Prevent malicious abuse of the system. 
>   A shared cluster environment can be put in jeopardy if malicious or erroneous job code
>  -- holds unneeded resources for a long period 
>  -- uses privileges for unworthy work 
>   Such abuse can easily cause under-utilization or starvation of other jobs. The batch scheduler
should allow setting up policies that prevent resource abuse by: 
>  -- limiting privileges to legitimate uses asking for a proper amount 
>  -- throttling peak use of resources per player 
>  -- monitoring and reducing starvation 
> --- The behavior should be simple and predictable 
>    When the status of the system is queried, we should be able to determine what factors
caused it to reach its current status and what its future behavior could be with or without our
tuning of the system. 
> --- Be portable to major resource managers 
>    The HOD design should be portable so that in the future we are able to plug in other resource
> Some of the key requirements are implemented by the batch schedulers. The others need
to be implemented by HOD.


