hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mahadev konar (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-719) Integration of Hadoop with batch schedulers
Date Tue, 14 Nov 2006 21:01:37 GMT
Integration of Hadoop with batch schedulers

                 Key: HADOOP-719
                 URL: http://issues.apache.org/jira/browse/HADOOP-719
             Project: Hadoop
          Issue Type: New Feature
          Components: contrib/streaming
            Reporter: Mahadev konar
         Assigned To: Mahadev konar

Hadoop On Demand (HOD) is an integration of Hadoop with batch schedulers like Condor/torque/sun
grid etc. Hadoop On Demand or HOD hereafter is a system that populates a Hadoop instance using
a shared batch scheduler. HOD will find a requested number of nodes and start up Hadoop daemons
on them. Users map reduce jobs can then run on the hadoop instance. After the job is done,
HOD gives back the  nodes to the shared batch scheduler. A group of users will use HOD to
acquire Hadoop instances of varying sizes and the batch scheduler will schedule requests in
a way that important jobs gain more importance/resources and finish fast. Here are a list
of requirements for HOD and batch schedulers:

Key Requirements :

--- Should allocate the specified minimum number of nodes for a job 
   Many batch jobs can finish in time, only when enough resources are allocated. Therefore
batch scheduler should allocate the asked number of nodes for a given job when the job starts.
This is simple form of what's known as gang    scheduling.

  Often the minimum nodes are not available right away, especially if the job asked for a
large number. The batch scheduler should support advance reservation for important jobs so
that the wait time can be determined. In advance   reservation, a reservation is created on
earliest future point when the preoccupied nodes become available. When nodes are currently
idle but booked by future reservations, batch scheduler is ok to give them to other jobs to
increase system utilization, but only when doing so does not delay existing reservations.

--- run short urgent job without costing too much loss to long job. Especially, should not
kill job tracker of long job. 
  Some jobs, mostly short ones, are time sensitive and need urgent treatment. Often, large
portion of cluster nodes will be occupied by long running jobs. Batch scheduler should be
able to preempt long jobs and run urgent jobs. Then, urgent jobs will finish quickly and long
jobs can re-gain the nodes afterward. 

When preemption happens, HOD should minimize the loss to long jobs. Especially, it should
not kill job tracker of long job.

--- be able to dial up, at run time, share of resources for more important projects.
  Viewed at high level, a given cluster is shared by multiple projects. A project consists
of a number of jobs submitted by a group of users.Batch scheduler should allow important projects
to have more resources. This should be tunable at run time as what projects deem more important
may change over time. 

--- prevent malicious abuse of the system. 
  A shared cluster environment can be put in jeopardy if malicious or erroneous job code does:

 -- hold unneeded resources for a long period 
 -- use privileges for unworthy work 
  Such abuse can easily cause under-utilization or starvation of other jobs. Batch scheduler
should allow  setting up policies for preventing resource abuse by: 
 -- limit privileges to legitimate uses asking for proper amount 
 -- throttle peak use of resources per player 
 -- monitor and reduce starvation 

--- The behavior should be simple and predictable 
   When status of the system is queried, we should be able to determine what factors caused
it to reach current status and what could be the future behavior with or without our tuning
on the system. 

--- be portable to major resource managers 
   HOD design should be portable so that in future we are able to plugin other resource manager.

Some of the key requirements are implemented by the batch schedulers. The others need to be
implemented by HOD.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message