hadoop-common-dev mailing list archives

From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3412) Refactor the scheduler out of the JobTracker
Date Tue, 15 Jul 2008 09:11:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613562#action_12613562 ]

Vivek Ratan commented on HADOOP-3412:

bq. we want to support lazy loading of jobs so that we can scale up the job tracker, which
means we shouldn't use JobInProgress in the API since it is very heavy

Isn't there a far easier way to do this? A JobInProgress object grows in size when you call
JobInProgress.initTasks(). The current JT code calls initTasks() when a job is submitted to
the JT (the newly created JobInProgress object is actually put in a queue and some other thread
calls initTasks(), but it's really called as early as possible). A simple fix to this is to
call initTasks() only when a job is considered for running by the scheduler. That way, you
initialize a JobInProgress object only when needed. Otherwise, its memory footprint is low.
One could even argue that a newly created JobInProgress object should look a lot like
JobDescription. Only when you 'initialize' it, at the point when the (first task in the)
job can be considered for running, do you need to expand all the other data structures.
This seems, IMO, a better way to handle scale than adding another class. To be fair, JobDescription
is really what the scheduler should be looking at, but it makes more sense if the Scheduler
is a separate component/process. Otherwise, you're duplicating state in JobDescription and
JobInProgress. You could also refactor JobInProgress so that it has a JobDescription member
variable which it exposes, rather than expose separate methods for getting/setting priority
or queue names, but there doesn't seem to be an advantage to it, other than conceptually encapsulating
information that a Scheduler might need in one class. 
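
A rough sketch of the lazy-initialization idea, in case it helps the discussion. LazyJob and LazyScheduler are hypothetical stand-ins, not the real JobInProgress or TaskScheduler classes, and the list of task names only mimics the heavy state that initTasks() builds. The point is that submission stays cheap and the expansion happens when the scheduler first considers the job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JobInProgress; not the real Hadoop class.
class LazyJob {
    private final String jobId;
    private final int numTasks;
    private List<String> tasks; // heavy per-task state; null until initTasks() runs

    LazyJob(String jobId, int numTasks) {
        this.jobId = jobId;
        this.numTasks = numTasks;
    }

    synchronized boolean isInitialized() {
        return tasks != null;
    }

    // Expands the per-task structures. Calling it again is a no-op.
    synchronized void initTasks() {
        if (tasks != null) {
            return;
        }
        List<String> expanded = new ArrayList<>(numTasks);
        for (int i = 0; i < numTasks; i++) {
            expanded.add(jobId + "_task_" + i);
        }
        tasks = expanded;
    }

    synchronized int taskCount() {
        return tasks == null ? 0 : tasks.size();
    }
}

// Hypothetical scheduler: submission only records the job, and
// initTasks() is deferred until the job is considered for running.
class LazyScheduler {
    private final List<LazyJob> submitted = new ArrayList<>();

    void jobSubmitted(LazyJob job) {
        submitted.add(job); // cheap: no initTasks() here
    }

    // Returns the oldest submitted job, initializing it on first use,
    // or null when nothing has been submitted.
    LazyJob pickNextJob() {
        if (submitted.isEmpty()) {
            return null;
        }
        LazyJob job = submitted.get(0);
        job.initTasks();
        return job;
    }
}
```

With this shape, a job sitting behind others in the list keeps its small footprint until it reaches the front.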

As to whether queue names need to be part of TaskScheduler: we have two options here. 
* Queues are explicit in the system, and jobs are always submitted to a queue. If so, you
want this notion everywhere. JobTracker.submitJob() should be changed to take in a jobID and
a queue name, as you're explicitly submitting a job to a queue. Then, TaskScheduler requires
both a job and a queue name, in order to tell it that a job was submitted to the system (as
per Matei and Tom's comment earlier, addJob() is a listener method and just needs to know
when a job is submitted to the system). 
* Or, you could treat queues (and other things we may add later, such as Orgs) as part of
the job configuration. So, a user submits a job, and everything the system needs to know is
encapsulated in the jobID, when JobTracker.submitJob is called. 
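
To make the two options concrete, here is a minimal sketch of how the method signatures might differ. All the type names (QueueAwareListener, ExplicitQueueTracker, JobListener, JobConfStore, ConfBasedTracker) and the "queue.name" key used in the usage note are invented for illustration; none of them come from the patch or from Hadoop.

```java
import java.util.HashMap;
import java.util.Map;

// Option 1: queues are explicit, so the queue name travels through every call.
interface QueueAwareListener {
    void jobAdded(String jobId, String queueName);
}

class ExplicitQueueTracker {
    private final QueueAwareListener listener;

    ExplicitQueueTracker(QueueAwareListener listener) {
        this.listener = listener;
    }

    void submitJob(String jobId, String queueName) {
        listener.jobAdded(jobId, queueName);
    }
}

// Option 2: the queue is just another job property, looked up by job ID.
interface JobListener {
    void jobAdded(String jobId);
}

// Hypothetical per-job configuration store.
class JobConfStore {
    private final Map<String, Map<String, String>> confs = new HashMap<>();

    void set(String jobId, String key, String value) {
        confs.computeIfAbsent(jobId, id -> new HashMap<>()).put(key, value);
    }

    String get(String jobId, String key, String defaultValue) {
        Map<String, String> conf = confs.get(jobId);
        return conf == null ? defaultValue : conf.getOrDefault(key, defaultValue);
    }
}

class ConfBasedTracker {
    private final JobListener listener;

    ConfBasedTracker(JobListener listener) {
        this.listener = listener;
    }

    void submitJob(String jobId) {
        listener.jobAdded(jobId); // the listener reads the queue from the job's configuration
    }
}
```

Under option 2 a listener would do something like confs.get(jobId, "queue.name", "default") when it needs the queue, and adding another attribute later (an Org, say) would not change any signature.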

bq. Then the only data structure holding jobs is in the scheduler and doing queries can be
done through this api.

Do we want the Scheduler to serve queries? In the future, you may well want to think of the
Scheduler as just an algorithm that, given the state of the system, only decides what task
to give to a TT. Web serving may be done through a completely different component. 

I really think some other component besides the Scheduler needs to be responsible for storing
jobs and maintaining data structures that associate the job with queues and deal with job
persistence - everything to do with keeping track of jobs & queues in memory. Different
schedulers impose different filters/sorting on these structures - they're really just algorithms
that access these data structures. Schedulers may keep other data structures for their use.
For example, in HADOOP-3445, the scheduler needs to know how many unique users have submitted
jobs to a queue, or how many tasks for a given user are running. This information is kept
in a different data structure that the scheduling code controls. It doesn't need to be persisted
and doesn't need the same scaling/persistence functionality as you need for JobInProgress
objects. So in that sense, the TaskScheduler interface should not also expose jobs and queues.
getQueueNames() and getJobs() belong elsewhere (probably in a JobQueueManager class). 
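
As an illustration of the kind of scheduler-private structure meant here, the sketch below keeps the two counts mentioned for HADOOP-3445 (unique users per queue, running tasks per user). UserAccounting and its method names are made up for this example and are not taken from that issue's patch; the structure is transient and holds no job objects.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical scheduler-private bookkeeping. It is transient: nothing here
// is persisted, and it holds user names and counters rather than job objects.
class UserAccounting {
    private final Map<String, Set<String>> usersByQueue = new HashMap<>();
    private final Map<String, Integer> runningTasksByUser = new HashMap<>();

    void jobSubmitted(String queueName, String user) {
        usersByQueue.computeIfAbsent(queueName, q -> new HashSet<>()).add(user);
    }

    void taskStarted(String user) {
        runningTasksByUser.merge(user, 1, Integer::sum);
    }

    // Decrements the user's count and drops the entry when it reaches zero.
    // A finish without a matching start is ignored.
    void taskFinished(String user) {
        runningTasksByUser.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
    }

    int uniqueUsers(String queueName) {
        Set<String> users = usersByQueue.get(queueName);
        return users == null ? 0 : users.size();
    }

    int runningTasks(String user) {
        return runningTasksByUser.getOrDefault(user, 0);
    }
}
```

Because this state can be rebuilt from task events, it does not need the scaling or persistence machinery that the job store needs.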

You may actually want two separate interfaces - one for Scheduling (which will be similar
to what TaskScheduler exposes) and one for iterating through jobs and queues. For performance's
sake, you may have the same class implement both, but they remain two separate interfaces. 
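
A small sketch of that split, assuming invented names throughout: Scheduling, JobQueueView, FifoJobQueueManager and nextJob() are hypothetical, and only getQueueNames() and getJobs() echo methods discussed in this issue. One class implements both interfaces, but callers only see the one they need.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Scheduling side: what the JobTracker would call.
interface Scheduling {
    void jobAdded(String jobId, String queueName);

    String nextJob(); // ID of the job to run from next, or null if none
}

// Query side: what a web UI or CLI would call.
interface JobQueueView {
    List<String> getQueueNames();

    List<String> getJobs(String queueName);
}

// One class backs both interfaces so the data is stored only once.
class FifoJobQueueManager implements Scheduling, JobQueueView {
    private final Map<String, List<String>> jobsByQueue = new LinkedHashMap<>();
    private final List<String> submissionOrder = new ArrayList<>();

    @Override
    public void jobAdded(String jobId, String queueName) {
        jobsByQueue.computeIfAbsent(queueName, q -> new ArrayList<>()).add(jobId);
        submissionOrder.add(jobId);
    }

    @Override
    public String nextJob() {
        return submissionOrder.isEmpty() ? null : submissionOrder.get(0);
    }

    @Override
    public List<String> getQueueNames() {
        return new ArrayList<>(jobsByQueue.keySet());
    }

    @Override
    public List<String> getJobs(String queueName) {
        List<String> jobs = jobsByQueue.get(queueName);
        return jobs == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(jobs);
    }
}
```

Code that schedules would hold the object as a Scheduling reference and code that serves queries as a JobQueueView reference, so either side could later move to a different class without touching the other.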

> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>                 Key: HADOOP-3412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3412
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>             Fix For: 0.19.0
>         Attachments: JobScheduler-v10.patch, JobScheduler-v9.1.patch, JobScheduler-v9.2.patch,
JobScheduler-v9.patch, JobScheduler.patch, JobScheduler_v2.patch, JobScheduler_v3.patch, JobScheduler_v3b.patch,
JobScheduler_v4.patch, JobScheduler_v5.patch, JobScheduler_v6.1.patch, JobScheduler_v6.2.patch,
JobScheduler_v6.3.patch, JobScheduler_v6.4.patch, JobScheduler_v6.patch, JobScheduler_v7.1.patch,
JobScheduler_v7.patch, JobScheduler_v8.patch, RackAwareJobScheduler.java, SimpleResourceAwareJobScheduler.java
> First I would like to warn you that my proposition is assumed to be very naive. I just hope
that reading it won't make you lose time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable. But, it is
hard to implement and compare schedulers, because the scheduling logic is mixed within the
rest of the JobTracker.
> This bug is the first step of an attempt to improve the Hadoop scheduler. It re-implements
the current scheduling algorithm in a separate class called JobScheduler. This new class is
instantiated in the JobTracker.
> h4. Bugs fixed as a side effect
> This patch probably cannot be submitted as it is.
> A first difficulty is that it does not have exactly the same behaviour as the current
JobTracker. More precisely, it doesn't re-implement things like code that seems to be never
called or concurrency problems.
> I wrote TOCONFIRM where my proposition differs from the current implementation, so you
can find them easily.
> I know that fixing bugs silently is bad. So, independently of what you decide about this
patch, I will open issues for bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is to add documentation about each step of the scheduling.
I hope that it will help future improvement by lowering the level required to contribute to
the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making it more
> h4. The future
> If you feel that this is a step in the right direction, I will try to propose a JobSchedulerInterface
that many JobSchedulers could implement and to propose alternatives to the current « FifoJobScheduler
».  If some of you have ideas about that please tell ^^ I will also open issues for things
marked as FIXME in the patch.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
