hadoop-common-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3412) Refactor the scheduler out of the JobTracker
Date Wed, 16 Jul 2008 13:23:31 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-3412:
------------------------------

    Attachment: JobScheduler-v11.patch

For this issue, which is about moving scheduling logic from the JobTracker to a scheduler
class, I think we can leave out queues. We don't currently have the explicit concept of a
queue, so I think it makes sense to commit this change, and continue the discussion about
adding queues in HADOOP-3445. As discussed earlier, this Jira will not change the public APIs
yet, so we can go on evolving the scheduling interface.

bq. Fair point about the JobInProgress being fine for the API, provided that the scheduler
is required to call initTasks on the JobInProgress when it should be loaded. 

The implication of this is that the Scheduler takes over the responsibility of managing the
jobInitQueue. I've created a patch which does this (v11) by inserting an EagerTaskInitializationTaskScheduler
into the TaskScheduler hierarchy. In doing so I needed a couple of lifecycle methods, which
I've named following HADOOP-3628, so TaskScheduler can be retrofitted to extend Service after
HADOOP-3628 is committed. 

Does this look OK?
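
For readers without the patch to hand, here is a rough sketch of the shape described above. It is not the code in v11. SketchJob stands in for JobInProgress, the class names are made up for illustration, and the lifecycle method names start() and terminate() are my assumption, since the names are not given in this message. The only thing taken from the discussion is that the scheduler owns the jobInitQueue and calls initTasks on each job.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for JobInProgress; only initTasks() comes from the thread.
class SketchJob {
    private volatile boolean initialized = false;

    void initTasks() { initialized = true; }

    boolean isInitialized() { return initialized; }
}

// Hypothetical scheduler base class; start()/terminate() are assumed names.
abstract class SketchTaskScheduler {
    void start() {}

    void terminate() {}

    abstract void jobAdded(SketchJob job);
}

// Owns the init queue and initializes every submitted job on a background thread.
class EagerInitSketchScheduler extends SketchTaskScheduler {
    private final BlockingQueue<SketchJob> jobInitQueue = new LinkedBlockingQueue<>();
    private final SketchJob stopMarker = new SketchJob(); // sentinel that ends the loop
    private Thread initThread;

    @Override
    void start() {
        initThread = new Thread(() -> {
            try {
                while (true) {
                    SketchJob job = jobInitQueue.take();
                    if (job == stopMarker) {
                        return;
                    }
                    job.initTasks();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "jobInitThread");
        initThread.setDaemon(true);
        initThread.start();
    }

    @Override
    void terminate() {
        if (initThread == null) {
            return; // never started
        }
        try {
            // Jobs queued before the sentinel are still initialized before shutdown.
            jobInitQueue.put(stopMarker);
            initThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    void jobAdded(SketchJob job) {
        jobInitQueue.add(job);
    }
}
```

In this sketch the JobTracker would only call start(), jobAdded() and terminate(). A scheduler that wants lazy initialization could extend the base class directly and call initTasks whenever it chooses.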

bq. an event when a TIP changes state, so that the scheduler can update its data structures

Would the taskUpdated method be called by JobTracker#updateTaskStatuses? I can see that it
might be useful for schedulers to have this information, but perhaps this is something to
add to the interface when a use case comes up? (TaskScheduler is an abstract class, so it's
easy to add new methods to it.)
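
To illustrate the point in parentheses, here is a minimal sketch assuming a hypothetical abstract class. The class names and the taskUpdated signature below are made up for illustration and are not the real TaskScheduler API. The idea is that a new hook can be added later as a concrete no-op method, so schedulers written before it existed keep compiling and only interested schedulers override it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstract scheduler; not the actual TaskScheduler class.
abstract class SketchScheduler {
    abstract String assignTask(String trackerName);

    // Hook added after the fact: a no-op default keeps older subclasses valid.
    void taskUpdated(String taskId, String newState) {
    }
}

// Written before taskUpdated existed; needs no change.
class FifoSketchScheduler extends SketchScheduler {
    @Override
    String assignTask(String trackerName) {
        return "task-for-" + trackerName;
    }
}

// A scheduler that cares about state changes overrides the hook.
class TrackingSketchScheduler extends SketchScheduler {
    final List<String> updates = new ArrayList<>();

    @Override
    String assignTask(String trackerName) {
        return "task-for-" + trackerName;
    }

    @Override
    void taskUpdated(String taskId, String newState) {
        updates.add(taskId + "=" + newState);
    }
}
```

With a Java interface (as of the Java versions in use today for Hadoop) the same addition would break every existing implementation, which is the argument for deferring taskUpdated until a use case appears.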

> Refactor the scheduler out of the JobTracker
> --------------------------------------------
>
>                 Key: HADOOP-3412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3412
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: JobScheduler-v10.patch, JobScheduler-v11.patch, JobScheduler-v9.1.patch,
JobScheduler-v9.2.patch, JobScheduler-v9.patch, JobScheduler.patch, JobScheduler_v2.patch,
JobScheduler_v3.patch, JobScheduler_v3b.patch, JobScheduler_v4.patch, JobScheduler_v5.patch,
JobScheduler_v6.1.patch, JobScheduler_v6.2.patch, JobScheduler_v6.3.patch, JobScheduler_v6.4.patch,
JobScheduler_v6.patch, JobScheduler_v7.1.patch, JobScheduler_v7.patch, JobScheduler_v8.patch,
RackAwareJobScheduler.java, SimpleResourceAwareJobScheduler.java
>
>
> First I would like to warn you that my proposal may be very naive. I just hope
that reading it won't waste your time.
> h4. The aim
> It seems to me that improving Hadoop scheduling could be very profitable. But, it is
hard to implement and compare schedulers, because the scheduling logic is mixed within the
rest of the JobTracker.
> This bug is the first step of an attempt to improve the Hadoop scheduler. It re-implements
the current scheduling algorithm in a separate class called JobScheduler. This new class is
instantiated in the JobTracker.
> h4. Bugs fixed as a side effect
> This patch probably cannot be submitted as it is.
> A first difficulty is that it does not have exactly the same behaviour as the current
JobTracker. More precisely, it doesn't re-implement things such as code that seems never to be
called, or concurrency problems.
> I wrote TOCONFIRM where my proposal differs from the current implementation, so you
can find those places easily.
> I know that fixing bugs silently is bad. So, independently of what you decide about this
patch, I will open issues for bugs that you confirm.
> h4. Other side effects
> Another side effect of this patch is to add documentation about each step of the scheduling.
I hope that it will help future improvement by lowering the level required to contribute to
the scheduler.
> It also reduces the complexity and the granularity of the JobTracker (making it more
parallel).
> h4. The future
> If you feel that this is a step in the right direction, I will try to propose a JobSchedulerInterface
that many JobSchedulers could implement and to propose alternatives to the current « FifoJobScheduler
».  If some of you have ideas about that, please tell me ^^ I will also open issues for things
marked as FIXME in the patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

