hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-802) Simplify the job updated event notification between Jobtracker and schedulers
Date Mon, 03 Aug 2009 10:47:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738274#action_12738274 ]

Hemanth Yamijala commented on MAPREDUCE-802:

+1 in general to removing the old status. It is error-prone, and has not been needed for
anything other than lookup, which can be done in other ways.

bq. whenever JobInProgress changes its state, we route the associated event to JobTracker.
This will ensure that any part of code which changes the JobStatus would actually result in
events being raised.

We should add the code for raising the events at the most common point in the code paths.
For example, all completed jobs pass through an API such as garbageCollect(). The run state
change event should be raised there. One of the problems with the current approach is that
this event is raised in many places.
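
A minimal sketch of that single-point idea, in plain Java. SketchJob and RunStateListener
are hypothetical stand-ins, not Hadoop classes; only the garbageCollect() name comes from
the comment above. Every terminal path funnels through garbageCollect(), which is the only
place the run state event is raised.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener; stands in for whatever the scheduler registers.
interface RunStateListener {
    void runStateChanged(String jobId, String newState);
}

// Hypothetical stand-in for a job, not the real JobInProgress.
class SketchJob {
    private final String jobId;
    private final List<RunStateListener> listeners = new ArrayList<>();
    private String state = "RUNNING";

    SketchJob(String jobId) { this.jobId = jobId; }

    void addListener(RunStateListener l) { listeners.add(l); }

    String getState() { return state; }

    // Each terminal code path only records the new state and then
    // delegates to garbageCollect(); none of them raises events itself.
    void kill()     { finish("KILLED"); }
    void fail()     { finish("FAILED"); }
    void complete() { finish("SUCCEEDED"); }

    private void finish(String terminalState) {
        state = terminalState;
        garbageCollect();
    }

    // The one place where the run state change event is raised.
    private void garbageCollect() {
        for (RunStateListener l : listeners) {
            l.runStateChanged(jobId, state);
        }
    }
}
```

With this shape, a newly added way of ending a job only has to call finish(); it cannot
forget to raise the event.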

I think we should write the code in such a way that repeated events of the same type are
no-ops. IOW, if the scheduler has already removed a job from its queue, another call
to repeat the action should be a no-op.
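
As a rough illustration of the no-op behaviour, assuming a hypothetical scheduler-side
queue (SketchJobQueue and jobCompleted() are made-up names, not the capacity scheduler's
API): the second delivery of the same completion event finds nothing to remove and
returns without error.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical scheduler-side queue, not a real Hadoop class.
class SketchJobQueue {
    // Job ids kept in submission order.
    private final Set<String> jobs = new LinkedHashSet<>();

    void add(String jobId) { jobs.add(jobId); }

    // Handles a "job completed" event. Returns true if this call removed
    // the job, false if it was already gone (the repeated event is a no-op).
    boolean jobCompleted(String jobId) {
        return jobs.remove(jobId);
    }

    int size() { return jobs.size(); }
}
```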

bq. ...scheduler would have to maintain its association between job to job scheduling info
i.e. a Map<JobID,JobSchedulingInfo>...

An alternative would be to iterate over the jobs whenever a removal is required. This
is linear in the number of jobs submitted, but would save on the memory used. Even for
1000 jobs this might not be such a bad deal. Thoughts?
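
A sketch of the scan-based removal, under the assumption that the scheduler already keeps
its jobs in a list in submission order. SketchSchedulingInfo and SketchScanQueue are
hypothetical names used only for illustration; no separate job-id-to-info map is kept.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the per-job scheduling record.
class SketchSchedulingInfo {
    final String jobId;
    SketchSchedulingInfo(String jobId) { this.jobId = jobId; }
}

class SketchScanQueue {
    // Jobs in submission order; this list is the only bookkeeping.
    private final List<SketchSchedulingInfo> jobs = new ArrayList<>();

    void submit(String jobId) { jobs.add(new SketchSchedulingInfo(jobId)); }

    // Linear scan over the submitted jobs; returns false when the job
    // is not present, so a repeated removal is harmless.
    boolean remove(String jobId) {
        Iterator<SketchSchedulingInfo> it = jobs.iterator();
        while (it.hasNext()) {
            if (it.next().jobId.equals(jobId)) {
                it.remove();
                return true;
            }
        }
        return false;
    }

    int size() { return jobs.size(); }
}
```

The trade-off is the one described above: each removal costs a pass over the list, in
exchange for not holding a second structure keyed by job id.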

> Simplify the job updated event notification between Jobtracker and schedulers
> -----------------------------------------------------------------------------
>                 Key: MAPREDUCE-802
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-802
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>            Reporter: Hemanth Yamijala
>            Assignee: Sreekanth Ramakrishnan
> HADOOP-4053 and HADOOP-4149 added events to take care of updates to the state / property
> of a job like the run state / priority of a job notified to the scheduler. We've seen some
> issues with this framework, such as the following:
> - Events are not raised correctly at all places. If a new code path is added to kill
>   a job, raising events is missed out.
> - Events are raised with incorrect event data. For example, typically the start time value
>   is missed out.
> The resulting contract break between jobtracker and schedulers has led to problems in
> the capacity scheduler where jobs remain stuck in the queue without ever being removed and
> so on.
> It has proven complicated to get this right in the framework and fixes have typically
> still left dangling cases. Or new code paths introduce new bugs.
> This JIRA is about trying to simplify the interaction model so that it is more robust
> and works well.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
