hadoop-mapreduce-issues mailing list archives

From "Sreekanth Ramakrishnan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-802) Simplify the job updated event notification between Jobtracker and schedulers
Date Mon, 03 Aug 2009 08:31:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738226#action_12738226 ]

Sreekanth Ramakrishnan commented on MAPREDUCE-802:

Currently, the problems in systems that rely on job events fall into two
categories:

# Not all code paths raise status change events. The state change is performed in
{{JobInProgress}}, which has no handle to the list of {{JobInProgressListener}} objects
managed by the {{JobTracker}}. As a result, components that need the state change in order to
remove or update their internal structures for the {{JobInProgress}} object are left out of sync.
# Listeners rely on the {{oldStatus}} field, and on every member of that structure, being set
correctly by the {{JobTracker}} before the listeners are called. A notable example of this is
the start time change, which is described

To solve the problems listed above, I propose the following:

* To solve case 1, whenever {{JobInProgress}} changes its state, we route the associated
event to the {{JobTracker}}. This ensures that any code path that changes the {{JobStatus}}
actually results in an event being raised.
* To solve case 2, we remove the {{oldStatus}} field from {{JobStatusChangeEvent}}, as it is
not always correct. This is an incompatible change: the old status is currently used by two
listeners, {{JobQueueJobInProgressListener}} for the default scheduler and {{JobQueueManager}}
for the capacity scheduler. Both schedulers would now have to maintain their own association
between each {{JobInProgress}} and its old status.

The proposed changes would turn the current pseudocode for raising events:
 JobStatus oldStatus = job.getStatus().clone()
 make changes to the job's status
 JobStatus newStatus = job.getStatus().clone()
 create event with both old and new status
 inform listeners

into the following:
  make changes to job
  create JobChanged event
  inform listeners
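
A minimal Java sketch of the simplified flow is given here for illustration. All names in it ({{JobChangeListener}}, {{TrackerSketch}}, {{JobSketch}}) are hypothetical stand-ins, not the actual Hadoop classes, and the event carries only the job id as an assumption. The point is that every mutation goes through the job object, which reports to the tracker, so no code path can skip raising the event and no old status has to be cloned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface; stands in for JobInProgressListener.
interface JobChangeListener {
    void jobUpdated(String jobId);
}

// Hypothetical stand-in for the JobTracker: owns the listener list.
class TrackerSketch {
    private final List<JobChangeListener> listeners = new ArrayList<>();

    void addListener(JobChangeListener listener) {
        listeners.add(listener);
    }

    // Single entry point for every change reported by a job.
    void jobChanged(String jobId) {
        for (JobChangeListener listener : listeners) {
            listener.jobUpdated(jobId);
        }
    }
}

// Hypothetical stand-in for JobInProgress: each setter changes the
// state first and then routes a "job changed" event to the tracker.
class JobSketch {
    private final String jobId;
    private final TrackerSketch tracker;
    private String runState = "PREP";
    private long startTime;

    JobSketch(String jobId, TrackerSketch tracker) {
        this.jobId = jobId;
        this.tracker = tracker;
    }

    void setRunState(String newState) {
        runState = newState;
        tracker.jobChanged(jobId);
    }

    void setStartTime(long newStartTime) {
        startTime = newStartTime;
        tracker.jobChanged(jobId);
    }

    // A newly added code path still raises the event, because it
    // reuses the setter instead of touching the field directly.
    void kill() {
        setRunState("KILLED");
    }

    String getRunState() {
        return runState;
    }

    long getStartTime() {
        return startTime;
    }
}
```

With this shape, listeners read the current state from the job when they are notified, rather than trusting a snapshot assembled by the caller.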

So each scheduler would have to maintain, on its own, an association with the scheduling
information it previously used to populate its internal structures, instead of relying on the
{{JobTracker}} to send the correct information.

Currently, the default scheduler {{JobQueueTaskScheduler}} maintains the ordered list of jobs
using a {{TreeMap<JobSchedulingInfo,JobInProgress>}}. During an update, the key into this map
is constructed from the _oldStatus_ field of the {{JobStatusChangeEvent}}. With the proposed
change, _oldStatus_ is removed, so the default scheduler would have to maintain its own
association from job to job scheduling info, i.e. a {{Map<JobID,JobSchedulingInfo>}} in which
the value for a JobID is the current {{JobSchedulingInfo}} that was used to insert the job into
the scheduler's {{TreeMap}}. When {{jobUpdated()}} is called, the old {{JobSchedulingInfo}} is
removed from the {{TreeMap}} using the value from the {{Map}}; then both the
{{Map<JobID,JobSchedulingInfo>}} and the {{TreeMap<JobSchedulingInfo,JobInProgress>}} are
updated with the most recent {{JobSchedulingInfo}}.
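
The bookkeeping described above could look roughly like the following Java sketch. It is an illustration under stated assumptions, not the scheduler's actual code: {{SchedKey}}, {{JobInfo}} and {{QueueSketch}} are hypothetical simplifications of {{JobSchedulingInfo}}, {{JobInProgress}} and the listener, job ids are plain strings, and the ordering (priority, then start time, then id) is assumed for the example.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified ordering key (stands in for JobSchedulingInfo).
final class SchedKey {
    final String jobId;
    final int priority;   // assumption: lower value sorts first
    final long startTime;

    SchedKey(String jobId, int priority, long startTime) {
        this.jobId = jobId;
        this.priority = priority;
        this.startTime = startTime;
    }

    static final Comparator<SchedKey> ORDER =
        Comparator.comparingInt((SchedKey k) -> k.priority)
                  .thenComparingLong(k -> k.startTime)
                  .thenComparing(k -> k.jobId);
}

// Hypothetical, simplified job (stands in for JobInProgress).
final class JobInfo {
    final String jobId;
    int priority;
    long startTime;

    JobInfo(String jobId, int priority, long startTime) {
        this.jobId = jobId;
        this.priority = priority;
        this.startTime = startTime;
    }
}

class QueueSketch {
    // Remembers the key each job is currently filed under in the queue.
    private final Map<String, SchedKey> currentKey = new HashMap<>();
    private final TreeMap<SchedKey, JobInfo> queue = new TreeMap<>(SchedKey.ORDER);

    private static SchedKey keyOf(JobInfo job) {
        return new SchedKey(job.jobId, job.priority, job.startTime);
    }

    void jobAdded(JobInfo job) {
        SchedKey key = keyOf(job);
        currentKey.put(job.jobId, key);
        queue.put(key, job);
    }

    // No old status is needed: the stale key comes from our own map.
    void jobUpdated(JobInfo job) {
        SchedKey stale = currentKey.get(job.jobId);
        if (stale == null) {
            return; // job is not tracked by this queue
        }
        queue.remove(stale);
        SchedKey fresh = keyOf(job);
        currentKey.put(job.jobId, fresh);
        queue.put(fresh, job);
    }

    void jobRemoved(JobInfo job) {
        SchedKey stale = currentKey.remove(job.jobId);
        if (stale != null) {
            queue.remove(stale);
        }
    }

    String firstJobId() {
        return queue.isEmpty() ? null : queue.firstKey().jobId;
    }

    int size() {
        return queue.size();
    }
}
```

Because the stale key is always the one the scheduler itself inserted, removal from the sorted map cannot miss even if the caller's view of the previous status was wrong, which is the failure mode behind jobs getting stuck in the queue.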

Any comments on the above proposal and the changes it would bring to the framework?

> Simplify the job updated event notification between Jobtracker and schedulers
> -----------------------------------------------------------------------------
>                 Key: MAPREDUCE-802
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-802
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>            Reporter: Hemanth Yamijala
>            Assignee: Sreekanth Ramakrishnan
> HADOOP-4053 and HADOOP-4149 added events to take care of updates to the state / property
> of a job like the run state / priority of a job notified to the scheduler. We've seen some
> issues with this framework, such as the following:
> - Events are not raised correctly at all places. If a new code path is added to kill
> a job, raising events is missed out.
> - Events are raised with incorrect event data. For example, the start time value is
> typically missed out.
> The resulting contract break between jobtracker and schedulers has led to problems in
> the capacity scheduler where jobs remain stuck in the queue without ever being removed and
> so on.
> It has proven complicated to get this right in the framework and fixes have typically
> still left dangling cases. Or new code paths introduce new bugs.
> This JIRA is about trying to simplify the interaction model so that it is more robust
> and works well.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.