hadoop-mapreduce-issues mailing list archives

From "Sreekanth Ramakrishnan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-802) Simplify the job updated event notification between Jobtracker and schedulers
Date Mon, 03 Aug 2009 08:31:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738226#action_12738226 ]

Sreekanth Ramakrishnan commented on MAPREDUCE-802:
--------------------------------------------------

The problems currently affecting systems that rely on job events can be classified
into two categories:

# Not all code paths raise status change events. This is because the state change is performed
in {{JobInProgress}}, which has no handle to the list of {{JobInProgressListener}}s managed by
the {{JobTracker}}. As a result, components that need the state change to remove or update
their internal structures for a {{JobInProgress}} object are left out of sync.
# Listeners rely on the {{oldStatus}} field, and on its members, being set correctly by the
{{JobTracker}} before the listeners are called. A notable example is the start time change
described in MAPREDUCE-45.
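
To make problem 2 concrete, here is a minimal Java sketch, assuming a scheduler that orders
jobs in a {{TreeMap}} keyed by a snapshot of priority, start time and job id.
{{SchedulingKey}} and {{StaleKeyDemo}} are hypothetical stand-ins, not Hadoop classes. If the
key rebuilt from the event's old status carries the wrong start time, {{remove()}} matches
nothing and the job stays in the queue.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical stand-in for JobSchedulingInfo: the ordering key a scheduler
// derives from a job's status (priority, then start time, then job id).
final class SchedulingKey {
    final int priority;   // assumption: lower value is scheduled first
    final long startTime;
    final String jobId;

    SchedulingKey(int priority, long startTime, String jobId) {
        this.priority = priority;
        this.startTime = startTime;
        this.jobId = jobId;
    }

    static final Comparator<SchedulingKey> ORDER =
        Comparator.<SchedulingKey>comparingInt(k -> k.priority)
                  .thenComparingLong(k -> k.startTime)
                  .thenComparing(k -> k.jobId);
}

final class StaleKeyDemo {
    // Returns true if the job is still queued after a remove() whose key was
    // rebuilt from an event carrying an incorrect old start time.
    static boolean jobStuckAfterRemove() {
        TreeMap<SchedulingKey, String> queue = new TreeMap<>(SchedulingKey.ORDER);
        // The job was inserted with its real start time.
        queue.put(new SchedulingKey(1, 1000L, "job_1"), "job_1");

        // The event's old status has the wrong start time (0 here), so the
        // reconstructed key does not match the stored one.
        SchedulingKey staleKey = new SchedulingKey(1, 0L, "job_1");
        queue.remove(staleKey);

        return queue.containsValue("job_1");
    }

    public static void main(String[] args) {
        System.out.println(jobStuckAfterRemove()); // prints "true"
    }
}
```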

To solve the problems listed above, I propose the following:

* To solve case 1, whenever {{JobInProgress}} changes its state, we route the associated
event to the {{JobTracker}}. This ensures that any code that changes the {{JobStatus}}
actually results in events being raised.
* To solve case 2, we remove the {{oldStatus}} field from {{JobStatusChangeEvent}}, as it is
not always correct. This is an incompatible change: the old status is used by two schedulers,
via {{JobQueueJobInProgressListener}} in the default scheduler and {{JobQueueManager}} in the
capacity scheduler. Both schedulers would therefore have to maintain their own mapping from a
{{JobInProgress}} to its old status.
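
A minimal sketch of the routing in case 1, assuming the tracker passes each job a callback
when the job is created. {{JobEventSink}}, {{TrackedJob}} and {{Tracker}} are hypothetical,
simplified stand-ins for the listener plumbing, {{JobInProgress}} and {{JobTracker}}; they are
not the existing Hadoop APIs. Because every mutator raises the event itself, a new code path
such as {{kill()}} cannot skip it, and the event carries no old status.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback that the tracker hands to every job it creates.
interface JobEventSink {
    void jobChanged(TrackedJob job);
}

// Hypothetical, simplified stand-in for JobInProgress.
final class TrackedJob {
    enum RunState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    private final String jobId;
    private final JobEventSink sink;
    private RunState runState = RunState.PREP;
    private int priority;

    TrackedJob(String jobId, int priority, JobEventSink sink) {
        this.jobId = jobId;
        this.priority = priority;
        this.sink = sink;
    }

    String getJobId() { return jobId; }
    synchronized RunState getRunState() { return runState; }
    synchronized int getPriority() { return priority; }

    // Each mutator raises the event itself, so no caller can forget to.
    void setRunState(RunState newState) {
        synchronized (this) { runState = newState; }
        sink.jobChanged(this); // notify outside the job's lock
    }

    void setPriority(int newPriority) {
        synchronized (this) { priority = newPriority; }
        sink.jobChanged(this);
    }

    // A new code path (e.g. killing a job) gets event raising for free.
    void kill() { setRunState(RunState.KILLED); }
}

// Hypothetical stand-in for the tracker, which owns the listener list.
final class Tracker implements JobEventSink {
    interface Listener { void jobUpdated(TrackedJob job); }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener listener) { listeners.add(listener); }

    TrackedJob submit(String jobId, int priority) {
        return new TrackedJob(jobId, priority, this);
    }

    // The event carries only the job; listeners read its current state.
    @Override
    public void jobChanged(TrackedJob job) {
        for (Listener listener : listeners) {
            listener.jobUpdated(job);
        }
    }
}
```

For example, registering a listener on a {{Tracker}}, submitting {{job_1}} and calling
{{kill()}} delivers exactly one update, in which the listener sees the run state {{KILLED}}.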

The proposed changes would replace the current pseudo code for raising events:
{noformat}
 JobStatus oldStatus = job.getStatus().clone()
 make changes to the job's status
 JobStatus newStatus = job.getStatus().clone()
 create event with both old and new status
 inform listeners
{noformat}
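
The current pattern leaves it to each caller to snapshot the status correctly. A hypothetical
sketch, using a simplified mutable {{MutableStatus}} in place of {{JobStatus}} (the class
names here are illustrative only): with the {{clone()}} calls the event carries distinct old
and new values, but a caller that forgets to clone ends up with an "old" status that already
reflects the change.

```java
// Hypothetical mutable status object, standing in for JobStatus.
final class MutableStatus implements Cloneable {
    int runState;
    long startTime;

    MutableStatus(int runState, long startTime) {
        this.runState = runState;
        this.startTime = startTime;
    }

    @Override
    public MutableStatus clone() {
        try {
            // A shallow copy is enough: all fields are primitives.
            return (MutableStatus) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }
}

// Hypothetical event carrying both snapshots, as in the current scheme.
final class StatusChangeEvent {
    final MutableStatus oldStatus;
    final MutableStatus newStatus;

    StatusChangeEvent(MutableStatus oldStatus, MutableStatus newStatus) {
        this.oldStatus = oldStatus;
        this.newStatus = newStatus;
    }
}

final class SnapshotDemo {
    // Follows the current pseudo code: snapshot, mutate, snapshot.
    static StatusChangeEvent withClone(MutableStatus live) {
        MutableStatus oldStatus = live.clone();
        live.runState = 2; // "make changes to the job's status"
        MutableStatus newStatus = live.clone();
        return new StatusChangeEvent(oldStatus, newStatus);
    }

    // Same steps, but the caller forgets to clone: both fields alias the
    // live object, so the "old" status already shows the new run state.
    static StatusChangeEvent withoutClone(MutableStatus live) {
        MutableStatus oldStatus = live;
        live.runState = 2;
        return new StatusChangeEvent(oldStatus, live);
    }
}
```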

Under the proposal, raising an event would become:
{noformat}
  make changes to job
  create JobChanged event
  inform listeners
{noformat}

So schedulers would have to maintain, on their own, the association with the scheduling
information they previously used to populate their internal structures, instead of relying
on the {{JobTracker}} to send the correct information.

Currently, the default scheduler {{JobQueueTaskScheduler}} maintains the ordered list of jobs
in a {{TreeMap<JobSchedulingInfo,JobInProgress>}}; during an update, the key used to find the
job in this map is constructed from the _oldStatus_ field of the {{JobStatusChangeEvent}}.
With the proposed change, since _oldStatus_ is removed, the default scheduler would have to
maintain its own association from a job to its scheduling info, i.e. a
{{Map<JobID,JobSchedulingInfo>}} in which the value for a JobID is the current
{{JobSchedulingInfo}} that was used to insert the job into the scheduler's {{TreeMap}}. When
{{jobUpdated()}} is called, the old {{JobSchedulingInfo}} is looked up in the {{Map}} and
removed from the {{TreeMap}}; then both the {{Map<JobID,JobSchedulingInfo>}} and the
{{TreeMap<JobSchedulingInfo,JobInProgress>}} are updated with the most recent {{JobSchedulingInfo}}.
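
A minimal sketch of this bookkeeping, assuming jobs are ordered by priority (lower value
first), then start time, then job id. {{QueuedJob}}, {{SchedInfo}} and {{OrderedJobQueue}} are
hypothetical stand-ins for {{JobInProgress}}, {{JobSchedulingInfo}} and the scheduler's queue,
and a {{String}} job id stands in for {{JobID}}. {{jobUpdated()}} needs no old status: it looks
up the previous key in its own map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a job; its fields may change between events.
final class QueuedJob {
    final String jobId;
    int priority;
    long startTime;

    QueuedJob(String jobId, int priority, long startTime) {
        this.jobId = jobId;
        this.priority = priority;
        this.startTime = startTime;
    }
}

// Hypothetical stand-in for JobSchedulingInfo: an immutable snapshot of the
// fields that determine queue order.
final class SchedInfo {
    final int priority;
    final long startTime;
    final String jobId;

    SchedInfo(QueuedJob job) {
        this.priority = job.priority;
        this.startTime = job.startTime;
        this.jobId = job.jobId;
    }

    static final Comparator<SchedInfo> ORDER =
        Comparator.<SchedInfo>comparingInt(s -> s.priority)
                  .thenComparingLong(s -> s.startTime)
                  .thenComparing(s -> s.jobId);
}

final class OrderedJobQueue {
    private final TreeMap<SchedInfo, QueuedJob> queue = new TreeMap<>(SchedInfo.ORDER);
    // The scheduler's own record of the key each job was inserted under.
    private final Map<String, SchedInfo> infoByJob = new HashMap<>();

    synchronized void jobAdded(QueuedJob job) {
        SchedInfo info = new SchedInfo(job);
        queue.put(info, job);
        infoByJob.put(job.jobId, info);
    }

    // No old status needed: the previous key comes from infoByJob.
    synchronized void jobUpdated(QueuedJob job) {
        SchedInfo previous = infoByJob.get(job.jobId);
        if (previous == null) {
            return; // not tracked, e.g. already removed
        }
        queue.remove(previous);
        SchedInfo current = new SchedInfo(job);
        queue.put(current, job);
        infoByJob.put(job.jobId, current);
    }

    synchronized void jobRemoved(QueuedJob job) {
        SchedInfo previous = infoByJob.remove(job.jobId);
        if (previous != null) {
            queue.remove(previous);
        }
    }

    // Job ids in scheduling order.
    synchronized List<String> orderedIds() {
        List<String> ids = new ArrayList<>();
        for (QueuedJob job : queue.values()) {
            ids.add(job.jobId);
        }
        return ids;
    }
}
```

The job id tie-break keeps two jobs with equal priority and start time from collapsing into a
single {{TreeMap}} key.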

Any comments on the above proposal and the changes it would bring to the framework?

> Simplify the job updated event notification between Jobtracker and schedulers
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-802
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-802
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>            Reporter: Hemanth Yamijala
>            Assignee: Sreekanth Ramakrishnan
>
> HADOOP-4053 and HADOOP-4149 added events to notify the scheduler of updates to the state or
> properties of a job, such as its run state or priority. We've seen some issues with this
> framework, such as the following:
> - Events are not raised correctly at all places. If a new code path is added to kill a job,
> raising events is missed out.
> - Events are raised with incorrect event data. For example, the start time value is
> typically missed out.
> The resulting contract break between the jobtracker and schedulers has led to problems in
> the capacity scheduler, such as jobs remaining stuck in the queue without ever being removed.
> It has proven complicated to get this right in the framework, and fixes have typically
> still left dangling cases, or new code paths introduce new bugs.
> This JIRA is about trying to simplify the interaction model so that it is more robust
> and works well.

-- 
This message is automatically generated by JIRA.

