hadoop-common-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5548) Observed negative running maps on the job tracker
Date Mon, 23 Mar 2009 11:47:52 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688271#action_12688271 ]

Amareshwari Sriramadasu commented on HADOOP-5548:
-------------------------------------------------

As Devaraj pointed out, the problem is not with JobTracker restart.
In JobTracker, each TaskTrackerStatus is cached in {{taskTrackers}} and is supposed to be read-only.
However, it is passed to the updateTaskStatuses() method, which hands its task reports (TaskStatus objects)
to JobInProgress. JobInProgress.updateTaskStatus() and tip.updateStatus() then modify the
TaskStatus object, so the cached copy changes as well.
The code in TaskInProgress that modifies the TaskStatus reference:
{code}
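    if (!isCleanupAttempt(taskid)) {
      // Regular attempt: the reported status object itself is stored (no copy).
      taskStatuses.put(taskid, status);
    } else {
      // Cleanup attempt: the already-stored status object is updated in place.
      taskStatuses.get(taskid).statusUpdate(status.getRunState(),
        status.getProgress(), status.getStateString(), status.getPhase(),
        status.getFinishTime());
    }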
{code}

This could make the total count negative in the following scenario:
Tracker1 reports that task *t_0* is KILLED_UNCLEAN.
Tracker2 is given the cleanup attempt for t_0.
Tracker2 reports that it is running the cleanup attempt for t_0. This updates the entry in taskStatuses,
which still holds the TaskStatus object from tracker1's cached status.
The JT then calculates the total count as if the task were running on both trackers, which leads to
negative totals.

Cloning the TaskStatus object before passing it to JobInProgress (JIP) looks like the correct solution. Thoughts?
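
To make the aliasing concrete, here is a minimal, self-contained sketch of the scenario and of the cloning idea. It is not Hadoop code: {{Status}}, {{Tip}} and {{cachedStateAfterCleanup}} are hypothetical stand-ins for TaskStatus, TaskInProgress and the JobTracker call path, reduced to a single run-state field, and they only mirror the put / update-in-place logic quoted above.

```java
import java.util.HashMap;
import java.util.Map;

class TaskStatusAliasingSketch {

    /** Hypothetical, simplified stand-in for TaskStatus (run state only). */
    static class Status implements Cloneable {
        String runState;

        Status(String runState) {
            this.runState = runState;
        }

        void statusUpdate(String newRunState) {
            this.runState = newRunState;
        }

        @Override
        public Status clone() {
            try {
                return (Status) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Status is Cloneable
            }
        }
    }

    /** Hypothetical stand-in for TaskInProgress; mirrors the quoted logic. */
    static class Tip {
        final Map<String, Status> taskStatuses = new HashMap<String, Status>();

        void updateStatus(String taskid, Status status, boolean isCleanupAttempt) {
            if (!isCleanupAttempt) {
                taskStatuses.put(taskid, status);          // stores the caller's object
            } else {
                taskStatuses.get(taskid).statusUpdate(status.runState); // mutates it
            }
        }
    }

    /**
     * Replays the scenario and returns the run state that tracker1's cached
     * report holds after tracker2 has reported the cleanup attempt.
     */
    static String cachedStateAfterCleanup(boolean cloneBeforePassing) {
        Status cachedFromTracker1 = new Status("KILLED_UNCLEAN");
        Tip tip = new Tip();

        // Tracker1's report reaches the TIP, either as-is or as a clone.
        Status passed = cloneBeforePassing ? cachedFromTracker1.clone() : cachedFromTracker1;
        tip.updateStatus("t_0", passed, false);

        // Tracker2 reports that it is running the cleanup attempt for t_0.
        tip.updateStatus("t_0", new Status("RUNNING"), true);

        return cachedFromTracker1.runState;
    }

    public static void main(String[] args) {
        System.out.println("shared reference: " + cachedStateAfterCleanup(false));
        System.out.println("cloned copy:      " + cachedStateAfterCleanup(true));
    }
}
```

In this sketch, passing the shared reference lets tracker2's report rewrite the object that tracker1's cached status still points to, while passing a clone leaves the cached report untouched. Whether a shallow clone is sufficient for the real TaskStatus depends on its fields, which this sketch does not model.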

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>
> We saw the following in both the web UI and the CLI tools:
> {noformat}
> Cluster Summary (Heap Size is 11.7 GB/13.37 GB)
> Maps  Reduces  Total        Nodes  Map Task  Reduce Task  Avg.        Blacklisted
>                Submissions         Capacity  Capacity     Tasks/Node  Nodes
> -971  0        133          434    1736      1736         8.00        0
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

