hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Wed, 13 Feb 2008 05:46:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568437#action_12568437 ]

Devaraj Das commented on HADOOP-2119:
-------------------------------------

BTW, if we take the sparse matrix approach, we really don't need the other data structure for
RUNNING tasks.
In the sparse matrix proposal, note that if the first TIP in a location is running, then all
TIPs in that location are running, since we always move a TIP's column to the back whenever
we choose that TIP for running. And we don't consider tasks for speculative execution until
we run out of virgin tasks. So when we do want to consider tasks for speculative execution,
we go in the order: local, rack-local, off-rack. We hit all the locations in O(1), and the
time to find a speculative task in a particular row is given by the position of the first
slow task in that row. We also move the corresponding TIP column to the back, exactly as we
do for virgin tasks. This way speculative execution also proceeds in the order of split
sizes.
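
To make the column-to-the-back idea concrete, here is a rough sketch in Java. It is only an
illustration under my own assumptions, not the patch: the names TipMatrixSketch, Tip,
pickFresh and pickSpeculative are hypothetical, each row is an insertion-ordered set per
location, and "slow" is just a flag set by the caller. A caller would try the local row first,
then the rack-local row, then the off-rack row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rows are locations, columns are TIPs.
class TipMatrixSketch {
    static final class Tip {
        final int id;
        boolean running; // set once the TIP has been handed out
        boolean slow;    // assumed to be set by the caller's own slowness check
        Tip(int id) { this.id = id; }
    }

    // Row per location; iteration order is the scheduling order.
    private final Map<String, LinkedHashSet<Tip>> rows = new HashMap<>();
    // Column per TIP: every location whose row contains it.
    private final Map<Tip, List<String>> columns = new HashMap<>();

    // Add TIPs in split-size order so each row starts out sorted that way.
    void add(Tip tip, String... locations) {
        List<String> column = columns.computeIfAbsent(tip, t -> new ArrayList<>());
        for (String location : locations) {
            rows.computeIfAbsent(location, l -> new LinkedHashSet<>()).add(tip);
            column.add(location);
        }
    }

    // Move the TIP to the end of every row it appears in.
    private void moveColumnToBack(Tip tip) {
        for (String location : columns.get(tip)) {
            LinkedHashSet<Tip> row = rows.get(location);
            row.remove(tip);
            row.add(tip);
        }
    }

    // Returns a not-yet-running TIP for this location, or null if none is left.
    Tip pickFresh(String location) {
        LinkedHashSet<Tip> row = rows.get(location);
        if (row == null || row.isEmpty()) {
            return null;
        }
        Tip first = row.iterator().next();
        if (first.running) {
            return null; // first TIP is running, so the whole row is running
        }
        first.running = true;
        moveColumnToBack(first);
        return first;
    }

    // Returns the first running TIP flagged slow in this row, or null.
    Tip pickSpeculative(String location) {
        LinkedHashSet<Tip> row = rows.get(location);
        if (row == null) {
            return null;
        }
        Tip found = null;
        for (Tip tip : row) {
            if (tip.running && tip.slow) {
                found = tip;
                break;
            }
        }
        if (found != null) {
            moveColumnToBack(found);
        }
        return found;
    }
}
```

Note that pickFresh only ever looks at the head of a row, which is why no separate structure
for RUNNING tasks is needed in this sketch; pickSpeculative scans only as far as the first
slow TIP.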

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lags behind on committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept getting bigger and bigger.
> The job tracker eventually stopped responding to the web UI.
> No progress is reported afterwards.
> The job tracker is running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reaching the heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

