hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Wed, 19 Mar 2008 18:50:28 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580500#action_12580500 ]

Devaraj Das commented on HADOOP-2119:
-------------------------------------

Some comments:
1) Remove default-node and use separate lists for non-local running and non-running maps instead. On a cache miss you then fall back to a list rather than to the array, and the list can be updated as well (remove items from it and add them to an equivalent list for running maps, etc.).
2) Maintain a mapping from each level to the set of nodes at that level (except level 0). When you look for something to run on a cache miss, you should look at the TIPs in the topmost-level cache (if the max cache level is 2, that means all racks).
3) Change the JobInProgress code to use proper terminology (caches, lists, etc.).
4) TIPs that don't have locations should be added to a special list instead of the default-node cache (see point 1).
5) Change the signature of findNewCachedTask to take the level instead of a boolean. Also, I think it would be better to call the method findTaskFromList, since it caters to both maps and reduces, and reduces don't really have a cache.
6) getCurrentTime should be moved to a place where it is called exactly once per findTask.
7) I don't think it is that important to move tasks to the back of the list in the case of speculative tasks.
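
To make points 1, 2 and 4 concrete, here is a rough, self-contained Java sketch of the lookup order being suggested. It is not code from the patch or from JobInProgress: the class LocalityTaskIndex, its methods, and the use of plain strings for task ids and topology nodes are all hypothetical, and it assumes a max cache level of at least 2 and that node names are unique across levels.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; names and types are illustrative, not Hadoop's.
class LocalityTaskIndex {
    private final int maxLevel; // assumed >= 2 (level 0 = host, level 1 = rack)
    // Cache: topology node -> pending task ids that are local to that node.
    private final Map<String, List<String>> cache = new HashMap<>();
    // Point 2: level -> nodes seen at that level (level 0 is not tracked).
    private final Map<Integer, Set<String>> nodesAtLevel = new HashMap<>();
    // Points 1 and 4: tasks without locations go to plain lists,
    // not to a "default-node" cache entry.
    private final List<String> nonLocalPending = new LinkedList<>();
    private final List<String> nonLocalRunning = new LinkedList<>();
    // Tasks already handed out; used to skip stale entries lazily.
    private final Set<String> started = new HashSet<>();

    LocalityTaskIndex(int maxLevel) {
        this.maxLevel = maxLevel;
    }

    // path.get(i) is the task's node at level i; an empty path means "no location".
    void addTask(String taskId, List<String> path) {
        if (path.isEmpty()) {
            nonLocalPending.add(taskId);
            return;
        }
        for (int level = 0; level < maxLevel && level < path.size(); level++) {
            String node = path.get(level);
            cache.computeIfAbsent(node, k -> new LinkedList<>()).add(taskId);
            if (level > 0) {
                nodesAtLevel.computeIfAbsent(level, k -> new LinkedHashSet<>()).add(node);
            }
        }
    }

    // Returns a task id for a tracker at trackerPath, or null if nothing is pending.
    String findTask(List<String> trackerPath) {
        // Closest level first: the tracker's host, then its rack, and so on.
        for (int level = 0; level < maxLevel && level < trackerPath.size(); level++) {
            String task = pollFirstUnstarted(cache.get(trackerPath.get(level)));
            if (task != null) return task;
        }
        // Cache miss: look at every node of the topmost cached level.
        Set<String> top = nodesAtLevel.getOrDefault(maxLevel - 1, Collections.emptySet());
        for (String node : top) {
            String task = pollFirstUnstarted(cache.get(node));
            if (task != null) return task;
        }
        // Last resort: tasks with no location; move the pick to the running list.
        String task = pollFirstUnstarted(nonLocalPending);
        if (task != null) nonLocalRunning.add(task);
        return task;
    }

    // Removes entries from the front of the list until one is found that has
    // not been started yet; entries started via another cache are dropped.
    private String pollFirstUnstarted(List<String> tasks) {
        if (tasks == null) return null;
        Iterator<String> it = tasks.iterator();
        while (it.hasNext()) {
            String taskId = it.next();
            it.remove();
            if (started.add(taskId)) return taskId;
        }
        return null;
    }
}
```

In this sketch a task can sit in several caches at once (its host and its rack), so entries are cleaned up lazily when a lookup walks over them rather than eagerly when the task starts. That is one possible design choice for illustration, not necessarily what the patch should do.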


> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2119-v4.1.patch, hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The JobTracker lagged behind on committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept growing.
> The JobTracker eventually stopped responding to the web UI.
> No progress was reported afterwards.
> The JobTracker was running on a separate node.
> The JobTracker process consumed 100% CPU, with a VM size of 1.01 GB (it reached the heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

