hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Thu, 07 Feb 2008 20:31:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566782#action_12566782 ]

Devaraj Das commented on HADOOP-2119:
-------------------------------------

I think we should prioritize FAILED tasks over VIRGIN tasks when we miss the task-cache. That
way Owen's concern will be addressed. Regarding options (5) and (6), one thing to note is
that a task should be removed from the running-tasks data structure as soon as it reaches
the COMMIT_PENDING state. This will ensure that the running-tasks data structure doesn't
grow indefinitely (since the JT would handle COMMIT_PENDING tasks in the background).
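
A minimal sketch of the two rules above, assuming simple in-memory queues. The class and
method names (TaskPickerSketch, pickOnCacheMiss, markCommitPending) are hypothetical and are
not taken from the JobTracker code or the attached patch; they only illustrate the ordering
and the early removal.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
class TaskPickerSketch {
    private final Deque<String> failed = new ArrayDeque<>();   // tasks awaiting a re-run
    private final Deque<String> virgin = new ArrayDeque<>();   // tasks never scheduled
    private final Set<String> running = new LinkedHashSet<>(); // tasks currently executing

    void addVirgin(String taskId) {
        virgin.addLast(taskId);
    }

    /** Called when the task-cache lookup found nothing: FAILED first, then VIRGIN. */
    String pickOnCacheMiss() {
        String taskId = failed.pollFirst();
        if (taskId == null) {
            taskId = virgin.pollFirst();
        }
        if (taskId != null) {
            running.add(taskId);
        }
        return taskId; // null when there is nothing left to schedule
    }

    /** A failed attempt leaves the running set and becomes eligible for a re-run. */
    void markFailed(String taskId) {
        if (running.remove(taskId)) {
            failed.addLast(taskId);
        }
    }

    /** Drop the task from the running set as soon as it reaches COMMIT_PENDING. */
    void markCommitPending(String taskId) {
        running.remove(taskId);
    }

    int runningCount() {
        return running.size();
    }
}
```

With this layout the running set only ever holds tasks that are still executing, so its
size is bounded by the number of task slots rather than by the commit backlog.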

Also, do we care whether speculative tasks are executed in the order of split sizes?

Overall, I think (1) + (3) + (5) looks like an approach worth trying out and benchmarking.
The other thing that might help is to defer deletes from the data structure in (5) until we
scan it looking for speculative tasks (batch deletes). In general, the percentage of speculative
tasks is very small, and so we might hit the O(n) worst case for the scan towards the end of the
map/reduce phases. But it should be okay to have slightly degraded performance when looking for
speculative tasks if the most frequent operations (looking for virgin/failed tasks) are efficient. Thoughts?
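
To make the batch-delete idea concrete, here is a rough sketch that assumes the running
tasks live in an array-backed list, where removing one element at a time is expensive.
All names (RunningTasksSketch, commitPending, findSpeculativeCandidates) are hypothetical,
and the "progress below a threshold" test is only a stand-in for whatever speculation
criterion the JT actually applies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of deferred (batch) deletes; not the JobTracker implementation. */
class RunningTasksSketch {
    static final class Task {
        final String id;
        double progress; // fraction complete, 0.0 to 1.0

        Task(String id) {
            this.id = id;
        }
    }

    private final List<Task> running = new ArrayList<>();
    private final Set<String> pendingRemoval = new HashSet<>();

    Task started(String id) {
        Task t = new Task(id);
        running.add(t);
        return t;
    }

    /** Cheap on the hot path: only record that the task should go away. */
    void commitPending(String id) {
        pendingRemoval.add(id);
    }

    /** O(n) scan: purge all deferred deletes in one sweep, then collect slow tasks. */
    List<String> findSpeculativeCandidates(double threshold) {
        running.removeIf(t -> pendingRemoval.contains(t.id));
        pendingRemoval.clear();
        List<String> slow = new ArrayList<>();
        for (Task t : running) {
            if (t.progress < threshold) { // assumed criterion, for illustration only
                slow.add(t.id);
            }
        }
        return slow;
    }

    int size() {
        return running.size();
    }
}
```

The trade-off is that the list can temporarily hold tasks that are already COMMIT_PENDING,
so it is only bounded if the speculative scan runs often enough.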

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lags behind on committing completed mapper tasks.
> The number of running mappers displayed on the web UI keeps getting bigger and bigger.
> The job tracker eventually stopped responding to the web UI.
> No progress is reported afterwards.
> The job tracker is running on a separate node.
> The job tracker process consumed 100% CPU, with a VM size of 1.01g (reaching the heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

