hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Wed, 06 Feb 2008 12:59:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566120#action_12566120

Vivek Ratan commented on HADOOP-2119:

Regarding finding the next task to run: 

The problem is that we scan the array of TaskInProgress(TIP) objects (the 'tasks' array) from
the beginning, each time we need to find a task to run and we haven't found one in the cache.
When you have lots and lots of tasks, after some time the tasks array will predominantly be
filled with running or completed tasks, especially at lower indexes. So it takes longer and
longer to scan the array for virgin tasks (tasks that haven't run yet)  or failed tasks if
we always start from the beginning. There are a number of ways to get around this: 

1. On a cache miss, if you don't mind returning a virgin task ahead of a failed task, even
though the failed task occurs earlier in the array than the virgin task, then you can do the
following. Keep an index (a pointer) into the tasks array, which points to the first task
in the array that is a virgin task. Every TIP to the left of this index will have been run
at least once. Every time we need to scan the array, we look for runnable tasks from this
index onwards. This index will move across the length of the array just once during the running
of all tasks, so it averages out to O(1) for finding the first virgin task. This is fairly
easy to code, and should provide significant savings in the cases where there are lots and
lots of tasks. If a virgin task is not found, you can use the same code to look for a failed
task or a speculative task. 

2. If you want to keep the same algorithm for finding the next task, i.e., if you want to
return either a virgin or failed task first on a cache miss, then you can keep an index into
the array which represents the first failed or virgin task. This approach won't be as fast
as the first one, since this index can move back and forth across the array, but it should
still be better over what we do today. In both options, the index will have to be potentially
updated whenever a task's status changes. 

A more detailed approach requires maintaining separate lists for tasks with different states:
one for virgin tasks, one for failed, one for completed, and one for running. As a task's
status changes, it moves from one list to another. These lists can be sorted (the list of
virgin tasks should be sorted in decreasing order of input size; the other lists can be sorted
the same way, or perhaps in order of when the tasks ran, which simplifies things). Picking
the next task is as simple as walking down the list (in many cases, just picking the first
element of a list). Care must be taken that moving an element from one list to another is
as effective as possible. The array of TIPS is no longer needed. I can think of at least a
couple of ways of doing this: 

3. Modify a TIP object to have a reference to a 'previous' TIP object and a 'next' TIP object,
so that we have linked lists of TIPS. Then a list is just a reference to the first (and last)
TIP object, and a TIP belongs to just one list. If objects are added into the Running, Completed,
and Failed lists at the tail only, handling a TIP's status change is O(1), but the lists are
no longer sorted by size. Rather they are sorted by when the task ran. Finding the next task
is pretty much constant, as we either pick the first TIP in the Virgin list, or the first
TIP in the failed list (such that the TIP didn't fail on the host), or we walk through the
Running list to find the first speculative task. If you need the lists sorted by size, then
insertion is O(n). This is a fair bit of change, as far as coding is concerned. 

4. Lists can also be implemented as arrays of TIPS. This takes more space but moving a TIP
from one list to another is faster than in the linked-list case. Insertion into a list sorted
by size can be O(log n). 

You can also have hybrid approaches where you just keep lists for virgin tasks, for example.

I think it makes most sense to go with Option 1 for now, as it's the easiest to implement
and makes the most common case run much faster. Options 3 and 4 need a fair bit of refactoring
and may be an overkill for now, since you can get the most bang for the buck by just making
sure that you don't scan the array from the beginning for virgin tasks.  

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
> I ran a job with 0 reducer on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lacks behind on committing completed mapper tasks.
> The number of running mappers displayed on web UI getting bigger and bigger.
> The jos tracker eventually stopped responding to web UI.
> No progress is reported afterwards.
> Job tracker is running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reach the heap space limit).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message