hadoop-common-dev mailing list archives

From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Wed, 13 Feb 2008 01:05:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568383#action_12568383 ]

Vivek Ratan commented on HADOOP-2119:
-------------------------------------

We need to clarify what assumptions are made by the various proposals, how they're different
and similar, and figure out the best approach. After discussions between Owen, Arun and myself,
we feel the following assumptions are valid:

- Failed tasks (tasks and TIPs are used interchangeably in this discussion) should be considered
before virgin tasks where possible. 
- Failed tasks need not be sorted on input size. In fact, there isn't any one obvious way
in which they should be sorted. 
- Similarly, there isn't any one obvious way in which running tasks should be sorted. Perhaps
the tasks that have run the longest are better candidates for speculation, but even that is
not obvious.

After looking at the various data structures that have been suggested, the following are recommended:


1. A single data structure for virgin and failed tasks is recommended (call this the runnable
list). There are two options:

1a. Use a structure similar to the one used for the cache: a hashmap of hostnames to linked
lists, where each list contains the runnable tasks for that host.
- Each list is sorted in decreasing order of input size at the beginning, as all tasks are
virgin. 
- A task can occur in a list for more than one host. 
- When a task is selected to run, it is removed from the runnable list; this is an O(1) operation.
The selected task is the first in the list, unless that task is already marked as running on
another host, in which case it is dropped from the list and the next task is considered.
- A task can be inserted into a runnable list when it fails, in which case it is inserted
into the beginning of the list (O(1)). 
- A runnable list can thus end up with failed tasks ahead of virgin tasks, with the failed
tasks sorted by failure time (those that failed later occur earlier in the list) rather than
by input size. The highest priority therefore goes to the task that failed most recently.
- This list can be singly linked, as we insert and delete at the head. 
- We can have a single hashmap that keeps track of the lists for hosts and racks. The key for
the hashmap is the unique name of the host or rack. By using names similar to absolute paths
of files in directories, this structure can support multiple levels of rack hierarchy.
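To make 1a concrete, here is a small, hypothetical Python sketch of the per-host runnable lists (the class and method names are illustrative, not the actual JobTracker code; a shared `running` set stands in for the "marked as running on another host" check):

```python
from collections import defaultdict, deque

class RunnableList:
    """Sketch of option 1a: a map from host (or rack) name to a list of
    runnable tasks.  Names like '/rack1/hostA' can encode the hierarchy."""

    def __init__(self):
        self.lists = defaultdict(deque)   # host/rack name -> runnable tasks
        self.running = set()              # tasks already scheduled elsewhere

    def add_virgin(self, name, tasks_by_size_desc):
        # Initially each list is sorted in decreasing order of input size,
        # as all tasks are virgin.
        self.lists[name].extend(tasks_by_size_desc)

    def add_failed(self, name, task):
        # A failed task goes to the head, so the most recent failure gets
        # the highest priority, ahead of all virgin tasks.  O(1).
        self.lists[name].appendleft(task)

    def next_task(self, name):
        # Pop from the head; skip tasks already running on another host.
        q = self.lists[name]
        while q:
            task = q.popleft()
            if task not in self.running:
                self.running.add(task)
                return task
        return None
```

Because deques support O(1) operations at the head, both scheduling a task and re-queuing a failed one stay constant time, matching the analysis above.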

1b. Another implementation option is a 2-D sparse matrix, as Owen suggested earlier. The only
difference between this structure and the one proposed in 1a is that tasks are additionally
linked to the other tasks in their column. This makes it easy to delete all tasks that share
the same data input: when a task is selected to run, all related tasks (those in the same
column) can be removed very cheaply.

2. For running tasks (call this the running list), we have the following options: 

2a. Keep a global list of running tasks and walk through it to pick a speculative task, much
as we do today (the only difference being that today we walk through an array of all tasks,
whereas with this approach we walk through a list of only the running tasks).
- When a task starts to run, it is moved from the runnable list to the running list, where
it can be inserted at the head or tail. This is O(1).
- When a task completes (or fails), we need to locate it in this list and remove it. 
- Perhaps the best way to implement this is for each TIP object to carry prev and next pointers.
That makes locating a task in this list O(1), and removal is O(1) too since the list is doubly
linked.
- To find a speculative task, we walk through this list as before and pick the first task
that qualifies (we have to add rack awareness to the current algorithm). This is not O(1),
but it is no worse than what we have today.

2b. Use a 2-D sparse matrix, as described earlier, to represent possible speculative tasks.

- When a task is selected to run from the runnable list, copies of it are inserted into the
running list for each host where the task can be speculated (one for each host where the input
is replicated, at the very least). Insertion into the matrix is O(1) if you have doubly linked
lists for rows, and if you have an array, indexed by task ID, that points to the head of the
column. 
- When a task completes (or fails), it is removed from this list, as are all tasks linked
to it in its column. This takes time proportional to the number of copies (the replication
factor), which is effectively constant.
- Finding a speculative task is O(1), as you pick the first task in the running list for the
host, or for its rack. If nothing matches, pick a task from any arbitrary host. 
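As a rough sketch of the 2-D sparse matrix in 2b (hypothetical names; rows are keyed by host, columns by task ID, and each cell is doubly linked along its row and singly linked down its column):

```python
class Cell:
    """One copy of a running task on one host."""
    def __init__(self, host, task_id):
        self.host, self.task_id = host, task_id
        self.row_prev = self.row_next = None   # links within the host's row
        self.col_next = None                   # link to the next copy of this task

class SpeculativeMatrix:
    """Sketch of option 2b: rows map each host to the head of its row list;
    cols map each task ID to the head of its column list."""

    def __init__(self):
        self.rows = {}
        self.cols = {}

    def add(self, task_id, hosts):
        # Insert one copy per host where the task could be speculated
        # (at least the hosts holding a replica of the input).  O(1) each.
        for host in hosts:
            cell = Cell(host, task_id)
            head = self.rows.get(host)
            cell.row_next = head
            if head:
                head.row_prev = cell
            self.rows[host] = cell
            cell.col_next = self.cols.get(task_id)
            self.cols[task_id] = cell

    def remove_task(self, task_id):
        # On completion or failure, unlink every copy in the column.
        # Time is proportional to the replication factor.
        cell = self.cols.pop(task_id, None)
        while cell:
            if cell.row_prev:
                cell.row_prev.row_next = cell.row_next
            else:
                self.rows[cell.host] = cell.row_next
            if cell.row_next:
                cell.row_next.row_prev = cell.row_prev
            cell = cell.col_next

    def pick(self, host):
        # O(1): the first task in the running list for this host, if any.
        head = self.rows.get(host)
        return head.task_id if head else None
```

Picking a speculative task for a host is then a single dictionary lookup, and falling back to the rack or to an arbitrary host just repeats the lookup with a different key.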

As far as implementation recommendations go, it makes sense to implement 1a and 2a now. Both
build on existing code and do not require many changes. Picking a speculative task will be
slow and not locality-aware, but it will be no worse than what we have today, and the
implementation will be quick. Later, depending on need, we can implement 1b and 2b, which
both refer to the same data structure. That requires more coding and testing, but will make
finding a speculative task faster.


> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lagged behind in committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept growing.
> The job tracker eventually stopped responding to the web UI.
> No progress was reported afterwards.
> The job tracker is running on a separate node.
> The job tracker process consumed 100% CPU, with VM size 1.01g (reaching the heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

