hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Thu, 07 Feb 2008 10:25:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566541#action_12566541

Amar Kamat commented on HADOOP-2119:

I guess the reason for not prioritizing failed task over virgin tasks is that the probability
of the first not-running task being failed is pretty high as compared to it being virgin and
hence the simplicity. 
I will explain some of the things in detail here.
1) There is one more data structure that can be used for finding the virgin task in {{O(1)}}
time but requires to hold {{2*num-tips}} integers.  Basically to have an array implementation
of the doubly linked list. A pointer of the first element in the list also has to be maintained.
Deletions/Top/Pop are {{O(1)}} and assuming no dynamic insertions. This is similar to the
c/c++ implementations of doubly linked list giving the benefit of indexing the list as arrays
providing {{O(1)}} deletions. If required I will post the implementation of this data structure.
2) Since there will be a lot of insertions and deletions for running TIPs, the data structure
should be optimized for insertions/deletions. Finding a TIP for speculation requires a scan
over the running TIPs (sorted on size) as the progress changes dynamically. One good structure
here will be an array of {{b}} SortedSets. Here {{b = max-possible-running-TIPs / k'}} and
{{k'}} will be compile time constant by default set to 100. So now we hash the partition for
the TIP to one such sortedset and then insert/delete locally in that set.  Something like
// compile time constant, number of elements per bin
k' = 100 
// number of bins
b = num-tips / k'  
SortedSet = Bin[tip.getPartition() % k']
// operate on this tip 
// addition/deletions happen on a smaller set of TIPs
Note that for finding the TIP for speculation we will iterate over all the bins in order.

If we decide not to care about the sortedness then use a simple ArrayList instead of a SortedSet.

Other option would be to use a single list/structure for all the running tasks.

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
> I ran a job with 0 reducer on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lacks behind on committing completed mapper tasks.
> The number of running mappers displayed on web UI getting bigger and bigger.
> The jos tracker eventually stopped responding to web UI.
> No progress is reported afterwards.
> Job tracker is running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reach the heap space limit).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message