hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Thu, 07 Feb 2008 15:12:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566626#action_12566626
] 

Amar Kamat commented on HADOOP-2119:
------------------------------------

I will summarize the designs here 
_Notations_
{noformat}
n      : number of TIPs
c      : number of TIPs that can potentially be executed locally.
k      : maximum number of tasks that can be run simultaneously (<<n)
f      : number of failed tasks (<<r)
Pop    : get the first TIP
b      : number of bins of TIPs = n/s (where, s  : max number of TIPs per bin (<k))
{noformat}
The following analysis considers the worst case scenarios.
||#||Algorithm For||Description||General Design Requirements||Time Complexity||Space Complexity||Implementation
Complexity||
|1|Virgin Task|Maintain a running pointer rather than starting from 0.\\ The pointer will
always more forward|Fast Pop and Delete operation|Pop : O(c) (worst case but will happen at
most once)\\No deletions required| O(1)|Low|
|2|Virgin Task|Maintain a doubly linked list using array|Fast Pop and Delete operation|Pop
: O(1)\\Delete : O(1)| O( n )|High|
|3|Failed Task|Maintain a sorted list of TIPs that have failed|Small Size, Less Insert/Pop/Delete|All
O(log(f)) operations|O(f)|Low|
|4|Running Task|Maintain an array of sorted lists (where the array is of size {{b}}).\\Membership
of a TIP to a bin/group is determined by {{TIP.getPartition()/s}}|Large Size\\Frequent Insert/Pop/Delete|Insert
: O(log(s))\\Delete O(log(s))\\Scan : O(k) |O(k + b)|Medium|
|5|Running Task|Maintain an array of lists (where the array is of size {{b}}).\\Membership
of a TIP to a bin/group is determined by {{TIP.getPartition()/s}}| Not sorted\\Large Size\\Frequent
Insert/Delete|Insert : O(1)\\Delete O(s)\\Scan : O(k)  |O(k + b)|Medium|
|6|Running Task|Maintain one sorted list| Large Size\\Frequent Insert/Delete|Insert : O(log(k))\\Delete
: O(log(k))\\Scan : O(k) |O(k)|Simple|

{noformat}
Easiest to implement    :  (1) + (3) + (6)
Most Space efficient    :  (1) + (3) + (6)
Most Time efficient     :  (2) + (3) + (4) (most inefficient w.r.t space and implementation
complexity)
{noformat}

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducer on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lacks behind on committing completed mapper tasks.
> The number of running mappers displayed on web UI getting bigger and bigger.
> The jos tracker eventually stopped responding to web UI.
> No progress is reported afterwards.
> Job tracker is running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reach the heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message