hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Tue, 11 Mar 2008 14:16:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577454#action_12577454
] 

Amar Kamat commented on HADOOP-2119:
------------------------------------

With an approach similar to the one discussed above, plus some optimizations, we could process
a large number of maps successfully. One of the optimizations is that the batched task commit
now happens in stages, i.e. *batch-size* tips from the queue are committed in one go.
The job description is as follows:
1) 250 nodes
2) random-writer modified so that map data goes to the local filesystem and the reducers
do nothing
3) num maps : 320,000
4) num reducers : 450
5) bytes per map : 7 MB
6) total data : 2.5 TB
7) batch commit size = 5000, i.e. at most 5000 tips are committed at a time
The map phase took approx. 40 min.
The only remaining problem is reducer scheduling by the JT. The maps finish so fast that
the map load is always low, and the reducers always start only after the maps are done. Simple
tricks such as increasing the number of _task completion events_, _jetty threads_, etc. might
help but won't provide a scalable solution. So it seems that tweaking the load logic in the JT,
i.e. {{getNewTaskForTaskTracker()}}, is the only way. We are currently trying a number of
optimizations and will post a stable/final version of the approach along with a patch soon.
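
To illustrate the staged batch commit described above, here is a minimal sketch. It is not
the patch for this issue: the class name StagedCommitSketch, its methods, and the string tip
IDs are all hypothetical, and the real JobTracker commit work is replaced by simply handing
back the batch. It only shows the idea of taking at most *batch-size* tips off the queue per
stage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of a staged batch commit; not the HADOOP-2119 patch.
class StagedCommitSketch {
    // Completed tips waiting to be committed (IDs are made-up strings here).
    private final Queue<String> pending = new ArrayDeque<>();
    private final int batchSize;

    StagedCommitSketch(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    synchronized void enqueue(String tipId) {
        pending.add(tipId);
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    // Takes at most batchSize tips off the queue and returns them as one stage.
    // A real commit step would act on the returned batch.
    synchronized List<String> commitNextBatch() {
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && !pending.isEmpty()) {
            batch.add(pending.poll());
        }
        return batch;
    }

    public static void main(String[] args) {
        StagedCommitSketch committer = new StagedCommitSketch(5);
        for (int i = 0; i < 12; i++) {
            committer.enqueue("tip_" + i);
        }
        // Drain the queue one stage at a time.
        while (committer.pendingCount() > 0) {
            List<String> batch = committer.commitNextBatch();
            System.out.println("committed " + batch.size()
                    + " tips, " + committer.pendingCount() + " still queued");
        }
    }
}
```

With 12 queued tips and a batch size of 5, the loop commits in stages of 5, 5 and 2. Capping
each stage bounds how long the queue is held per commit, which is the property the batch
commit size of 5000 relies on in the run above.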

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The job tracker lagged behind on committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept getting bigger and bigger.
> The job tracker eventually stopped responding to the web UI.
> No progress was reported afterwards.
> The job tracker was running on a separate node.
> The job tracker process consumed 100% CPU, with a VM size of 1.01 GB (reaching the heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

