hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Srikanth Kakani (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast
Date Thu, 29 Nov 2007 02:21:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546533

Srikanth Kakani commented on HADOOP-2119:

After a lot of analysis, we think that most threads on jobtracker get locked on the jobtracker
object so much so that the lock acquisition time becomes 1.8 s or more.

Now looking at the commit thread it has two synchronized(this) blocks and one promotion call
per each promotion.

So really the commit thread instead of doing much useful work waits mostly for lock acquisitions.

The fix is that the commit thread greedily commits as many threads as possible when it has
the lock. This works well both for slower maps and for faster-completing maps.

I will be submitting a patch.

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>             Fix For: 0.16.0
>         Attachments: hadoop-jobtracker-thread-dump.txt
> I ran a job with 0 reducer on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lacks behind on committing completed mapper tasks.
> The number of running mappers displayed on web UI getting bigger and bigger.
> The jos tracker eventually stopped responding to web UI.
> No progress is reported afterwards.
> Job tracker is running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reach the heap space limit).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message