hadoop-common-dev mailing list archives

From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5285) JobTracker hangs for long periods of time
Date Thu, 19 Feb 2009 12:20:01 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674978#action_12674978
] 

Vinod K V commented on HADOOP-5285:
-----------------------------------

In an offline discussion with Devaraj, Amareshwari and Hemanth, the suggested fix was to make
JobInProgress.obtainTaskCleanupTask() behave like other methods such as JobInProgress.obtainJobSetupTask()
and not lock on the JobInProgress object if the job is not yet inited. The other suggestion was to
move all the DFS operations in the JT that might result in locking of the JT into a separate thread;
at present the only operation that needs to be moved is job cleanup from JobTracker.finalizeJob().
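
To make the two suggestions concrete, here is a minimal sketch in plain Java. It is not the Hadoop code: the Job class, the tasksInited flag, the cleanupExecutor and the "cleanup-task" string are all hypothetical stand-ins, and Thread.sleep() stands in for a slow DFS call. It only illustrates checking an "inited" flag before taking the object lock, and handing slow cleanup work to a separate thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LockFreeInitCheckSketch {

    /** Hypothetical stand-in for JobInProgress; not the real Hadoop class. */
    static class Job {
        // volatile so that readers can see the flag without taking the job lock
        private volatile boolean tasksInited = false;

        /** Holds the job lock for the whole (simulated) slow DFS operation. */
        synchronized void initTasks(long slowMillis) {
            try {
                Thread.sleep(slowMillis); // stands in for blocking DFS calls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            tasksInited = true;
        }

        /** Returns null right away if the job is not inited, without locking. */
        String obtainTaskCleanupTask() {
            if (!tasksInited) {
                return null; // no lock taken, so the caller never waits on initTasks()
            }
            synchronized (this) {
                return "cleanup-task"; // placeholder for real task selection
            }
        }
    }

    // Second suggestion: a separate (daemon) thread for slow cleanup work.
    static final ExecutorService cleanupExecutor =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "job-cleanup");
            t.setDaemon(true);
            return t;
        });

    /** Hypothetical finalizeJob(): queues the slow cleanup and returns at once. */
    static void finalizeJob(Runnable slowDfsCleanup) {
        cleanupExecutor.submit(slowDfsCleanup);
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job();
        Thread initThread = new Thread(() -> job.initTasks(500));
        initThread.start();
        Thread.sleep(100); // give initTasks() time to acquire the lock

        long start = System.nanoTime();
        String task = job.obtainTaskCleanupTask(); // does not wait for the lock
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("during init: task=" + task + ", waited ~" + elapsedMs + " ms");

        initThread.join();
        System.out.println("after init: task=" + job.obtainTaskCleanupTask());

        finalizeJob(() -> System.out.println("cleanup ran on " + Thread.currentThread().getName()));
        cleanupExecutor.shutdown();
    }
}
```

In this sketch the caller of obtainTaskCleanupTask() returns immediately while initTasks() still holds the lock, which is the behaviour the first suggestion aims for. Whether a volatile flag is the right mechanism in the actual JobInProgress code is an assumption here, not something the discussion settled.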

> JobTracker hangs for long periods of time
> -----------------------------------------
>
>                 Key: HADOOP-5285
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5285
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Vinod K V
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> On one of the larger clusters of 2000 nodes, the JT hung quite often, sometimes for periods
on the order of 10-15 minutes and once for one and a half hours(!). The stack trace shows
that JobInProgress.obtainTaskCleanupTask() is waiting for the lock on the JobInProgress object, which
JobInProgress.initTasks() holds for a long time while waiting for DFS operations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
