hadoop-common-dev mailing list archives

From "rahul k singh (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5794) Sometimes job does not get removed from scheduler queue after it is killed
Date Fri, 22 May 2009 10:16:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712013#action_12712013 ]

rahul k singh commented on HADOOP-5794:
---------------------------------------

Analysis of the problem:
When the JobTracker is restarted, RecoveryManager tries to recover the job from job history.
RecoveryManager instantiates the JobInProgress (JIP) object and sets its startTime to
System.currentTimeMillis(). In the JobInProgress constructor, the JobStatus startTime is set
to the JIP's startTime. RecoveryManager then fetches the startTime from job history and
updates the JIP's startTime (note that this change is not propagated to the JobStatus
startTime), so JobStatus still holds the old startTime value. These JobStatus objects are
used in JobQueuesManager to categorize jobs based on the state they are in. The data
structure in JobQueuesManager (waitingJobs) uses startTime in its comparator, so the entry
in waitingJobs is keyed on the old startTime value.

Whenever we run "hadoop job -list", JobTracker's getJobStatus method is called, which sets
the JobStatus startTime to the JobInProgress startTime. At this point the startTime values
in JIP and JobStatus are consistent, but the startTime value in waitingJobs in
JobQueuesManager is stale. Hence, when we try to remove finished jobs
(completed/killed/failed, for example by issuing the "hadoop job -kill <>" command) from
waitingJobs, nothing is removed, because the startTime used by the comparator has changed.
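
The failure mode described above can be reproduced outside Hadoop. The following is a minimal sketch, not the actual JobQueuesManager code: the Job class, the BY_START_TIME comparator, the method names and the time values are all hypothetical stand-ins, and a plain java.util.TreeSet is assumed in place of the real waitingJobs structure. It shows that changing a field used by the comparator while the object is still in a sorted collection makes a later remove() miss the entry, and that removing before the update and re-adding afterwards avoids this.

```java
import java.util.Comparator;
import java.util.TreeSet;

class StaleSortKeyDemo {
    // Hypothetical stand-in for a job status object; not the Hadoop class.
    static final class Job {
        final String id;
        long startTime;
        Job(String id, long startTime) { this.id = id; this.startTime = startTime; }
    }

    // Assumed ordering: by startTime, then by id to break ties.
    static final Comparator<Job> BY_START_TIME = (x, y) -> {
        int c = Long.compare(x.startTime, y.startTime);
        return c != 0 ? c : x.id.compareTo(y.id);
    };

    static TreeSet<Job> newQueue(Job recovered) {
        TreeSet<Job> waitingJobs = new TreeSet<>(BY_START_TIME);
        waitingJobs.add(new Job("job_1", 100));
        waitingJobs.add(new Job("job_2", 200));
        waitingJobs.add(recovered);
        return waitingJobs;
    }

    // Mirrors the bug: the sort key changes while the job is in the set.
    static boolean removeAfterInPlaceUpdate() {
        Job recovered = new Job("job_3", 300);   // inserted with the restart time
        TreeSet<Job> waitingJobs = newQueue(recovered);
        recovered.startTime = 50;                // later overwritten with the history time
        return waitingJobs.remove(recovered);    // lookup follows the new key and misses
    }

    // One possible remedy: remove under the old key, update, then re-add.
    static boolean removeWithReinsertOnUpdate() {
        Job recovered = new Job("job_3", 300);
        TreeSet<Job> waitingJobs = newQueue(recovered);
        waitingJobs.remove(recovered);           // key still matches the stored position
        recovered.startTime = 50;
        waitingJobs.add(recovered);              // stored again under the new key
        return waitingJobs.remove(recovered);
    }

    public static void main(String[] args) {
        System.out.println("in-place update, remove succeeded: " + removeAfterInPlaceUpdate());
        System.out.println("remove/update/re-add, remove succeeded: " + removeWithReinsertOnUpdate());
    }
}
```

With these values the first call returns false (the job stays in the set) and the second returns true. Whether the actual fix should re-insert the job or propagate the recovered startTime to JobStatus before the job is queued is a separate design decision that this sketch does not settle.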

> Sometimes job does not get removed from scheduler queue after it is killed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-5794
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5794
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.20.0
>            Reporter: Karam Singh
>
> Sometimes when we kill a job, it does not get removed from the waiting queue, while the
> job status is "Killed" with Job Setup and Cleanup: "Successful".
> Also, the JobTracker web UI shows the job under the failed jobs list, and
> "hadoop job -list all" and "hadoop queue <queuename> -showJobs" also show the job with state=5.
> Prior to killing, the job state was "Running".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

