Message-ID: <326206308.1222678724593.JavaMail.jira@brutus>
Date: Mon, 29 Sep 2008 01:58:44 -0700 (PDT)
From: "Amar Kamat (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4261) Jobs failing in the init stage will never cleanup
In-Reply-To: <1357171987.1222255004196.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635317#action_12635317 ]

Amar Kamat commented on HADOOP-4261:
------------------------------------

A few comments w.r.t. job recovery:
1) Upon restart, the task-completion-events/task-reports for the setup tasks should also match.
2) It would make more sense to set the job run-state to {{SETUP}} when {{logInited()}} is invoked. While recovering, check whether the SETUP state has been reached before calling {{init()}}.
3) Check if {{JobInProgress.obtainSetupTask()}} can reuse {{JobInProgress.addRunningTaskToTIP()}}.
4) I think {{JobInProgress.canLaunchSetupTask()}} can also be written as
{code}
private synchronized boolean canLaunchSetupTask() {
  // check if the job is in PREP, initialized and not setup
  return status.getRunState() == JobStatus.PREP
         && tasksInited.get() && !launchedSetup;
}
{code}
5) I don't see any code that deals with the setup task in job recovery, i.e. in the recovery manager. Just make sure that the effect of scheduling setup tasks before a restart is the same as the effect of replaying them from history. I assume that when the JIP is given a task-attempt update, it figures out whether the task is a setup task or not. Ideally, the way setup is launched from the recovery manager should mimic the way it is invoked from the real (live) jobtracker.

> Jobs failing in the init stage will never cleanup
> -------------------------------------------------
>
>                 Key: HADOOP-4261
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4261
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: patch-4261.txt
>
>
> Pre HADOOP-3150, if the job fails in the init stage, {{job.kill()}} was called.
> This used to make sure that the job was cleaned up w.r.t.
> - status set to KILLED/FAILED
> - job files from the system dir are deleted
> - closing of job history files
> - making the jobtracker aware of this through {{jobTracker.finalizeJob()}}
> - cleaning up the data structures via {{JobInProgress.garbageCollect()}}
> Now if the job fails in the init stage, {{job.fail()}} is called, which doesn't do the cleanup. HADOOP-3150 introduces cleanup tasks which are launched once the job completes, i.e. is killed/failed/succeeded. The jobtracker will never consider this job for scheduling, as the job will be in the {{PREP}} state forever.
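
To make the failure mode described in the issue concrete, here is a minimal, self-contained sketch. It is not the Hadoop code: the class InitFailureSketch, its Job type, and the methods failDeferred(), failEager() and schedulerWouldRunCleanup() are hypothetical names invented for illustration, and the scheduler rule is an assumption based on the description above. Only the canLaunchSetupTask() predicate follows the form suggested in the comment.

```java
// Hypothetical, simplified model of the failure mode; these are NOT the
// real Hadoop classes, only names chosen for illustration.
class InitFailureSketch {
    enum RunState { PREP, RUNNING, FAILED }

    static class Job {
        RunState state = RunState.PREP;
        boolean tasksInited = false;
        boolean launchedSetup = false;
        boolean cleanupRequested = false;
        // Cleanup effects listed in the issue description (all start false).
        boolean jobFilesDeleted, historyClosed, finalized, garbageCollected;

        // Same shape as the predicate suggested in item 4 of the comment.
        boolean canLaunchSetupTask() {
            return state == RunState.PREP && tasksInited && !launchedSetup;
        }

        // Assumed post-HADOOP-3150 behaviour: only request a cleanup task
        // and leave the actual cleanup to the scheduler.
        void failDeferred() {
            cleanupRequested = true;
        }

        // Assumed pre-HADOOP-3150 behaviour: clean up immediately.
        void failEager() {
            state = RunState.FAILED;
            jobFilesDeleted = historyClosed = finalized = garbageCollected = true;
        }

        boolean isCleanedUp() {
            return state != RunState.PREP && jobFilesDeleted && historyClosed
                    && finalized && garbageCollected;
        }
    }

    // Assumed scheduler rule: a job still in PREP is never handed a task,
    // so a requested cleanup task never gets to run for it.
    static boolean schedulerWouldRunCleanup(Job job) {
        return job.cleanupRequested && job.state == RunState.RUNNING;
    }

    public static void main(String[] args) {
        Job stuck = new Job();
        stuck.failDeferred();
        System.out.println("deferred: cleanup scheduled = "
                + schedulerWouldRunCleanup(stuck) + ", state = " + stuck.state);

        Job cleaned = new Job();
        cleaned.failEager();
        System.out.println("eager: cleaned up = " + cleaned.isCleanedUp()
                + ", state = " + cleaned.state);
    }
}
```

In this model a job that fails during init via the deferred path keeps its PREP state and is never cleaned up, while the eager path leaves nothing behind. A fix along these lines would have to either clean up directly on init failure or let the scheduler hand cleanup tasks to jobs that never left PREP; which of the two the attached patch takes is not stated here.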