Date: Tue, 10 Jun 2008 00:57:45 -0700 (PDT)
From: "Hemanth Yamijala (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3523) [HOD] If a job does not exist in Torque's list of jobs, HOD allocate on previously allocated directory fails.
[ https://issues.apache.org/jira/browse/HADOOP-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603796#action_12603796 ]

Hemanth Yamijala commented on HADOOP-3523:
------------------------------------------

Just FYI, one quick way to make a job id disappear from a Torque server is to restart the server, so this case can easily be simulated on a test cluster. Another way is to set the Torque server configuration attribute keep_completed to a very small value, or to 0.

> [HOD] If a job does not exist in Torque's list of jobs, HOD allocate on previously allocated directory fails.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3523
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>    Affects Versions: 0.18.0
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: 3523.patch
>
>
> HADOOP-3483 addressed the issue where a dead cluster could be reallocated without having to warn users to clean up the directory themselves, provided the job had completed. It missed one case: the job no longer exists in the Torque queue. In that case, HOD fails with an unhelpful error message:
> ERROR - qstat error: exit code: 153 | signal: False | core False
> CRITICAL - op: allocate hod-clusters/test 3 failed: 'NoneType' object is unsubscriptable
> This should be addressed to avoid user concerns.

--
This message is automatically generated by JIRA.
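To illustrate the missing case, here is a minimal Python sketch of a reallocation check. HOD itself is written in Python. The check treats a job that Torque no longer knows about as finished, instead of indexing into an empty qstat result.

Several parts of the sketch are assumptions made for illustration:
- The helper name can_reallocate is hypothetical.
- So is the idea of passing in a single-letter Torque job state.
- The sketch assumes that exit code 153 corresponds to Torque's "unknown job id" error (15001 modulo 256).

This is not the code in 3523.patch.

```python
from typing import Optional, Tuple

# Assumption: qstat exits with 153 when the server no longer knows the job id
# (Torque's PBSE_UNKJOBID is believed to be 15001, and 15001 % 256 == 153).
QSTAT_UNKNOWN_JOB = 153


def can_reallocate(qstat_exit_code: int, job_state: Optional[str]) -> Tuple[bool, str]:
    """Decide whether a previously allocated cluster directory may be reused.

    Hypothetical helper: job_state is the single-letter Torque job state
    ('Q' queued, 'R' running, 'C' completed, ...) or None when qstat
    returned no information about the job.
    """
    if qstat_exit_code == QSTAT_UNKNOWN_JOB:
        # The job vanished from Torque's list (server restart, or a very low
        # keep_completed), so treat it like a completed job rather than
        # subscripting a missing (None) qstat result.
        return True, "job no longer known to Torque; treating as completed"
    if qstat_exit_code != 0:
        # Any other qstat failure is reported, not silently ignored.
        return False, "qstat error: exit code: %d" % qstat_exit_code
    if job_state is None:
        return False, "qstat returned no job state"
    if job_state == "C":
        return True, "job completed"
    return False, "job is still in state %s" % job_state


if __name__ == "__main__":
    # Simulates the case from the report: qstat exits with 153 and no state.
    print(can_reallocate(153, None))
```

The key design choice is to check the exit code before touching the parsed qstat output. A vanished job then takes the same path as a completed one, which is the behaviour HADOOP-3483 already provides for completed jobs.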