From: "Ramgopal N (Created) (JIRA)"
To: mapreduce-dev@hadoop.apache.org
Reply-To: mapreduce-dev@hadoop.apache.org
Date: Fri, 14 Oct 2011 09:00:12 +0000 (UTC)
Message-ID: <2009194771.13272.1318582812743.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (MAPREDUCE-3186) User jobs are getting hanged if the Resource manager process goes down and comes up while job is getting executed.

User jobs are getting hanged if the Resource manager process goes down and comes up while job is getting executed.
------------------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-3186
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3186
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 0.23.0
         Environment: linux
            Reporter: Ramgopal N


If the resource manager is restarted while a job is executing, the job hangs. The UI continues to show the job as running.

The RM log reports the following error:
"ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1318579738195_0004_000001"

On the console, the MRAppMaster and RunJar processes are not killed.
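
As a rough diagnostic aid (not part of the original report), the RM log could be scanned for the error quoted above to list which application attempts the restarted resource manager no longer recognizes. The sketch below is in Python and matches only the message text from the report; the function name `find_missing_attempts` and the sample input are hypothetical, and nothing is assumed about the rest of the log line layout.

```python
import re

# Pattern built from the error text quoted in the report. The prefix of the
# log line (timestamp, level, class name) is deliberately not assumed.
MISSING_ATTEMPT = re.compile(
    r"AppAttemptId doesnt exist in cache "
    r"(appattempt_(\d+)_(\d+)_(\d+))"
)


def find_missing_attempts(log_lines):
    """Return (attempt_id, application_id) pairs for each matching log line.

    Hypothetical helper for illustration; not part of Hadoop.
    """
    found = []
    for line in log_lines:
        match = MISSING_ATTEMPT.search(line)
        if match:
            attempt_id, cluster_ts, app_seq, _attempt_seq = match.groups()
            # A YARN application ID shares the cluster timestamp and the
            # application sequence number with its attempt IDs.
            found.append((attempt_id, f"application_{cluster_ts}_{app_seq}"))
    return found


# Sample input: the single error line quoted in the report.
sample = [
    "ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: "
    "AppAttemptId doesnt exist in cache appattempt_1318579738195_0004_000001",
]
print(find_missing_attempts(sample))
```

For the line in this report, the helper yields the attempt `appattempt_1318579738195_0004_000001` and the application `application_1318579738195_0004`, which identifies the job whose MRAppMaster is still running against the restarted RM.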