Date: Fri, 3 Feb 2012 21:16:53 +0000 (UTC)
From: "Robert Joseph Evans (Created) (JIRA)"
To: mapreduce-dev@hadoop.apache.org
Subject: [jira] [Created] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

If an MR AM dies twice it looks like the process freezes
---------------------------------------------------------

                 Key: MAPREDUCE-3802
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3802
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster, mrv2
    Affects Versions: 0.23.1, 0.24.0
            Reporter: Robert Joseph Evans


Recovering from an MR AM dying works very well after a single failure. But if the AM fails multiple times, the job appears to get into a livelock.

{noformat}
yarn jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*-SNAPSHOT.jar wordcount -Dyarn.app.mapreduce.am.log.level=DEBUG -Dmapreduce.job.reduces=30 input output
12/02/03 21:06:57 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/02/03 21:06:57 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/02/03 21:06:57 INFO input.FileInputFormat: Total input paths to process : 17
12/02/03 21:06:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/02/03 21:06:57 WARN snappy.LoadSnappy: Snappy native library not loaded
12/02/03 21:06:57 INFO mapreduce.JobSubmitter: number of splits:17
12/02/03 21:06:57 INFO mapred.ResourceMgrDelegate: Submitted application application_1328302034486_0003 to ResourceManager at HOST/IP:8040
12/02/03 21:06:57 INFO mapreduce.Job: The url to track the job: http://HOST:8088/proxy/application_1328302034486_0003/
12/02/03 21:06:57 INFO mapreduce.Job: Running job: job_1328302034486_0003
12/02/03 21:07:03 INFO mapreduce.Job: Job job_1328302034486_0003 running in uber mode : false
12/02/03 21:07:03 INFO mapreduce.Job: map 0% reduce 0%
12/02/03 21:07:09 INFO mapreduce.Job: map 5% reduce 0%
12/02/03 21:07:10 INFO mapreduce.Job: map 17% reduce 0%
#KILLED AM with kill -9 here
12/02/03 21:07:16 INFO mapreduce.Job: map 29% reduce 0%
12/02/03 21:07:17 INFO mapreduce.Job: map 35% reduce 0%
12/02/03 21:07:30 INFO mapreduce.Job: map 52% reduce 0%
12/02/03 21:07:35 INFO mapreduce.Job: map 58% reduce 0%
12/02/03 21:07:37 INFO mapreduce.Job: map 70% reduce 0%
12/02/03 21:07:41 INFO mapreduce.Job: map 76% reduce 0%
12/02/03 21:07:43 INFO mapreduce.Job: map 82% reduce 0%
12/02/03 21:07:44 INFO mapreduce.Job: map 88% reduce 0%
12/02/03 21:07:47 INFO mapreduce.Job: map 94% reduce 0%
12/02/03 21:07:49 INFO mapreduce.Job: map 100% reduce 0%
12/02/03 21:07:53 INFO mapreduce.Job: map 100% reduce 3%
12/02/03 21:08:00 INFO mapreduce.Job: map 100% reduce 6%
12/02/03 21:08:06 INFO mapreduce.Job: map 100% reduce 10%
12/02/03 21:08:12 INFO mapreduce.Job: map 100% reduce 13%
12/02/03 21:08:18 INFO mapreduce.Job: map 100% reduce 16%
#killed AM with kill -9 here
12/02/03 21:08:20 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 0 time(s).
12/02/03 21:08:21 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 1 time(s).
12/02/03 21:08:22 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 2 time(s).
12/02/03 21:08:26 INFO mapreduce.Job: map 64% reduce 16%
#It never makes any more progress...
{noformat}
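The symptom shows up in the client output itself. After the second kill, reported map progress drops from 100% back to 64%, and no further progress lines follow. The sketch below is a minimal triage helper that pulls the `map X% reduce Y%` samples out of a client log and flags every point where either figure goes backwards. It assumes the `INFO mapreduce.Job: map X% reduce Y%` line format shown in the log. The helper names `parse_progress` and `find_progress_regressions` are hypothetical and are not part of Hadoop.

```python
import re
from typing import List, Tuple

# Assumed client log format, e.g.:
#   12/02/03 21:08:26 INFO mapreduce.Job: map 64% reduce 16%
PROGRESS_RE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapreduce\.Job:\s+"
    r"map (\d+)%\s+reduce (\d+)%"
)

Sample = Tuple[str, int, int]  # (timestamp, map percent, reduce percent)


def parse_progress(log_text: str) -> List[Sample]:
    """Extract (timestamp, map%, reduce%) samples in the order they appear."""
    return [(ts, int(m), int(r)) for ts, m, r in PROGRESS_RE.findall(log_text)]


def find_progress_regressions(samples: List[Sample]) -> List[Tuple[Sample, Sample]]:
    """Return (previous, current) pairs where map or reduce progress decreased."""
    regressions = []
    for prev, cur in zip(samples, samples[1:]):
        if cur[1] < prev[1] or cur[2] < prev[2]:
            regressions.append((prev, cur))
    return regressions


if __name__ == "__main__":
    # Excerpt of the client log around the second AM kill.
    excerpt = """12/02/03 21:08:18 INFO mapreduce.Job: map 100% reduce 16%
12/02/03 21:08:20 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 0 time(s).
12/02/03 21:08:26 INFO mapreduce.Job: map 64% reduce 16%"""
    for prev, cur in find_progress_regressions(parse_progress(excerpt)):
        print(f"{prev[0]} -> {cur[0]}: map {prev[1]}% -> {cur[1]}%, "
              f"reduce {prev[2]}% -> {cur[2]}%")
    # prints: 12/02/03 21:08:18 -> 12/02/03 21:08:26: map 100% -> 64%, reduce 16% -> 16%
```

A regression alone does not prove a livelock. The report's key observation is that the regression is followed by no further progress at all, so a real check would also look at how long the job goes without a new progress sample.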