Date: Fri, 20 Sep 2013 10:58:01 +0000 (UTC)
From: "Hudson (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5488) Job recovery fails after killing all the running containers for the app

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772935#comment-13772935 ]

Hudson commented on MAPREDUCE-5488:
-----------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #338 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/338/])
MAPREDUCE-5488. Changed MR client to keep trying to reach the application when it sees that on attempt's AM is down. Contributed by Jian He.
(vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524856)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java


> Job recovery fails after killing all the running containers for the app
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-5488
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5488
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.1.0-beta
> Reporter: Arpit Gupta
> Assignee: Jian He
> Attachments: MAPREDUCE-5488.1.patch, MAPREDUCE-5488.2.patch, MAPREDUCE-5488.3.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch
>
>
> Here is the client stack trace
> {code}
> RUNNING: /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.1.0.2.0.5.0-66.jar wordcount "-Dmapreduce.reduce.input.limit=-1" /user/user/test_yarn_ha/medium_wordcount_input /user/hrt_qa/test_yarn_ha/test_mapred_ha_single_job_applicationmaster-1-time
> 13/08/30 08:45:39 INFO client.RMProxy: Connecting to ResourceManager at hostname/68.142.247.148:8032
> 13/08/30 08:45:40 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 19 for user on ha-hdfs:ha-2-secure
> 13/08/30 08:45:40 INFO security.TokenCache: Got dt for hdfs://ha-2-secure; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ha-2-secure, Ident: (HDFS_DELEGATION_TOKEN token 19 for user)
> 13/08/30 08:45:40 INFO input.FileInputFormat: Total input paths to process : 20
> 13/08/30 08:45:40 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 13/08/30 08:45:40 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
> 13/08/30 08:45:40 INFO mapreduce.JobSubmitter: number of splits:180
> 13/08/30 08:45:40 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
> 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
> 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1377851032086_0003
> 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ha-2-secure, Ident: (HDFS_DELEGATION_TOKEN token 19 for user)
> 13/08/30 08:45:42 INFO impl.YarnClientImpl: Submitted application application_1377851032086_0003 to ResourceManager at hostname/68.142.247.148:8032
> 13/08/30 08:45:42 INFO mapreduce.Job: The url to track the job: http://hostname:8088/proxy/application_1377851032086_0003/
> 13/08/30 08:45:42 INFO mapreduce.Job: Running job: job_1377851032086_0003
> 13/08/30 08:45:48 INFO mapreduce.Job: Job job_1377851032086_0003 running in uber mode : false
> 13/08/30 08:45:48 INFO mapreduce.Job: map 0% reduce 0%
> stop applicationmaster
> beaver.component.hadoop|INFO|Kill container container_1377851032086_0003_01_000001 on host hostname
> RUNNING: ssh -o StrictHostKeyChecking=no hostname "sudo su - -c \"ps aux | grep container_1377851032086_0003_01_000001 | awk '{print \\\$2}' | xargs kill -9\" root"
> Warning: Permanently added 'hostname,68.142.247.155' (RSA) to the list of known hosts.
> kill 8978: No such process
> waiting for down time 10 seconds for service applicationmaster
> 13/08/30 08:45:55 INFO ipc.Client: Retrying connect to server: hostname/68.142.247.155:52713. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
> 13/08/30 08:45:56 INFO ipc.Client: Retrying connect to server: hostname/68.142.247.155:52713. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
> 13/08/30 08:45:56 ERROR security.UserGroupInformation: PriviledgedActionException as:user@REALM (auth:KERBEROS) cause:java.io.IOException: java.net.ConnectException: Call From hostname.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> java.io.IOException: java.net.ConnectException: Call From hostname.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>     at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:319)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:354)
>     at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:529)
>     at org.apache.hadoop.mapreduce.Job$5.run(Job.java:668)
>     at org.apache.hadoop.mapreduce.Job$5.run(Job.java:665)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
>     at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:665)
>     at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1349)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
>     at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>     at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.net.ConnectException: Call From hostname.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at $Proxy14.getTaskAttemptCompletionEvents(Unknown Source)
>     at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:177)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:310)
>     ... 23 more
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
>     at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1318)
>     ... 32 more
> {code}
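
For illustration only: the trace above shows the client giving up on a ConnectException while the killed AM was still down, and the commit message says the client was changed to keep trying. The following is a minimal plain-JDK sketch of that general idea (retry on connection failure with a fixed sleep), not the committed patch. The names RetrySketch and callWithRetries, and the retry count and sleep values, are hypothetical and do not come from Hadoop.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class RetrySketch {

    /**
     * Hypothetical helper: runs op and, if it fails with ConnectException,
     * tries again up to maxRetries more times, sleeping sleepMillis between
     * attempts. Any other exception is propagated immediately.
     */
    static <T> T callWithRetries(Callable<T> op, int maxRetries, long sleepMillis)
            throws Exception {
        ConnectException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (ConnectException e) {
                last = e; // remember the failure; the remote side may come back
                if (attempt < maxRetries) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        // All attempts failed: surface the last connection error as the cause.
        throw new IOException("gave up after " + (maxRetries + 1) + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated remote call: refuses the first two connections, then succeeds.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new ConnectException("Connection refused");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

In the failing run the logged policy was RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS), which expires well inside the 10-second AM downtime; a larger retry budget in a sketch like this would outlast that window.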