Mailing-List: contact issues-help@ambari.apache.org; run by ezmlm
Reply-To: dev@ambari.apache.org
Date: Fri, 11 Mar 2016 17:05:39 +0000 (UTC)
From: "Dmitry Lysnichenko (JIRA)"
To: issues@ambari.apache.org
Subject: [jira] [Updated] (AMBARI-15389) Intermittent YARN service check failures during and post EU

     [ https://issues.apache.org/jira/browse/AMBARI-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lysnichenko updated AMBARI-15389:
----------------------------------------
    Attachment: AMBARI-15389.patch

> Intermittent YARN service check failures during and post EU
> -----------------------------------------------------------
>
>                 Key: AMBARI-15389
>                 URL: https://issues.apache.org/jira/browse/AMBARI-15389
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>         Attachments: AMBARI-15389.patch
>
>
> Build # - Ambari 2.2.1.1 - #63
> Observed this issue in a couple of recent EU (Express Upgrade) runs, where the YARN service check reported failure:
> a. In one test, the EU ran from HDP 2.3.4.0 to 2.4.0.0 and the YARN service check reported failure during the EU itself; retrying the operation made the service check succeed.
> b. In another test, the YARN service check reported failure when run after the EU; when I ran it again, it succeeded.
> It looks like some corner case triggers this issue.
> {code}
> stderr: /var/lib/ambari-agent/data/errors-822.txt
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 142, in <module>
>     ServiceCheck().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 104, in service_check
>     user=params.smokeuser,
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
>     tries=tries, try_sleep=try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa@EXAMPLE.COM; yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar' returned 2.
> ######## Hortonworks #############
> This is MOTD message, added for testing in qe infra
> 16/03/03 02:33:51 INFO impl.TimelineClientImpl: Timeline service address: http://host:8188/ws/v1/timeline/
> 16/03/03 02:33:51 INFO distributedshell.Client: Initializing Client
> 16/03/03 02:33:51 INFO distributedshell.Client: Running Client
> 16/03/03 02:33:51 INFO client.RMProxy: Connecting to ResourceManager at host-9-5.test/127.0.0.254:8050
> 16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=3
> 16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster node info from ASM
> 16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, nodeId=host:25454, nodeAddresshost:8042, nodeRackName/default-rack, nodeNumContainers1
> 16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, nodeId=host-9-5.test:25454, nodeAddresshost-9-5.test:8042, nodeRackName/default-rack, nodeNumContainers0
> 16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, nodeId=host-9-1.test:25454, nodeAddresshost-9-1.test:8042, nodeRackName/default-rack, nodeNumContainers0
> 16/03/03 02:33:53 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.083333336, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
> 16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
> 16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
> 16/03/03 02:33:53 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 10240
> 16/03/03 02:33:53 INFO distributedshell.Client: Max virtual cores capabililty of resources in this cluster 1
> 16/03/03 02:33:53 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
> 16/03/03 02:33:53 INFO distributedshell.Client: Set the environment for the application master
> 16/03/03 02:33:53 INFO distributedshell.Client: Setting up app master command
> 16/03/03 02:33:53 INFO distributedshell.Client: Completed setting up app master command {{JAVA_HOME}}/bin/java -Xmx10m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 1>/AppMaster.stdout 2>/AppMaster.stderr
> 16/03/03 02:33:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 290 for ambari-qa on 127.0.0.235:8020
> 16/03/03 02:33:53 INFO distributedshell.Client: Got dt for hdfs://host-9-1.test:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 127.0.0.235:8020, Ident: (HDFS_DELEGATION_TOKEN token 290 for ambari-qa)
> 16/03/03 02:33:53 INFO distributedshell.Client: Submitting application to ASM
> 16/03/03 02:33:54 INFO impl.YarnClientImpl: Submitted application application_1456970141888_0011
> 16/03/03 02:33:55 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:33:56 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:33:57 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:33:58 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:33:59 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:00 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:01 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:02 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:03 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:04 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:05 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:06 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:07 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:08 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=FINISHED, distributedFinalState=FAILED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:08 INFO distributedshell.Client: Application did finished unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop
> 16/03/03 02:34:08 ERROR distributedshell.Client: Application failed to complete successfully
> stdout: /var/lib/ambari-agent/data/output-822.txt
> 2016-03-03 02:33:47,974 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-03-03 02:33:48,013 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-03-03 02:33:48,018 - checked_call['/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa@EXAMPLE.COM; yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar'] {'path': '/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin', 'user': 'ambari-qa'}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
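
For triage, the repeated "Got application report" lines in the client log above can be condensed to the points where the application state actually changes (here ACCEPTED, then RUNNING, then FINISHED with distributedFinalState=FAILED). The following is only a minimal sketch, not part of the attached patch or of Ambari: the helper name summarize_transitions and the regular expression are illustrative assumptions based on the log format shown in this report.

```python
import re

# Assumed pattern for the distributedshell client's application-report lines,
# derived from the log in this report. \s+ and DOTALL tolerate records that
# were wrapped across lines by a mail client or archive.
REPORT_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+INFO\s+distributedshell\.Client:\s+"
    r"Got application report from ASM for,\s+appId=(?P<app_id>\d+),.*?"
    r"yarnAppState=(?P<yarn_state>\w+),\s+"
    r"distributedFinalState=(?P<final_state>\w+)",
    re.DOTALL,
)


def summarize_transitions(log_text):
    """Return (time, yarnAppState, distributedFinalState) for each state change.

    Hypothetical helper: consecutive reports with an unchanged state pair are
    collapsed, so only the first report of each new state is kept.
    """
    transitions = []
    previous = None
    for match in REPORT_RE.finditer(log_text):
        state = (match.group("yarn_state"), match.group("final_state"))
        if state != previous:
            transitions.append((match.group("time"),) + state)
            previous = state
    return transitions
```

Feeding it the contents of an errors-NNN.txt file would give a short list of transitions with their timestamps, which makes it easier to compare a failing run against a successful retry.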