ambari-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-15389) Intermittent YARN service check failures during and post EU
Date Fri, 11 Mar 2016 19:00:39 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191392#comment-15191392 ]

Hudson commented on AMBARI-15389:
---------------------------------

SUCCESS: Integrated in Ambari-branch-2.2 #502 (See [https://builds.apache.org/job/Ambari-branch-2.2/502/])
AMBARI-15389 Intermittent YARN service check failures during and post EU (dlysnichenko: [http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=77dc81f63a830a8fe9a99284d31dcfec98ac98d3])
* ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py


> Intermittent YARN service check failures during and post EU
> -----------------------------------------------------------
>
>                 Key: AMBARI-15389
>                 URL: https://issues.apache.org/jira/browse/AMBARI-15389
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.2.2
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 2.2.2
>
>         Attachments: AMBARI-15389.patch
>
>
> Build # - Ambari 2.2.1.1 - #63
> Observed in a couple of recent EU runs: the YARN service check intermittently reports failure.
> a. In one test, the EU ran from HDP 2.3.4.0 to 2.4.0.0 and the YARN service check failed during the EU itself; retrying the operation succeeded.
> b. In another test, the YARN service check failed when run after the EU; running it again succeeded.
> There appears to be a corner case that triggers this failure.
> {code}
> stderr:   /var/lib/ambari-agent/data/errors-822.txt
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 142, in <module>
>     ServiceCheck().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 104, in service_check
>     user=params.smokeuser,
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
>     tries=tries, try_sleep=try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab
ambari-qa@EXAMPLE.COM; yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command
ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar'
returned 2. ######## Hortonworks #############
> This is MOTD message, added for testing in qe infra
> 16/03/03 02:33:51 INFO impl.TimelineClientImpl: Timeline service address: http://host:8188/ws/v1/timeline/
> 16/03/03 02:33:51 INFO distributedshell.Client: Initializing Client
> 16/03/03 02:33:51 INFO distributedshell.Client: Running Client
> 16/03/03 02:33:51 INFO client.RMProxy: Connecting to ResourceManager at host-9-5.test/127.0.0.254:8050
> 16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=3
> 16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster node info from ASM
> 16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, nodeId=host:25454,
nodeAddresshost:8042, nodeRackName/default-rack, nodeNumContainers1
> 16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, nodeId=host-9-5.test:25454,
nodeAddresshost-9-5.test:8042, nodeRackName/default-rack, nodeNumContainers0
> 16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, nodeId=host-9-1.test:25454,
nodeAddresshost-9-1.test:8042, nodeRackName/default-rack, nodeNumContainers0
> 16/03/03 02:33:53 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.083333336,
queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
> 16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, queueName=root,
userAcl=SUBMIT_APPLICATIONS
> 16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, queueName=default,
userAcl=SUBMIT_APPLICATIONS
> 16/03/03 02:33:53 INFO distributedshell.Client: Max mem capabililty of resources in this
cluster 10240
> 16/03/03 02:33:53 INFO distributedshell.Client: Max virtual cores capabililty of resources
in this cluster 1
> 16/03/03 02:33:53 INFO distributedshell.Client: Copy App Master jar from local filesystem
and add to local environment
> 16/03/03 02:33:53 INFO distributedshell.Client: Set the environment for the application
master
> 16/03/03 02:33:53 INFO distributedshell.Client: Setting up app master command
> 16/03/03 02:33:53 INFO distributedshell.Client: Completed setting up app master command
{{JAVA_HOME}}/bin/java -Xmx10m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster
--container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 1><LOG_DIR>/AppMaster.stdout
2><LOG_DIR>/AppMaster.stderr
> 16/03/03 02:33:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 290 for ambari-qa
on 127.0.0.235:8020
> 16/03/03 02:33:53 INFO distributedshell.Client: Got dt for hdfs://host-9-1.test:8020;
Kind: HDFS_DELEGATION_TOKEN, Service: 127.0.0.235:8020, Ident: (HDFS_DELEGATION_TOKEN token
290 for ambari-qa)
> 16/03/03 02:33:53 INFO distributedshell.Client: Submitting application to ASM
> 16/03/03 02:33:54 INFO impl.YarnClientImpl: Submitted application application_1456970141888_0011
> 16/03/03 02:33:55 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=N/A,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED,
distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:33:56 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=N/A,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED,
distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:33:57 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=N/A,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED,
distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:33:58 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=N/A,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED,
distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:33:59 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=N/A,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED,
distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:34:00 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=N/A,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED,
distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:34:01 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=N/A,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED,
distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:34:02 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=N/A,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED,
distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:34:03 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=N/A,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED,
distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:34:04 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:05 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:06 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:07 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
> 16/03/03 02:34:08 INFO distributedshell.Client: Got application report from ASM for,
appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service:  }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235,
appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=FINISHED,
distributedFinalState=FAILED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/,
appUser=ambari-qa
> 16/03/03 02:34:08 INFO distributedshell.Client: Application did finished unsuccessfully.
YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop
> 16/03/03 02:34:08 ERROR distributedshell.Client: Application failed to complete successfully
> stdout:   /var/lib/ambari-agent/data/output-822.txt
> 2016-03-03 02:33:47,974 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-03-03 02:33:48,013 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-03-03 02:33:48,018 - checked_call['/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab
ambari-qa@EXAMPLE.COM; yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command
ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar']
{'path': '/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin', 'user': 'ambari-qa'}
> {code}
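
For triage, the sequence of YARN application states can be pulled out of a client log like the one quoted above. The following is a minimal sketch and is not part of the AMBARI-15389 patch; the function name `state_transitions` is hypothetical, and it assumes only that the log contains `yarnAppState=<STATE>` fields as in the quoted output.

```python
import re


def state_transitions(log_text):
    """Return the distinct consecutive yarnAppState values found in log_text.

    Hypothetical helper for reading distributedshell client output;
    repeated reports of the same state are collapsed into one entry.
    """
    states = re.findall(r"yarnAppState=(\w+)", log_text)
    transitions = []
    for state in states:
        # Keep a state only when it differs from the previous one.
        if not transitions or transitions[-1] != state:
            transitions.append(state)
    return transitions
```

Applied to the output quoted above, this would show the application moving through ACCEPTED, RUNNING and FINISHED. The failure itself is carried by `distributedFinalState=FAILED`, which this sketch does not inspect.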


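The report notes that in both cases simply rerunning the service check succeeded. As an illustration only, here is a minimal sketch of retrying a flaky shell command a bounded number of times. It is not the change made in AMBARI-15389.patch, whose contents are not shown in this message. The helper `run_with_retries` and its `runner` parameter are hypothetical; the `tries` and `try_sleep` names merely mirror the keyword arguments visible in the traceback.

```python
import subprocess
import time


def run_with_retries(command, tries=3, try_sleep=5, runner=subprocess.call):
    """Run a shell command until it exits with 0 or the attempts run out.

    Hypothetical helper: returns the exit code of the last attempt.
    `runner` is injectable so the retry logic can be exercised without
    starting a real process.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    code = None
    for attempt in range(1, tries + 1):
        code = runner(command, shell=True)
        if code == 0:
            break
        # Sleep only when another attempt will follow.
        if attempt < tries:
            time.sleep(try_sleep)
    return code
```

A bounded retry hides transient failures such as the one described here, but it also delays the report of a genuine failure by up to `(tries - 1) * try_sleep` seconds plus the run time of each attempt.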

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
