hadoop-yarn-issues mailing list archives

From "Yesha Vora (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-8579) New AM attempt could not retrieve previous attempt component data
Date Wed, 25 Jul 2018 21:28:00 GMT
Yesha Vora created YARN-8579:

             Summary: New AM attempt could not retrieve previous attempt component data
                 Key: YARN-8579
                 URL: https://issues.apache.org/jira/browse/YARN-8579
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.1.1
            Reporter: Yesha Vora

Steps to reproduce:
1) Launch httpd-docker
2) Wait for the app to reach STABLE state
3) Run validation for the app (takes around 3 min)
4) Stop all ZooKeeper (ZK) servers
5) Wait 60 sec
6) Kill the AM
7) Wait 30 sec
8) Start all ZK servers
9) Wait for the application to finish
10) Validate the expected containers of the app
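
The timing of the steps above matters: ZK is stopped 60 seconds before the AM is killed and restarted 30 seconds after. A minimal sketch of a driver that keeps that ordering is shown here. The hook names (`launch_app`, `stop_all_zk`, and so on) are hypothetical placeholders for whatever the test harness uses, not real YARN or ZooKeeper commands, and the sleep function is injected so the sequence can be dry-run.

```python
from typing import Callable, Dict, List, Tuple

# (hypothetical hook name, seconds to wait afterwards); waits come from steps 5 and 7.
PLAN: List[Tuple[str, int]] = [
    ("launch_app", 0),           # step 1
    ("wait_for_stable", 0),      # step 2
    ("run_validation", 0),       # step 3
    ("stop_all_zk", 60),         # steps 4-5
    ("kill_am", 30),             # steps 6-7
    ("start_all_zk", 0),         # step 8
    ("wait_for_finish", 0),      # step 9
    ("validate_containers", 0),  # step 10
]


def run_scenario(hooks: Dict[str, Callable[[], None]],
                 sleep: Callable[[float], None]) -> List[str]:
    """Call each hook in order, sleeping where the report asks for a wait.

    Returns a trace of the executed steps for later inspection.
    """
    trace: List[str] = []
    for name, wait_after in PLAN:
        hooks[name]()
        trace.append(name)
        if wait_after:
            sleep(wait_after)
            trace.append(f"sleep {wait_after}")
    return trace
```

Passing `time.sleep` as `sleep` runs the scenario in real time; passing a recording stub lets the ordering be checked without waiting.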

Expected behavior:
A new AM attempt should start and recover the Docker containers launched by the first
attempt.

Actual behavior:
A new AM attempt starts, but it cannot recover the first attempt's Docker containers because
it cannot read the component details from ZK.
As a result, it releases the container from the previous attempt and requests new containers.
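
The difference between the expected and actual behavior can be illustrated with a small model of the recovery decision. This is only a sketch inferred from the log messages in this report, not the actual `ServiceScheduler` code; the function name, the dictionary-based registry, and the instance name `httpd-0` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def recover_containers(
    previous: Iterable[str],
    registry: Optional[Dict[str, str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Split previous-attempt containers into recovered and released.

    `registry` maps container ID -> component instance name. `None` models
    the case where the component path could not be read (NoNode).
    """
    records = registry or {}
    recovered: Dict[str, str] = {}
    released: List[str] = []
    for container_id in previous:
        instance = records.get(container_id)
        if instance is None:
            # No record for this container: it cannot be matched to a component.
            released.append(container_id)
        else:
            recovered[container_id] = instance
    return recovered, released


cid = "container_e08_1531977563978_0015_01_000003"
# Expected: the record is readable, so the container is recovered.
expected = recover_containers([cid], {cid: "httpd-0"})  # "httpd-0" is hypothetical
# Actual: the component path is missing, so the container is released.
actual = recover_containers([cid], None)
```

In this model `expected` is `({cid: "httpd-0"}, [])` and `actual` is `({}, [cid])`, which mirrors the "Record not found in registry ... releasing" message in the AM log.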

{code}
2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering appattempt_1531977563978_0015_000002, fault-test-zkrm-httpd-docker into registry
2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 containers from previous attempt.
2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not read component paths: `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling container_e08_1531977563978_0015_01_000003 from previous attempt
2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not found in registry for container container_e08_1531977563978_0015_01_000003 from previous attempt, releasing
2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering initial evaluation of component httpd
2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT httpd]: 2 instances.
2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] Requesting for 2 container(s)
{code}
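
When validating a run like this one, it can help to pull the released container IDs out of the AM log automatically. The following is a minimal sketch that assumes the message wording shown in the log above; the function name is hypothetical, and the whitespace normalization is there only because archived logs may be line-wrapped.

```python
import re
from typing import List

# Matches the "releasing" message as it appears in the AM log of this report.
RELEASED_RE = re.compile(
    r"Record not found in registry for container (container_\w+) "
    r"from previous attempt, releasing"
)


def released_containers(log_text: str) -> List[str]:
    """Return IDs of previous-attempt containers the AM released."""
    # Collapse line breaks and repeated spaces so wrapped messages still match.
    flat = " ".join(log_text.split())
    return RELEASED_RE.findall(flat)


SAMPLE = (
    "2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - "
    "Record not found in registry\n"
    "for container container_e08_1531977563978_0015_01_000003 "
    "from previous attempt, releasing\n"
)
ids = released_containers(SAMPLE)
```

An empty result for a recovery test would indicate that no container from the previous attempt was released for lack of a registry record.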

This message was sent by Atlassian JIRA
