hadoop-yarn-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
Date Mon, 09 Feb 2015 22:24:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313010#comment-14313010 ]

Hadoop QA commented on YARN-933:
--------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12697404/0001-YARN-933.patch
  against trunk revision fcad031.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6555//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6555//console

This message is automatically generated.

> Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-933
>                 URL: https://issues.apache.org/jira/browse/YARN-933
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.5-alpha
>            Reporter: J.Andreina
>            Assignee: Rohith
>         Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, YARN-933.3.patch, YARN-933.patch
>
>
> AM max retries are configured as 3 on both the client and the RM side.
> Step 1: Install a cluster with NMs on 2 machines.
> Step 2: Configure the network so that a ping from the RM machine to the NM1 machine succeeds by IP address but fails by hostname.
> Step 3: Execute a job.
> Step 4: After the AM (AppAttempt_1) has been allocated to the NM1 machine, the connection is lost.
> Observation:
> ==========
> After AppAttempt_1 has moved to the FAILED state, the release of its container and the application removal both succeed, and a new AppAttempt_2 is spawned.
> 1. Then AppAttempt_1 is retried again.
> 2. On the RM side, it again tries to launch AppAttempt_1 and hence fails with InvalidStateTransitonException.
> 3. The client exited after AppAttempt_1 finished (although the job is actually still running), even though the configured number of app attempts is 3 and the remaining app attempts were all spawned and running.
> RMLogs:
> ======
> 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_000001 State change from SCHEDULED to ALLOCATED
> 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
> 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_000001 Timed out after 600 secs
> 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_000001 Container Transitioned from ACQUIRED to EXPIRED
> 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_000002
> 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_000001 is done. finalState=FAILED
> 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
> 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_000002,
> 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_000002 State change from SUBMITTED to SCHEDULED
> 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
> 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
> 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_000001. Got exception: java.lang.reflect.UndeclaredThrowableException
> 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
>  at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>  at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
>  at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
>  at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476)
>  at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
>  at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
>  at java.lang.Thread.run(Thread.java:662)
> Client Logs
> ========
> Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020]
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
> 2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:Rex (auth:SIMPLE) cause:org.apache.hadoop.net.ConnectTimeoutException: Call From HOST-10-18-91-55/10.18.40.57 to host-10-18-40-15:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
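The RM log in the issue shows a stale event: the AM container expired at 16:36:07 and AppAttempt_1 moved to FAILED, but the launcher kept retrying its connection and reported LAUNCH_FAILED only at 16:38:56. Below is a minimal, self-contained Java sketch of why a table-driven state machine rejects such an event, and how registering no-op self-transitions would absorb it. This is a hypothetical, simplified model, not the real RMAppAttemptImpl or StateMachineFactory. The class AttemptStateMachineSketch, the CONTAINER_EXPIRED event, the reduced set of states and InvalidTransitionException are all assumptions made for illustration. It is not confirmed that the attached patch takes this approach.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of an app-attempt state machine.
// NOT the real RMAppAttemptImpl / StateMachineFactory from Hadoop YARN.
public class AttemptStateMachineSketch {

    enum State { SCHEDULED, ALLOCATED, LAUNCHED, FAILED }

    // CONTAINER_EXPIRED is an assumed stand-in for the container-expiry path.
    enum Event { CONTAINER_ALLOCATED, LAUNCHED, LAUNCH_FAILED, CONTAINER_EXPIRED }

    // Thrown when no transition is registered for (state, event);
    // similar in spirit to InvalidStateTransitonException.
    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(State s, Event e) {
            super("Invalid event: " + e + " at " + s);
        }
    }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.SCHEDULED;

    void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    // Looks up the next state; throws if the pair was never registered.
    State handle(Event e) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(e);
        if (next == null) {
            throw new InvalidTransitionException(current, e);
        }
        current = next;
        return current;
    }

    State state() { return current; }

    static AttemptStateMachineSketch build(boolean ignoreStaleLaunchEvents) {
        AttemptStateMachineSketch sm = new AttemptStateMachineSketch();
        sm.addTransition(State.SCHEDULED, Event.CONTAINER_ALLOCATED, State.ALLOCATED);
        sm.addTransition(State.ALLOCATED, Event.LAUNCHED, State.LAUNCHED);
        sm.addTransition(State.ALLOCATED, Event.LAUNCH_FAILED, State.FAILED);
        sm.addTransition(State.ALLOCATED, Event.CONTAINER_EXPIRED, State.FAILED);
        if (ignoreStaleLaunchEvents) {
            // Self-loops: a launcher result that arrives after the attempt
            // has already failed is absorbed instead of raising an error.
            sm.addTransition(State.FAILED, Event.LAUNCHED, State.FAILED);
            sm.addTransition(State.FAILED, Event.LAUNCH_FAILED, State.FAILED);
        }
        return sm;
    }

    public static void main(String[] args) {
        for (boolean ignore : new boolean[] {false, true}) {
            AttemptStateMachineSketch sm = build(ignore);
            sm.handle(Event.CONTAINER_ALLOCATED); // SCHEDULED -> ALLOCATED
            sm.handle(Event.CONTAINER_EXPIRED);   // AM container expired -> FAILED
            try {
                // The launcher finally gives up connecting and reports failure.
                sm.handle(Event.LAUNCH_FAILED);
                System.out.println("ignore=" + ignore + ": stale event absorbed, state=" + sm.state());
            } catch (InvalidTransitionException ex) {
                System.out.println("ignore=" + ignore + ": " + ex.getMessage());
            }
        }
    }
}
```

Running main prints "ignore=false: Invalid event: LAUNCH_FAILED at FAILED" for the strict table and "ignore=true: stale event absorbed, state=FAILED" for the lenient one. Registering explicit self-loops keeps every accepted (state, event) pair visible in the table. Catching the exception in the dispatcher instead would also silence genuinely invalid transitions.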



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
