flink-dev mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1835) Spurious failure of YARN tests
Date Tue, 07 Apr 2015 10:53:13 GMT
Stephan Ewen created FLINK-1835:
-----------------------------------

             Summary: Spurious failure of YARN tests
                 Key: FLINK-1835
                 URL: https://issues.apache.org/jira/browse/FLINK-1835
             Project: Flink
          Issue Type: Bug
          Components: YARN Client
    Affects Versions: 0.9
            Reporter: Stephan Ewen
            Assignee: Robert Metzger
             Fix For: 0.9


The failure was caused by the test detecting an exception in the log.

The stack trace of the exception (extracted from the log) is below:

{code}
21:18:29,555 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable
to load native-hadoop library for your platform... using builtin-java classes where applicable
21:18:29,806 INFO  org.apache.flink.yarn.ApplicationMaster$                      - YARN daemon
runs as travis setting user to execute Flink ApplicationMaster/JobManager to travis
21:18:29,808 INFO  org.apache.flink.yarn.ApplicationMaster$                      - --------------------------------------------------------------------------------
21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  Starting
YARN ApplicationMaster/JobManager (Version: 0.9-SNAPSHOT, Rev:d2020b5, Date:06.04.2015 @ 18:00:21
UTC)
21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  Current
user: travis
21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  JVM: Java
HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.31-b07
21:18:29,809 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  Maximum
heap size: 393 MiBytes
21:18:29,826 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  JAVA_HOME:
/usr/lib/jvm/java-8-oracle
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  JVM Options:
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -     -Xmx409M
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -     -Dlog.file=/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-1_0/application_1428355034517_0004/container_1428355034517_0004_01_000001/jobmanager-main.log
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -     -Dlogback.configurationFile=file:logback.xml
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -     -Dlog4j.configuration=file:log4j.properties
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      -  Program
Arguments: (none)
21:18:29,827 INFO  org.apache.flink.yarn.ApplicationMaster$                      - --------------------------------------------------------------------------------
21:18:29,828 INFO  org.apache.flink.yarn.ApplicationMaster$                      - registered
UNIX signal handlers for [TERM, HUP, INT]
21:18:29,843 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Starting
JobManager for YARN
21:18:29,845 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Loading
config from: /home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-localDir-nm-1_0/usercache/travis/appcache/application_1428355034517_0004/container_1428355034517_0004_01_000001
21:18:30,388 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger
started
21:18:30,450 INFO  Remoting                                                      - Starting
remoting
21:18:30,637 INFO  Remoting                                                      - Remoting
started; listening on addresses :[akka.tcp://flink@172.17.0.176:34023]
21:18:30,651 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created
BLOB server storage directory /tmp/blobStore-e34b86da-094c-4a4e-aa02-7b0556e8af93
21:18:30,655 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started
BLOB server at 0.0.0.0:33717 - max concurrent requests: 50 - max backlog: 1000
21:18:30,670 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Starting
Job Manger web frontend.
21:18:30,673 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer         - Setting
up web info server, using web-root directory jar:file:/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-localDir-nm-1_0/usercache/travis/appcache/application_1428355034517_0004/filecache/12/flink-dist-0.9-SNAPSHOT.jar!/web-docs-infoserver.
21:18:30,705 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting
JobManager at akka://flink/user/jobmanager#395299512.
21:18:31,184 INFO  org.eclipse.jetty.util.log                                    - jetty-0.9-SNAPSHOT
21:18:31,269 INFO  org.eclipse.jetty.util.log                                    - Started
SelectChannelConnector@0.0.0.0:49867
21:18:31,270 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer         - Started
web info server for JobManager on 0.0.0.0:49867
21:18:31,270 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Generate
configuration file for application master.
21:18:31,283 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Starting
YARN session on Job Manager.
21:18:31,284 INFO  org.apache.flink.yarn.ApplicationMaster$                      - Application
Master properly initiated. Awaiting termination of actor system.
21:18:31,287 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Start yarn
session.
21:18:31,489 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Requesting
1 TaskManagers. Tolerating 1 failed TaskManagers
21:18:31,815 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting
to ResourceManager at /0.0.0.0:8030
21:18:31,914 INFO  org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy
 - yarn.client.max-cached-nodemanagers-proxies : 0
21:18:31,915 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Registering
ApplicationMaster with tracking url http://testing-worker-linux-docker-2f4f6c00-3426-linux-13.prod.travis-ci.org:49867.
21:18:32,255 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Requesting
initial TaskManager container 0.
21:18:32,283 INFO  org.apache.flink.yarn.Utils                                   - Copying
from file:/home/travis/build/StephanEwen/incubator-flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-localDir-nm-1_0/usercache/travis/appcache/application_1428355034517_0004/container_1428355034517_0004_01_000001/flink-conf-modified.yaml
to file:/tmp/junit3904564006360292351/junit1676152559016123175/.flink/application_1428355034517_0004/flink-conf-modified.yaml
21:18:32,458 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Prepared
local resource for modified yaml: resource { scheme: "file" port: -1 file: "/tmp/junit3904564006360292351/junit1676152559016123175/.flink/application_1428355034517_0004/flink-conf-modified.yaml"
} size: 3393 timestamp: 1428355112000 type: FILE visibility: APPLICATION
21:18:32,461 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Create
container launch context.
21:18:32,483 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Starting
TM with command=$JAVA_HOME/bin/java -Xmx819m  -Dlog.file="<LOG_DIR>/taskmanager.log"
-Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.appMaster.YarnTaskManagerRunner
--configDir . 1> <LOG_DIR>/taskmanager-stdout.log 2> <LOG_DIR>/taskmanager-stderr.log
21:18:33,077 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - The user
requested 1 containers, 0 running. 1 containers missing
21:18:33,631 ERROR akka.actor.OneForOneStrategy                                  - Application
attempt appattempt_1428355034517_0004_000001 doesn't exist in ApplicationMasterService cache.
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt
appattempt_1428355034517_0004_000001 doesn't exist in ApplicationMasterService cache.
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy8.allocate(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
	at org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:190)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:91)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException):
Application attempt appattempt_1428355034517_0004_000001 doesn't exist in ApplicationMasterService
cache.
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

	at org.apache.hadoop.ipc.Client.call(Client.java:1468)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy7.allocate(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
	... 26 more
21:18:33,646 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1    - Stopping
JobManager akka://flink/user/jobmanager#395299512.
21:18:33,701 ERROR org.apache.flink.yarn.ApplicationMaster$                      - RECEIVED
SIGNAL 15: SIGTERM
21:19:52,986 INFO  org.apache.flink.yarn.YarnTestBase                            - Shutting
down MiniYarn cluster
{code}
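
For illustration only, the kind of check that produces this failure (a test scanning container logs for exception strings) can be sketched as follows. This is a minimal sketch, not the actual Flink YARN test code: the names `PROHIBITED`, `find_prohibited` and the `whitelist` parameter are hypothetical, and the report does not say whether whitelisting was the fix that was adopted. The sample lines are taken from the log above, with the mail line wraps rejoined.

```python
import re

# Hypothetical patterns; the real test's rules are not shown in this report.
PROHIBITED = [re.compile(r"Exception"), re.compile(r"\bERROR\b")]


def find_prohibited(lines, whitelist=()):
    """Return (line_number, line) pairs that match a prohibited pattern
    and contain none of the whitelisted substrings."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Skip lines that are explicitly tolerated.
        if any(allowed in line for allowed in whitelist):
            continue
        if any(pattern.search(line) for pattern in PROHIBITED):
            hits.append((number, line))
    return hits


sample_log = [
    "21:18:33,077 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - "
    "The user requested 1 containers, 0 running. 1 containers missing",
    "21:18:33,631 ERROR akka.actor.OneForOneStrategy - Application attempt "
    "appattempt_1428355034517_0004_000001 doesn't exist in ApplicationMasterService cache.",
    "org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt "
    "appattempt_1428355034517_0004_000001 doesn't exist in ApplicationMasterService cache.",
]

# A strict scan flags both the ERROR line and the exception line.
strict_hits = find_prohibited(sample_log)

# Tolerating this specific message (an assumption, one possible mitigation)
# leaves nothing to report.
tolerant_hits = find_prohibited(
    sample_log, whitelist=("doesn't exist in ApplicationMasterService cache",)
)
```

With a strict scan, any exception logged during shutdown races fails the test even when the test logic itself passed, which is consistent with the spurious failure reported here.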



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
