flink-user mailing list archives

From "ani.desh1512" <ani.desh1...@gmail.com>
Subject Flink with Mesos: Fetcher error
Date Thu, 08 Jun 2017 21:21:34 GMT
I am trying to configure Flink to run on top of Mesos. I am using Flink
release-1.3 and the Mesos that underlies DCOS 1.9, which is version 1.2.
Flink starts without any issues when the taskmanager starts on the same host
as the appmaster. But when the taskmanager is launched on a different host,
the container fails to launch. The Flink mesos-appmaster log looks like
this:

2017-06-08 19:19:01,537 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Launching Mesos task taskmanager-00003 on host 10.101.2.117.
2017-06-08 19:19:01,550 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Launching Mesos task taskmanager-00002 on host 10.101.2.117.
2017-06-08 19:19:01,607 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Launching Mesos task taskmanager-00001 on host 10.101.2.117.
2017-06-08 19:19:01,623 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Launching Mesos task taskmanager-00004 on host 10.101.2.117.
2017-06-08 19:19:01,645 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Launching Mesos task taskmanager-00006 on host 10.101.2.91.
2017-06-08 19:19:01,660 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Launching Mesos task taskmanager-00005 on host 10.101.2.91.
2017-06-08 19:19:01,674 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Launching Mesos task taskmanager-00007 on host 10.101.2.91.
2017-06-08 19:19:02,234 WARN  org.apache.flink.mesos.scheduler.TaskMonitor  -
Mesos task taskmanager-00003 failed unexpectedly.
2017-06-08 19:19:02,234 WARN  org.apache.flink.mesos.scheduler.TaskMonitor  -
Mesos task taskmanager-00002 failed unexpectedly.
2017-06-08 19:19:02,245 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Mesos task taskmanager-00002 failed, with a TaskManager in launch or
registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED
(Failed to launch container: Failed to fetch all URIs for container
'125055b6-9a19-4d62-a019-5d8a4197c043' with exit status: 256)
2017-06-08 19:19:02,246 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Diagnostics for task taskmanager-00002 in state TASK_FAILED :
reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container:
Failed to fetch all URIs for container
'125055b6-9a19-4d62-a019-5d8a4197c043' with exit status: 256
2017-06-08 19:19:02,247 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Total number of failed tasks so far: 1
2017-06-08 19:19:02,252 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Mesos task taskmanager-00003 failed, with a TaskManager in launch or
registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED
(Failed to launch container: Failed to fetch all URIs for container
'69259a92-b3e4-44c7-9afd-3ac650524570' with exit status: 256)
2017-06-08 19:19:02,252 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Diagnostics for task taskmanager-00003 in state TASK_FAILED :
reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container:
Failed to fetch all URIs for container
'69259a92-b3e4-44c7-9afd-3ac650524570' with exit status: 256
2017-06-08 19:19:02,252 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Total number of failed tasks so far: 2
2017-06-08 19:19:02,313 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Scheduling Mesos task taskmanager-00008 with (2048.0 MB, 1.0 cpus).
2017-06-08 19:19:02,330 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Scheduling Mesos task taskmanager-00009 with (2048.0 MB, 1.0 cpus).
2017-06-08 19:19:02,331 INFO 
org.apache.flink.mesos.scheduler.LaunchCoordinator            - Now
gathering offers for at least 2 task(s).
2017-06-08 19:19:02,332 WARN  org.apache.flink.mesos.scheduler.TaskMonitor  -
Mesos task taskmanager-00004 failed unexpectedly.
2017-06-08 19:19:02,332 WARN  org.apache.flink.mesos.scheduler.TaskMonitor  -
Mesos task taskmanager-00001 failed unexpectedly.
2017-06-08 19:19:02,412 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Mesos task taskmanager-00004 failed, with a TaskManager in launch or
registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED
(Failed to launch container: Failed to fetch all URIs for container
'a65c3e35-579d-4302-830f-be50b6d0ca06' with exit status: 256)
2017-06-08 19:19:02,412 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Diagnostics for task taskmanager-00004 in state TASK_FAILED :
reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container:
Failed to fetch all URIs for container
'a65c3e35-579d-4302-830f-be50b6d0ca06' with exit status: 256
2017-06-08 19:19:02,412 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Total number of failed tasks so far: 3
2017-06-08 19:19:02,432 INFO 
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Mesos task taskmanager-00001 failed, with a TaskManager in launch or
registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED
(Failed to launch container: Failed to fetch all URIs for container
'325e14fe-8840-4996-96dc-5c7ffc159d12' with exit status: 256)
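
To see which hosts the failing tasks were launched on, the "Launching Mesos task ... on host ..." lines can be matched against the "failed unexpectedly" lines. The following is only a rough sketch in Python; the function name `failures_by_host` and the trimmed `LOG` sample are illustrative assumptions, not part of Flink or Mesos.

```python
import re
from collections import defaultdict

# Trimmed excerpt of the appmaster log above (wrapped lines kept as posted).
LOG = """\
2017-06-08 19:19:01,537 INFO
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Launching Mesos task taskmanager-00003 on host 10.101.2.117.
2017-06-08 19:19:01,645 INFO
org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  -
Launching Mesos task taskmanager-00006 on host 10.101.2.91.
2017-06-08 19:19:02,234 WARN  org.apache.flink.mesos.scheduler.TaskMonitor  -
Mesos task taskmanager-00003 failed unexpectedly.
"""


def failures_by_host(log_text):
    """Group tasks reported as 'failed unexpectedly' by their launch host."""
    # Collapse the mail client's line wrapping so each message is contiguous.
    flat = " ".join(log_text.split())
    launched = dict(
        re.findall(r"Launching Mesos task (\S+) on host (\d+(?:\.\d+){3})", flat)
    )
    failed = set(re.findall(r"Mesos task (\S+) failed unexpectedly", flat))
    grouped = defaultdict(list)
    for task in sorted(failed):
        grouped[launched.get(task, "unknown")].append(task)
    return dict(grouped)


if __name__ == "__main__":
    for host, tasks in failures_by_host(LOG).items():
        print(host, tasks)
```

Run against the full log, this kind of grouping shows whether the failures are confined to hosts other than the one running the appmaster.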

I checked the stderr in Mesos sandbox and it is as follows:

I0608 19:20:06.184386 30480 fetcher.cpp:531] Fetcher Info:
{"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-S6\/flink","items":[{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/mesos-taskmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/mesos-taskmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/yarn-session.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/yarn-session.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j-console.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-console.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/log4j-1.2.17.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/log4j-1.2.17.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/mesos-appmaster.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/mesos-appmaster.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/stop-zookeeper-quorum.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-zookeeper-quorum.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/stop-local.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-local.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/taskmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/taskmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-local.bat","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-local.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-cluster.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-cluster.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/stop-cluster.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-cluster.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-scala-shell.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-scala-shell.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/pyflink.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/pyflink.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j-yarn-session.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-yarn-session.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/logback-yarn.xml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback-yarn.xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink-daemon.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink-daemon.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/zookeeper.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/zookeeper.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/logback-console.xml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback-console.xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/masters","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/masters"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/conf\/flink-conf.yaml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/flink-conf.yaml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/zoo.cfg","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/zoo.cfg"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/flink-shaded-hadoop2-uber-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/flink-shaded-hadoop2-uber-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/slaves","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/slaves"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/flink-dist_2.10-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/flink-dist_2.10-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/slf4j-log4j12-1.7.7.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/slf4j-log4j12-1.7.7.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j-cli.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-cli.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/historyserver.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/historyserver.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/flink-python_2.10-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/flink-python_2.10-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/logback.xml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback.xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/pyflink.bat","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/pyflink.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-local.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-local.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink.bat","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-zookeeper-quorum.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-zookeeper-quorum.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/jobmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/jobmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink-console.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink-console.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/config.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/config.sh"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-S6\/frameworks\/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-0030\/executors\/taskmanager-00009\/runs\/d8d1756d-f977-43f6-a53f-55c19b6c6294","user":"flink"}
I0608 19:20:06.189909 30480 fetcher.cpp:442] Fetching URI
'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
I0608 19:20:06.189932 30480 fetcher.cpp:283] Fetching directly into the
sandbox directory
I0608 19:20:06.190213 30480 fetcher.cpp:220] Fetching URI
'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
I0608 19:20:06.190251 30480 fetcher.cpp:163] Downloading resource from
'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
to
'/var/lib/mesos/slave/slaves/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-S6/frameworks/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-0030/executors/taskmanager-00009/runs/d8d1756d-f977-43f6-a53f-55c19b6c6294/flink/bin/mesos-taskmanager.sh'
Failed to fetch
'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh':
Error downloading resource: Couldn't connect to server
Failed to synchronize with agent (it's probably exited)
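
Every URI in the Fetcher Info JSON can be checked for the host it points at. A host of `localhost` is resolved on the agent that runs the fetcher, so such a URI only reaches the process serving the files when both run on the same machine, which would be consistent with the symptom described here. The following is a minimal sketch in Python; `uri_hosts` and `is_loopback` are hypothetical helper names, and `FETCHER_INFO` holds just two items copied from the log above.

```python
import json
from urllib.parse import urlparse

# Two items copied from the Fetcher Info line above; the rest are omitted.
FETCHER_INFO = (
    r'{"items":['
    r'{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,'
    r'"extract":false,"output_file":"flink\/bin\/mesos-taskmanager.sh",'
    r'"value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78'
    r'\/flink\/bin\/mesos-taskmanager.sh"}},'
    r'{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,'
    r'"extract":false,"output_file":"flink\/bin\/yarn-session.sh",'
    r'"value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78'
    r'\/flink\/bin\/yarn-session.sh"}}'
    r'],"user":"flink"}'
)


def uri_hosts(fetcher_info_json):
    """Return the sorted, distinct hostnames of all URIs to be fetched."""
    info = json.loads(fetcher_info_json)  # "\/" is a valid JSON escape for "/"
    return sorted({urlparse(item["uri"]["value"]).hostname
                   for item in info["items"]})


def is_loopback(host):
    """True if the host name refers to the local machine only."""
    return host in ("localhost", "::1") or host.startswith("127.")


if __name__ == "__main__":
    for host in uri_hosts(FETCHER_INFO):
        note = "loopback, unreachable from other hosts" if is_loopback(host) else "ok"
        print(host, "->", note)
```

If every host comes back as loopback, the thing to investigate is which hostname the appmaster uses when it builds these URLs, rather than the fetcher itself.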

So, my question is: what am I missing?
Do I need to specify some special URI in Marathon for Flink? I am setting
mesos.master to zk://leader.mesos:2181/mesos. Could this be what is
causing the problem?
Or have I missed some Mesos or Marathon setting?
Also, I am launching this via Marathon, and I have the same Flink dist at
the same path on all the slaves.

Thanks,



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-with-Mesos-Fetcher-error-tp13603.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.
