mesos-issues mailing list archives

From "Neil Conway (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-4297) Executor does not shutdown when framework teardown.
Date Tue, 23 May 2017 18:15:04 GMT

     [ https://issues.apache.org/jira/browse/MESOS-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neil Conway updated MESOS-4297:
-------------------------------
    Priority: Major  (was: Critical)

Seems hard to repro this issue without more information. Do you have the agent logs?

> Executor does not shutdown when framework teardown.
> ---------------------------------------------------
>
>                 Key: MESOS-4297
>                 URL: https://issues.apache.org/jira/browse/MESOS-4297
>             Project: Mesos
>          Issue Type: Bug
>          Components: framework
>    Affects Versions: 0.25.0
>         Environment: Marathon 0.11.0
> Mesos 0.25.0
> Spark 1.5.2
>            Reporter: Lei Xu
>
> We found a problem when tearing down a Spark framework on Mesos: the executors do not exit and keep running.
> {code}
> root     48548 48539  2  2015 ?        04:28:11 /home/q/java/default/bin/java -cp /home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar -Xms8192m -Xmx8192m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.90.27.71:47938/user/CoarseGrainedScheduler --executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/3 --hostname l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id 20151228-163100-504125962-5050-31081-0016
> root     48644 48348  0  2015 ?        00:00:00 sh -c cd spark-1*;  ./bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.90.27.71:47938/user/CoarseGrainedScheduler --executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/5 --hostname l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id 20151228-163100-504125962-5050-31081-0016
> root     48645 48644  2  2015 ?        04:28:45 /home/q/java/default/bin/java -cp /home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/5/runs/851073c4-d225-426b-b1b5-3d294eb76f8e/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/5/runs/851073c4-d225-426b-b1b5-3d294eb76f8e/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar -Xms8192m -Xmx8192m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.90.27.71:47938/user/CoarseGrainedScheduler --executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/5 --hostname l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id 20151228-163100-504125962-5050-31081-0016
> {code}
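
Editorial sketch (not part of the original report): the process listing above can be checked mechanically for leftovers of a given framework. The Python below is a minimal illustration, assuming `ps -ef`-style lines with eight whitespace-separated columns (as in the listing) and the sandbox path layout `.../frameworks/<framework-id>/executors/<executor-id>/runs/...` seen in the report. The names `ExecutorProcess` and `find_leftover_executors` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple


class ExecutorProcess(NamedTuple):
    pid: int
    ppid: int
    executor_id: str


# Sandbox paths in the report look like
# .../frameworks/<framework-id>/executors/<executor-id>/runs/<run-id>/...
_SANDBOX = re.compile(
    r"/frameworks/(?P<fw>[^/\s]+)/executors/(?P<ex>[^/\s]+)/runs/"
)


def find_leftover_executors(
    ps_lines: Iterable[str], framework_id: str
) -> List[ExecutorProcess]:
    """Return processes whose command line references the framework's sandbox."""
    found = []
    for line in ps_lines:
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD.
        # A STIME containing a space (e.g. "Dec 28") would break this split.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        match = _SANDBOX.search(fields[7])
        if match is None or match.group("fw") != framework_id:
            continue
        found.append(
            ExecutorProcess(int(fields[1]), int(fields[2]), match.group("ex"))
        )
    return found
```

Because the match is on the sandbox path, wrapper processes such as the `sh -c 'cd spark-1*; ...'` entry, whose command line carries no sandbox path, are skipped; matching on `--app-id` instead would be one way to include them.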
> This framework {{20151228-163100-504125962-5050-31081-0016}} was torn down a few days ago and no longer appears on the "Frameworks" page of the web UI. However, the slave page shows that it is still registered with the slave node and is still running some executors.
> When I tried to kill the framework again through the REST API, it returned {{No framework found with specified ID}}.
> Finally, I killed the Spark task and the Mesos executor. The framework has not started any new tasks, but it is still listed on this slave and does not exit.
> {code}
> Frameworks
> ID 	User 	Name 	Active Tasks 	CPUs (Used / Allocated) 	Mem (Used / Allocated)
> …5050-31081-0016 	root 	wireless-m_invocation_kylin 	0 	/ 0.6 	/ 192 MB
> Executors
> ID 	Name 	Source 	Active Tasks 	Queued Tasks 	CPUs (Used / Allocated) 	Mem (Used / Allocated) 	
> 5 	Command Executor (Task: 5) (Command: sh -c 'cd spark-1*;...') 	5 	0 	0 	/ 0.1 	/ 32 MB 	Sandbox
> 4 	Command Executor (Task: 4) (Command: sh -c 'cd spark-1*;...') 	4 	0 	0 	/ 0.1 	/ 32 MB 	Sandbox
> 3 	Command Executor (Task: 3) (Command: sh -c 'cd spark-1*;...') 	3 	0 	0 	/ 0.1 	/ 32 MB 	Sandbox
> 2 	Command Executor (Task: 2) (Command: sh -c 'cd spark-1*;...') 	2 	0 	0 	/ 0.1 	/ 32 MB 	Sandbox
> 1 	Command Executor (Task: 1) (Command: sh -c 'cd spark-1*;...') 	1 	0 	0 	/ 0.1 	/ 32 MB 	Sandbox
> 0 	Command Executor (Task: 0) (Command: sh -c 'cd spark-1*;...') 	0 	0 	0 	/ 0.1 	/ 32 MB 	Sandbox
> {code}
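
Editorial sketch (not part of the original report): the description mentions retrying the teardown through the REST API but does not say which call was used. The Python below shows one way such a request might be built. It assumes the master's `/master/teardown` operator endpoint with a form-encoded `frameworkId` field; the helper name `build_teardown_request` and the `master_url` argument are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_teardown_request(master_url: str, framework_id: str) -> Request:
    """Build (but do not send) a POST asking the master to tear down a framework."""
    # Assumption: the master accepts a form-encoded "frameworkId" field
    # on POST /master/teardown.
    body = urlencode({"frameworkId": framework_id}).encode("ascii")
    return Request(
        url=master_url.rstrip("/") + "/master/teardown",
        data=body,
        method="POST",
    )
```

The request could then be sent with `urllib.request.urlopen(request)`. In the situation reported here the master replied {{No framework found with specified ID}}, so repeating the call would not be expected to clean up the executors still listed on the slave.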



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
