spark-issues mailing list archives

From "Stavros Kontopoulos (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-18935) Use Mesos "Dynamic Reservation" resource for Spark
Date Thu, 28 Sep 2017 18:17:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184588#comment-16184588 ]

Stavros Kontopoulos edited comment on SPARK-18935 at 9/28/17 6:16 PM:
----------------------------------------------------------------------

I verified the example: the error is the same, and the cause is the same as in the cluster mode case:

{noformat}

17/09/28 21:07:34 INFO MesosCoarseGrainedSchedulerBackend: Mesos task 1 is now TASK_ERROR
17/09/28 21:07:34 INFO MesosCoarseGrainedSchedulerBackend: Blacklisting Mesos slave 433038b9-80aa-43ef-b6eb-0075f5028d37-S0 due to too many failures; is Spark installed on it?
17/09/28 21:07:34 DEBUG CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove executor 1 with reason Executor finished with state LOST
17/09/28 21:07:34 INFO BlockManagerMaster: Removal of executor 1 requested
17/09/28 21:07:34 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 1
17/09/28 21:07:34 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.

{noformat}
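The Spark-side log shows the agent being blacklisted right after task 1 reaches TASK_ERROR. The following is a minimal sketch of that kind of per-agent failure bookkeeping, not Spark's actual code. The threshold of two failures, the set of Mesos states counted as failures, and the names Scheduler, status_update and accepts_offers_from are all assumptions, chosen only to be consistent with this log, where both task 0 and task 1 end in TASK_ERROR.

```python
from dataclasses import dataclass, field

# Assumption: an agent is blacklisted once it reaches this many task failures.
MAX_AGENT_FAILURES = 2

# Assumption: Mesos task states counted as failures in this sketch.
FAILED_STATES = {"TASK_FAILED", "TASK_LOST", "TASK_ERROR"}


@dataclass
class AgentState:
    """Per-agent bookkeeping kept by the scheduler (hypothetical structure)."""
    task_failures: int = 0


@dataclass
class Scheduler:
    """Hypothetical scheduler that tracks failures and blacklisted agents."""
    agents: dict = field(default_factory=dict)
    blacklisted: set = field(default_factory=set)

    def status_update(self, agent_id: str, state: str) -> None:
        """Count failed tasks per agent and blacklist the agent at the threshold."""
        if state not in FAILED_STATES:
            return
        agent = self.agents.setdefault(agent_id, AgentState())
        agent.task_failures += 1
        if agent.task_failures >= MAX_AGENT_FAILURES and agent_id not in self.blacklisted:
            self.blacklisted.add(agent_id)
            print(f"Blacklisting Mesos slave {agent_id} due to too many failures")

    def accepts_offers_from(self, agent_id: str) -> bool:
        """Offers from a blacklisted agent are declined, so no executor starts there."""
        return agent_id not in self.blacklisted


if __name__ == "__main__":
    sched = Scheduler()
    agent = "433038b9-80aa-43ef-b6eb-0075f5028d37-S0"
    sched.status_update(agent, "TASK_ERROR")  # task 0
    sched.status_update(agent, "TASK_ERROR")  # task 1: threshold reached
    print(sched.accepts_offers_from(agent))
```

If the blacklisted agent is the only one holding the spark role's reservation, every later offer would be declined in this model, which would fit the driver repeatedly logging "Initial job has not accepted any resources".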

The task fails and the agent is blacklisted. The task fails because of this error in the Mesos master log:

{noformat}
I0928 21:07:34.621839  5559 master.cpp:6532] Sending status update TASK_ERROR for task 0 of framework e46985fe-1392-4d39-a3d5-e7ec77810695-0004 'Total resources cpus(spark-prive)(allocated: spark-prive):8; mem(spark-prive)(allocated: spark-prive):1408 required by task and its executor is more than available ports(spark-prive, )(allocated: spark-prive):[31000-32000]; disk(spark-prive, )(allocated: spark-prive):1000; cpus(spark-prive, )(allocated: spark-prive):8; mem(spark-prive, )(allocated: spark-prive):10024; mem(*)(allocated: spark-prive):4590; disk(*)(allocated: spark-prive):103216'
I0928 21:07:34.622593  5559 hierarchical.cpp:850] Updated allocation of framework e46985fe-1392-4d39-a3d5-e7ec77810695-0004 on agent 433038b9-80aa-43ef-b6eb-0075f5028d37-S0 from ports(spark-prive, )(allocated: spark-prive):[31000-32000]; disk(spark-prive, )(allocated: spark-prive):1000; cpus(spark-prive, )(allocated: spark-prive):8; mem(spark-prive, )(allocated: spark-prive):10024; mem(*)(allocated: spark-prive):4590; disk(*)(allocated: spark-prive):103216 to ports(spark-prive, )(allocated: spark-prive):[31000-32000]; disk(spark-prive, )(allocated: spark-prive):1000; cpus(spark-prive, )(allocated: spark-prive):8; mem(spark-prive, )(allocated: spark-prive):10024; mem(*)(allocated: spark-prive):4590; disk(*)(allocated: spark-prive):103216
I0928 21:07:34.647950  5559 master.cpp:4941] Processing REVIVE call for framework e46985fe-1392-4d39-a3d5-e7ec77810695-0004 (Spark Pi) at scheduler-df433215-b87c-4b9b-993c-a3253c5f11a8@127.0.1.1:34775

{noformat}

So again, it's the same cause I have seen before.
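One reading of the master log: the task asks for cpus(spark-prive) and mem(spark-prive), while the agent holds cpus(spark-prive, ) and mem(spark-prive, ), i.e. the same role but with dynamic reservation info attached. If Mesos treats these as different resources, the request is not contained in the offer even though 8 CPUs and 10024 MB of reserved memory are there. The sketch below is a simplified model of that check, not Mesos's resource-matching code; the parsing regex, the (name, role, dynamic) key, and the helper names parse_scalars and contains are assumptions.

```python
import re
from collections import defaultdict

# One scalar entry as printed in the master log, for example:
#   cpus(spark-prive)(allocated: spark-prive):8     -> no reservation info
#   cpus(spark-prive, )(allocated: spark-prive):8   -> dynamically reserved, empty principal
ENTRY = re.compile(r"(\w+)\(([^)]*)\)\(allocated: [^)]*\):([\d.]+)$")


def parse_scalars(text: str) -> dict:
    """Parse a log resource list into {(name, role, dynamic): amount}.

    Range resources such as ports:[31000-32000] do not match and are skipped.
    """
    totals = defaultdict(float)
    for item in text.split(";"):
        match = ENTRY.match(item.strip())
        if match is None:
            continue  # non-scalar entry or unrecognised format
        name, role_part, amount = match.groups()
        role = role_part.split(",")[0].strip()
        # Assumption: a comma in the label means the resource carries reservation info.
        dynamic = "," in role_part
        totals[(name, role, dynamic)] += float(amount)
    return dict(totals)


def contains(available: dict, required: dict) -> bool:
    """Simplified containment: each required key must exist with enough quantity."""
    return all(available.get(key, 0.0) >= amount for key, amount in required.items())


if __name__ == "__main__":
    required = parse_scalars(
        "cpus(spark-prive)(allocated: spark-prive):8; "
        "mem(spark-prive)(allocated: spark-prive):1408"
    )
    available = parse_scalars(
        "ports(spark-prive, )(allocated: spark-prive):[31000-32000]; "
        "disk(spark-prive, )(allocated: spark-prive):1000; "
        "cpus(spark-prive, )(allocated: spark-prive):8; "
        "mem(spark-prive, )(allocated: spark-prive):10024; "
        "mem(*)(allocated: spark-prive):4590; "
        "disk(*)(allocated: spark-prive):103216"
    )
    print(contains(available, required))
```

In this model the check fails only because of the missing reservation marker; rebuilding the request with dynamic=True on each key makes it fit, which is consistent with the idea that the task's resources should carry over the reservation info from the offer.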




> Use Mesos "Dynamic Reservation" resource for Spark
> --------------------------------------------------
>
>                 Key: SPARK-18935
>                 URL: https://issues.apache.org/jira/browse/SPARK-18935
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2
>            Reporter: jackyoh
>
> I'm running Spark on Apache Mesos.
> Please follow these steps to reproduce the issue:
> 1. First, reserve resources for the spark role through the Mesos master's /master/reserve endpoint:
> curl -i -d slaveId=c24d1cfb-79f3-4b07-9f8b-c7b19543a333-S0 -d resources='[{"name":"cpus","type":"SCALAR","scalar":{"value":20},"role":"spark","reservation":{"principal":""}},{"name":"mem","type":"SCALAR","scalar":{"value":4096},"role":"spark","reservation":{"principal":""}}]' -X POST http://192.168.1.118:5050/master/reserve
> 2. Then run the spark-submit command:
> ./spark-submit --class org.apache.spark.examples.SparkPi --master mesos://192.168.1.118:5050 --conf spark.mesos.role=spark ../examples/jars/spark-examples_2.11-2.0.2.jar 10000
> The console then keeps logging the same warning message:
> 16/12/19 22:33:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
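The reservation in step 1 can also be prepared programmatically. This is a minimal sketch using only the Python standard library; the helper names reservation_resources and build_reserve_request are hypothetical, and it assumes the master accepts the same two form fields, slaveId and resources, that the curl command sends, URL-encoded. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request


def reservation_resources(role: str, cpus: float, mem: float, principal: str = "") -> list:
    """Build the resources JSON array used in the curl command (hypothetical helper)."""
    def scalar(name: str, value: float) -> dict:
        return {
            "name": name,
            "type": "SCALAR",
            "scalar": {"value": value},
            "role": role,
            "reservation": {"principal": principal},
        }
    return [scalar("cpus", cpus), scalar("mem", mem)]


def build_reserve_request(master: str, agent_id: str, resources: list) -> urllib.request.Request:
    """Prepare, but do not send, a form-encoded POST to the master's reserve endpoint."""
    body = urllib.parse.urlencode({
        "slaveId": agent_id,
        "resources": json.dumps(resources),
    }).encode("utf-8")
    return urllib.request.Request(
        f"http://{master}/master/reserve",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_reserve_request(
        "192.168.1.118:5050",
        "c24d1cfb-79f3-4b07-9f8b-c7b19543a333-S0",
        reservation_resources("spark", 20, 4096),
    )
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would actually send it.
```

As in the report, the job is then submitted with spark.mesos.role=spark so that the framework receives offers for the reserved resources.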



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


