spark-issues mailing list archives

From "Josh Rosen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-14658) when executor lost DAGScheduler may submit one stage twice even if the first running taskset for this stage is not finished
Date Thu, 16 Feb 2017 22:19:42 GMT

     [ https://issues.apache.org/jira/browse/SPARK-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-14658:
-------------------------------
    Affects Version/s: 2.2.0
                       2.0.0
                       2.1.0

> when executor lost DAGScheduler may submit one stage twice even if the first running taskset for this stage is not finished
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14658
>                 URL: https://issues.apache.org/jira/browse/SPARK-14658
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.6.1, 2.0.0, 2.1.0, 2.2.0
>         Environment: spark1.6.1  hadoop-2.6.0-cdh5.4.2
>            Reporter: yixiaohua
>
> 16/04/14 15:35:22 ERROR DAGSchedulerEventProcessLoop: DAGSchedulerEventProcessLoop failed; shutting down SparkContext
> java.lang.IllegalStateException: more than one active taskSet for stage 57: 57.2,57.1
>         at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:173)
>         at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1052)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:921)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1214)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> First Time:
> 16/04/14 15:35:20 INFO DAGScheduler: Resubmitting ShuffleMapStage 57 (run at AccessController.java:-2) because some of its tasks had failed: 5, 8, 9, 12, 13, 16, 17, 18, 19, 23, 26, 27, 28, 29, 30, 31, 40, 42, 43, 48, 49, 50, 51, 52, 53, 55, 56, 57, 59, 60, 61, 67, 70, 71, 84, 85, 86, 87, 98, 99, 100, 101, 108, 109, 110, 111, 112, 113, 114, 115, 126, 127, 134, 136, 137, 146, 147, 150, 151, 154, 155, 158, 159, 162, 163, 164, 165, 166, 167, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 188, 189, 190, 191, 198, 199, 204, 206, 207, 208, 218, 219, 222, 223, 230, 231, 236, 238, 239
> 16/04/14 15:35:20 DEBUG DAGScheduler: submitStage(ShuffleMapStage 57)
> 16/04/14 15:35:20 DEBUG DAGScheduler: missing: List()
> 16/04/14 15:35:20 INFO DAGScheduler: Submitting ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2), which has no missing parents
> 16/04/14 15:35:20 DEBUG DAGScheduler: submitMissingTasks(ShuffleMapStage 57)
> 16/04/14 15:35:20 INFO DAGScheduler: Submitting 100 missing tasks from ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2)
> 16/04/14 15:35:20 DEBUG DAGScheduler: New pending partitions: Set(206, 177, 127, 98, 48, 27, 23, 163, 238, 188, 159, 28, 109, 59, 9, 176, 126, 207, 174, 43, 170, 208, 158, 108, 29, 8, 204, 154, 223, 173, 219, 190, 111, 61, 40, 136, 115, 86, 57, 155, 55, 230, 222, 180, 172, 151, 101, 18, 166, 56, 137, 87, 52, 171, 71, 42, 167, 198, 67, 17, 236, 165, 13, 5, 53, 178, 99, 70, 49, 218, 147, 164, 114, 85, 60, 31, 179, 150, 19, 100, 50, 175, 146, 134, 113, 84, 51, 30, 199, 26, 16, 191, 162, 112, 12, 239, 231, 189, 181, 110)
> Second Time:
> 16/04/14 15:35:22 INFO DAGScheduler: Resubmitting ShuffleMapStage 57 (run at AccessController.java:-2) because some of its tasks had failed: 26
> 16/04/14 15:35:22 DEBUG DAGScheduler: submitStage(ShuffleMapStage 57)
> 16/04/14 15:35:22 DEBUG DAGScheduler: missing: List()
> 16/04/14 15:35:22 INFO DAGScheduler: Submitting ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2), which has no missing parents
> 16/04/14 15:35:22 DEBUG DAGScheduler: submitMissingTasks(ShuffleMapStage 57)
> 16/04/14 15:35:22 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2)
> 16/04/14 15:35:22 DEBUG DAGScheduler: New pending partitions: Set(26)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

