spark-issues mailing list archives

From "Dhruve Ashar (JIRA)" <>
Subject [jira] [Commented] (SPARK-15703) Spark UI doesn't show all tasks as completed when it should
Date Tue, 19 Jul 2016 15:18:20 GMT


Dhruve Ashar commented on SPARK-15703:

Here are some of the findings: 

LiveListenerBus replaces the AsynchronousListenerBus. With dynamic allocation enabled and
the maximum number of executors set to ~2000, I consistently see an excessive number of events
being dropped for an input data size of 300GB. The events are dropped (which is what leaves the
UI in an inconsistent state) because the event queue is not drained fast enough.

From the thread dumps, the event queue dispatcher freezes momentarily. During that time the
queue fills up in a short span and events are dropped; once the dispatcher is active again, the
queue clears quickly. The race condition happens in ExecutorAllocationManager because of its
synchronization: the dispatcher thread waits for the locks to be released. See attached dumps.
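
To illustrate the dropping behaviour described above, here is a minimal sketch in Python. It
is not Spark's code: the names post_events, drain and CAPACITY are hypothetical, and the tiny
capacity is chosen only to make the effect visible. It shows how a bounded queue with
non-blocking posts loses events while its consumer is stalled, and how quickly it empties once
the consumer resumes.

```python
import queue


def post_events(event_queue, events):
    """Enqueue each event without blocking; return how many were dropped."""
    dropped = 0
    for event in events:
        try:
            event_queue.put_nowait(event)
        except queue.Full:
            # The producer never waits, so a full queue means a lost event.
            dropped += 1
    return dropped


def drain(event_queue):
    """Remove and return everything currently in the queue."""
    handled = []
    while True:
        try:
            handled.append(event_queue.get_nowait())
        except queue.Empty:
            return handled


CAPACITY = 4  # hypothetical, deliberately tiny
bus = queue.Queue(maxsize=CAPACITY)

# The dispatcher is "frozen": nothing is drained while 10 events arrive.
dropped_while_stalled = post_events(bus, range(10))

# The dispatcher resumes and clears the backlog in one pass.
handled = drain(bus)

print(dropped_while_stalled, handled)
```

Only the events that fit in the queue before the stall are ever handled; the rest never reach
any listener, which matches the symptom of the UI undercounting completed tasks.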

The remedy for this is twofold:
1 - Decouple the event dispatch from the handling of dynamic executor allocation.
2 - Make the listener event queue size configurable. For users who want to run with smaller
heartbeat intervals, the number of events in flight would be large, and it would be helpful
to have the flexibility to tune this.
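
The two remedies above can be sketched as follows. This is an illustrative Python sketch under
stated assumptions, not the change made in Spark: DecoupledListener, slow_handler,
allocation_lock and the capacity parameter are all hypothetical names. The listener callback
only enqueues the event into a private, size-configurable inbox, and a separate worker thread
does the work that needs the lock, so the dispatcher never waits on that lock.

```python
import queue
import threading


class DecoupledListener:
    """Hands events to a worker thread so the dispatcher never blocks."""

    def __init__(self, handler, capacity=10000):
        # capacity is the tunable queue size (hypothetical default).
        self._inbox = queue.Queue(maxsize=capacity)
        self._handler = handler
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_event(self, event):
        """Called on the dispatcher thread; returns False if the event was dropped."""
        try:
            self._inbox.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            event = self._inbox.get()
            if event is None:  # sentinel: stop the worker
                break
            self._handler(event)

    def stop(self):
        """Process everything already queued, then shut the worker down."""
        self._inbox.put(None)
        self._worker.join()


allocation_lock = threading.Lock()
seen = []


def slow_handler(event):
    # Stands in for allocation logic that must hold a contended lock.
    with allocation_lock:
        seen.append(event)


listener = DecoupledListener(slow_handler, capacity=100)

# Simulate another thread holding the lock while events are dispatched:
# on_event still returns immediately because it only enqueues.
with allocation_lock:
    accepted = [listener.on_event(i) for i in range(5)]

listener.stop()
print(accepted, seen)
```

With this arrangement a stall in the handler only backs up the listener's own inbox, and the
inbox size can be raised for workloads that generate many events.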

> Spark UI doesn't show all tasks as completed when it should
> -----------------------------------------------------------
>                 Key: SPARK-15703
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.0.0
>            Reporter: Thomas Graves
>            Priority: Critical
>         Attachments: Screen Shot 2016-06-01 at 11.21.32 AM.png, Screen Shot 2016-06-01 at 11.23.48 AM.png, SparkListenerBus .png, spark-dynamic-executor-allocation.png
> The Spark UI doesn't seem to be showing all the tasks and metrics.
> I ran a job with 100000 tasks but Detail stage page says it completed 93029:
> Summary Metrics for 93029 Completed Tasks
> The Stages for All Jobs page lists only 89519/100000 tasks as finished, but the stage has completed. The metrics for shuffle write and input are also incorrect.
> I will attach screen shots.
> I checked the logs and it does show that all the tasks actually finished.
> 16/06/01 16:15:42 INFO TaskSetManager: Finished task 59880.0 in stage 2.0 (TID 54038) in 265309 ms on (100000/100000)
> 16/06/01 16:15:42 INFO YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
