spark-issues mailing list archives

From "Mathieu D (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails
Date Tue, 30 May 2017 19:53:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030025#comment-16030025 ]

Mathieu D edited comment on SPARK-18881 at 5/30/17 7:52 PM:
------------------------------------------------------------

Just to mention a workaround for those experiencing the problem: try increasing {{spark.scheduler.listenerbus.eventqueue.size}} (default 10000).
It may only postpone the problem if events keep being queued faster than the listeners can process them for a long time. In our case, activity comes in bursts, and raising this limit helps.
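
To illustrate why a larger queue absorbs bursts but only delays drops under sustained overload, here is a minimal sketch of a bounded queue in plain Python. It is a toy model, not Spark's LiveListenerBus: the function name {{simulate}}, the tick-based timing, and the rates (5000 or 2000 events posted per tick, 1000 processed per tick) are assumptions chosen for illustration.

```python
def simulate(capacity, arrivals, drain_per_tick):
    """Toy model of a bounded event queue (not Spark's implementation).

    capacity       -- maximum number of queued events
    arrivals       -- events posted at each tick (hypothetical workload)
    drain_per_tick -- events the listeners can process per tick
    Returns (total events dropped, 1-based tick of the first drop or None).
    """
    queued = 0
    dropped = 0
    first_drop_tick = None
    for tick, posted in enumerate(arrivals, start=1):
        # Events that do not fit in the remaining room are dropped.
        accepted = min(posted, capacity - queued)
        if accepted < posted and first_drop_tick is None:
            first_drop_tick = tick
        dropped += posted - accepted
        queued += accepted
        # Listeners then consume up to drain_per_tick events.
        queued -= min(queued, drain_per_tick)
    return dropped, first_drop_tick


# A short burst followed by idle time, and a sustained overload.
burst = [5000] * 4 + [0] * 20
sustained = [2000] * 200

burst_small = simulate(10_000, burst, 1000)
burst_large = simulate(100_000, burst, 1000)
sustained_small = simulate(10_000, sustained, 1000)
sustained_large = simulate(100_000, sustained, 1000)
```

In this model the larger capacity gets through the burst without dropping anything, while under the sustained workload it still drops events, just at a later tick.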


was (Author: mathieude):
Just to mention a workaround for those experiencing the problem : try increase {{spark.scheduler.listenerbus.eventqueue.size}}
(default 10000). 
It may only postpone the problem, if the queue filling is faster than listeners for a long
time. In our case, we have bursts of activity and raising this limits helps.

> Spark never finishes jobs and stages, JobProgressListener fails
> ---------------------------------------------------------------
>
>                 Key: SPARK-18881
>                 URL: https://issues.apache.org/jira/browse/SPARK-18881
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.2
>         Environment: yarn, deploy-mode = client
>            Reporter: Mathieu D
>
> We have a Spark application that continuously processes a lot of incoming jobs. Several jobs are processed in parallel, on multiple threads.
> During intensive workloads, at some point, we start to get hundreds of warnings like this:
> {code}
> 16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
> 16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610
> 16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405
> 16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
> 16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622
> {code}
> From that point on, the performance of the app plummets, and most stages and jobs never finish. In the Spark UI, I can see figures like 13000 pending jobs.
> I can't clearly see another related exception happening before that. Maybe this one, but it concerns another listener:
> {code}
> 16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
> 16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 01:00:00 CET 1970
> {code}
> This is very problematic for us, since it's hard to detect and requires an app restart.
> *EDIT:*
> I confirm the sequence:
> 1- ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue
> then
> 2- JobProgressListener losing track of jobs and stages.
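
The sequence above can be sketched with a toy listener that tracks active jobs by id. This is a simplified model written for illustration, not Spark's JobProgressListener: the class {{ToyJobListener}}, the helper {{replay}}, and the event tuples are hypothetical. The job ids are borrowed from the warnings quoted in the description. It shows how a dropped start event would produce an "unknown job" warning, and how a dropped end event would leave a job looking pending forever.

```python
class ToyJobListener:
    """Simplified stand-in for a progress listener (hypothetical)."""

    def __init__(self):
        self.active_jobs = set()
        self.completed = 0
        self.warnings = []

    def on_event(self, kind, job_id):
        if kind == "start":
            self.active_jobs.add(job_id)
        elif kind == "end":
            if job_id in self.active_jobs:
                self.active_jobs.remove(job_id)
                self.completed += 1
            else:
                # The start event never arrived, so the job is unknown here.
                self.warnings.append(f"Job completed for unknown job {job_id}")


def replay(events, dropped_indexes):
    """Deliver events in order, skipping those the full queue dropped."""
    listener = ToyJobListener()
    for index, (kind, job_id) in enumerate(events):
        if index in dropped_indexes:
            continue
        listener.on_event(kind, job_id)
    return listener


events = [
    ("start", 64610), ("end", 64610),
    ("start", 64622), ("end", 64622),
]
# Assume the queue dropped the start of 64610 and the end of 64622.
listener = replay(events, dropped_indexes={0, 3})
```

After this replay, the listener holds one warning for job 64610 and still lists job 64622 as active even though it finished. In this model the state cannot recover, because the missing events are never delivered again.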



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

