spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Resolved] (SPARK-23397) Scheduling delay causes Spark Streaming to miss batches.
Date Tue, 13 Feb 2018 10:49:00 GMT


Sean Owen resolved SPARK-23397.
    Resolution: Not A Problem

It's the code inside "foreachRDD" that gets executed at each batch; one answer is to make sure
you don't re-execute, at every batch, logic that you don't need to.
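A toy illustration of that advice, in plain Python rather than Spark: a per-batch callback stands in for the function passed to "foreachRDD" (which Spark runs on the driver once per batch). The helper build_lookup_table is hypothetical and represents any setup work that does not depend on the batch data. Hoisting it out of the per-batch function means it runs once instead of once per batch.

```python
# Toy model only, not Spark code: "batches" are plain lists and the
# per-batch functions stand in for a foreachRDD body.
setup_calls = 0

def build_lookup_table():
    """Hypothetical expensive setup that does not depend on batch data."""
    global setup_calls
    setup_calls += 1
    return {i: i * i for i in range(1000)}

def process_batch_naive(batch):
    # Rebuilds the invariant table on every batch.
    table = build_lookup_table()
    return [table[x] for x in batch]

def make_process_batch_hoisted():
    # Builds the invariant table once, outside the per-batch function.
    table = build_lookup_table()

    def process_batch(batch):
        return [table[x] for x in batch]

    return process_batch

batches = [[1, 2], [3], [4, 5, 6]]

setup_calls = 0
naive = [process_batch_naive(b) for b in batches]
naive_calls = setup_calls

setup_calls = 0
hoisted_fn = make_process_batch_hoisted()
hoisted = [hoisted_fn(b) for b in batches]
hoisted_calls = setup_calls

print(naive_calls, hoisted_calls)  # 3 1
```

Both versions produce the same results; only the number of setup calls differs.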

But in general you're just saying that sometimes complex operations take a long time, or
that you'd prefer a certain operation were faster. Neither relates to the original issue here,
which is about scheduling.

> Scheduling delay causes Spark Streaming to miss batches.
> --------------------------------------------------------
>                 Key: SPARK-23397
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.2.1
>            Reporter: Shahbaz Hussain
>            Priority: Major
> * For complex Spark (Scala) DStream-based applications that create many jobs per batch (e.g. 40),
it has been observed that batches are not created at their scheduled time. For example, if a Spark
Streaming application is started with a 20-second batch interval and creates 40-odd jobs per batch,
the next batch is not created 20 seconds after the previous batch's job creation time.
>  * This is because job creation is single-threaded: if job creation takes longer than the batch
interval, batch execution misses its schedule.
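The reporter's arithmetic can be sketched with a toy model: one worker creates each batch's jobs in order, starting no earlier than the batch's scheduled time and no earlier than the moment it finished the previous batch. The per-job creation time is a hypothetical parameter, and this does not model Spark's actual job generator; it only shows that when creation time exceeds the batch interval, each batch starts later than the one before it.

```python
# Toy model of single-threaded job creation (not Spark's scheduler).
def simulate_generation(interval, jobs_per_batch, secs_per_job, n_batches):
    """Return the lag (actual start minus scheduled time) for each batch."""
    gen_time = jobs_per_batch * secs_per_job  # time to create one batch's jobs
    worker_free_at = 0.0
    lags = []
    for k in range(n_batches):
        scheduled = k * interval
        # A single worker cannot start a batch before finishing the previous one.
        start = max(scheduled, worker_free_at)
        worker_free_at = start + gen_time
        lags.append(start - scheduled)
    return lags

# 20 s interval, 40 jobs per batch (hypothetical per-job costs).
slow = simulate_generation(20, 40, 0.625, 4)  # 25 s of creation per batch
fast = simulate_generation(20, 40, 0.375, 4)  # 15 s of creation per batch
print(slow)  # [0.0, 5.0, 10.0, 15.0]
print(fast)  # [0.0, 0.0, 0.0, 0.0]
```

With 25 s of creation work per 20 s interval the lag grows by 5 s every batch; with 15 s it stays at zero.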

This message was sent by Atlassian JIRA
