From "Nan Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-18905) Potential Issue of Semantics of BatchCompleted
Date Fri, 16 Dec 2016 20:35:58 GMT

     [ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nan Zhu updated SPARK-18905:
----------------------------
    Description: 
The current implementation of Spark Streaming considers a batch completed regardless of the results of its jobs (https://github.com/apache/spark/blob/1169db44bc1d51e68feb6ba2552520b2d660c2c0/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala#L203).
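For context, the logic around the linked line looks roughly like the following (an abridged paraphrase of JobScheduler.handleJobCompletion, not the exact source; logging and listener calls are omitted):

{code:scala}
// Abridged paraphrase of JobScheduler.handleJobCompletion (Spark 2.x era).
// The JobSet is removed as soon as all of its jobs have finished running,
// whether they succeeded or failed; the failure is only inspected afterwards.
private def handleJobCompletion(job: Job, completedTime: Long): Unit = {
  val jobSet = jobSets.get(job.time)
  jobSet.handleJobCompletion(job)
  job.setEndTime(completedTime)
  if (jobSet.hasCompleted) {
    jobSets.remove(jobSet.time)                  // batch treated as completed (around L203)
    jobGenerator.onBatchCompletion(jobSet.time)  // triggers metadata cleanup and a checkpoint write
  }
  job.result match {
    case Failure(e) =>
      reportError("Error running job " + job, e) // eventually stops the StreamingContext (around L214)
    case _ =>
  }
}
{code}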

Let's consider the following case:

A micro batch contains two jobs that read from two different Kafka topics, and one of the jobs fails due to a problem in the user-defined logic. A minimal sketch of this setup follows.
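For example (a sketch only: topic names, broker address, group id, and checkpoint directory are placeholders, and I am assuming the spark-streaming-kafka-0-10 integration):

{code:scala}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf().setAppName("two-topic-batch"), Seconds(10))
  ssc.checkpoint("hdfs:///tmp/checkpoint")        // placeholder directory

  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "broker:9092",         // placeholder
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "repro")

  val streamA = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("topicA"), kafkaParams))
  val streamB = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("topicB"), kafkaParams))

  // Each output operation below becomes one job of the micro batch.
  streamA.foreachRDD(rdd => rdd.foreach(r => println(r.value)))        // job 1: succeeds
  streamB.foreachRDD(rdd => rdd.foreach(_ => sys.error("user bug")))   // job 2: fails
  ssc
}
{code}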

1. The main thread in the Spark Streaming application executes the line mentioned above, removing the JobSet and treating the batch as completed.

2. Another thread (the checkpoint writer) writes a checkpoint file immediately after this line is executed.

3. Then, due to the current error handling mechanism in Spark Streaming, the StreamingContext is closed (https://github.com/apache/spark/blob/1169db44bc1d51e68feb6ba2552520b2d660c2c0/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala#L214).
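On the driver side, the failure typically surfaces through awaitTermination(), which rethrows the error reported by the JobScheduler. A sketch, reusing the hypothetical createContext() above:

{code:scala}
val ssc = createContext()
ssc.start()
try {
  ssc.awaitTermination()   // rethrows the error reported via JobScheduler.reportError
} catch {
  case e: Exception =>
    // By this point the checkpoint writer may already have recorded the
    // batch, including the failed job, as completed.
    ssc.stop(stopSparkContext = true, stopGracefully = false)
}
{code}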

When the user recovers from the checkpoint file, the JobSet containing the failed job has already been removed (taken as completed) before the checkpoint was constructed, so the data being processed by the failed job would never be reprocessed.
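Recovery here means the standard StreamingContext.getOrCreate() path, e.g.:

{code:scala}
// On restart: rebuilds the context from the checkpoint if one exists.
// Because the JobSet containing the failed job was removed before the
// checkpoint was written, the recovered context never re-schedules that data.
val recovered = StreamingContext.getOrCreate("hdfs:///tmp/checkpoint", () => createContext())
recovered.start()
recovered.awaitTermination()
{code}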


I might have missed something in the checkpoint thread or in this handleJobCompletion()... or it is a potential bug.

> Potential Issue of Semantics of BatchCompleted
> ----------------------------------------------
>
>                 Key: SPARK-18905
>                 URL: https://issues.apache.org/jira/browse/SPARK-18905
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2
>            Reporter: Nan Zhu
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


