spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-22163) Design Issue of Spark Streaming that Causes Random Run-time Exception
Date Fri, 29 Sep 2017 00:50:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-22163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-22163.
-------------------------------
    Resolution: Duplicate

Please don't fork the issue. This is not a bug.

> Design Issue of Spark Streaming that Causes Random Run-time Exception
> ---------------------------------------------------------------------
>
>                 Key: SPARK-22163
>                 URL: https://issues.apache.org/jira/browse/SPARK-22163
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams, Structured Streaming
>    Affects Versions: 2.2.0
>         Environment: Spark Streaming
> Kafka
> Linux
>            Reporter: Michael N
>            Priority: Critical
>
> Application objects can contain a List that the application modifies dynamically. However,
> the Spark Streaming framework serializes the application's objects asynchronously while the
> application runs. This causes random run-time exceptions on the List whenever the framework
> happens to serialize the application's objects while the application is modifying a List in
> one of its own objects.
> In fact, multiple bugs have been reported about
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList.writeObject
> that are permutations of the same root cause. So the design issue of the Spark Streaming
> framework is that it should not do this serialization asynchronously. Instead, it should either
> 1. Do this serialization synchronously. This is preferred because it eliminates the issue completely.
> Or
> 2. Allow each application to configure whether this serialization is done synchronously
> or asynchronously, depending on the nature of the application.
> Also, the Spark documentation should describe the conditions that trigger this type of
> asynchronous serialization, so that applications can work around them until a fix is provided.
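
For readers unfamiliar with the exception quoted above, the following is a minimal sketch in plain Java, with no Spark APIs involved. It assumes only the standard JDK behavior that java.util.ArrayList.writeObject throws ConcurrentModificationException when the list is structurally modified while it is being written. The names ListSerializationSketch, MutatingElement, serialize and serializeSnapshot are hypothetical and exist only for illustration. MutatingElement stands in for a second thread that changes the list mid-serialization, which makes the failure deterministic rather than random. The snapshot method shows one possible application-side workaround (copy under a lock, then serialize the copy); it is not something the issue or the Spark project prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

final class ListSerializationSketch {

    // Hypothetical element that mutates its owning list while being written.
    // It simulates an application thread modifying the list during
    // serialization, so the exception is reproduced on every run.
    static final class MutatingElement implements Serializable {
        private static final long serialVersionUID = 1L;
        private final transient List<Object> owner;

        MutatingElement(List<Object> owner) {
            this.owner = owner;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            owner.add("added during serialization"); // structural modification
        }
    }

    // Serializes the object as-is with standard Java serialization.
    static byte[] serialize(Object value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Possible workaround: copy the list while holding the lock that the
    // application also holds when it mutates the list, then serialize the
    // copy. Later changes to the original cannot affect the copy.
    static byte[] serializeSnapshot(List<Object> list, Object lock) {
        List<Object> snapshot;
        synchronized (lock) {
            snapshot = new ArrayList<>(list);
        }
        return serialize(snapshot);
    }

    public static void main(String[] args) {
        List<Object> list = new ArrayList<>();
        list.add(new MutatingElement(list));

        try {
            serialize(list);
            System.out.println("direct serialization succeeded");
        } catch (ConcurrentModificationException e) {
            System.out.println("direct serialization failed: " + e);
        }

        byte[] bytes = serializeSnapshot(list, list);
        System.out.println("snapshot serialized, bytes: " + bytes.length);
    }
}
```

The snapshot only helps if every code path that mutates the list synchronizes on the same lock. Whether copying on each serialization is affordable depends on the size of the list and on how often the objects are serialized.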



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

