spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21145) Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore
Date Thu, 22 Jun 2017 23:37:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060196#comment-16060196
] 

Apache Spark commented on SPARK-21145:
--------------------------------------

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/18396

> Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21145
>                 URL: https://issues.apache.org/jira/browse/SPARK-21145
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>
> StateStoreProvider instances are loaded on demand in an executor when a query is started.
When a query is restarted, the loaded provider instance is reused. There is a non-trivial
chance that a task from the previous query run is still running while the tasks of the restarted
run have started. So for a stateful partition, there may be two concurrent tasks related to
the same stateful partition, and therefore using the same provider instance. This can lead
to inconsistent results and possibly random failures, as state store implementations are not
designed to be thread-safe.
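The hazard described in the issue is essentially a lost update: two tasks sharing one provider instance both read state before either writes. Below is a minimal sketch, not Spark code; the object name, the plain `Map` standing in for a state store, and the explicitly replayed interleaving are all illustrative assumptions:

```scala
// A minimal sketch (not Spark code): a plain Map stands in for the shared,
// non-thread-safe state store, and one problematic interleaving of two
// concurrent tasks is replayed deterministically.
object LostUpdateDemo {
  // A leftover task from the previous run and a task from the restarted
  // run both read the current value before either one writes back.
  def run(): Long = {
    var state = Map("count" -> 0L)            // shared state for one partition
    val readByOldTask = state("count")        // previous run's task reads 0
    val readByNewTask = state("count")        // restarted run's task also reads 0
    state = state.updated("count", readByOldTask + 1) // old task writes 1
    state = state.updated("count", readByNewTask + 1) // new task also writes 1
    state("count")                            // one of the two updates is lost
  }

  def main(args: Array[String]): Unit =
    println(s"final count = ${run()} (would be 2 if the updates were serialized)")
}
```

Because the store is not thread-safe, the second write silently overwrites the first, which is the kind of inconsistent result the issue warns about.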



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

