spark-issues mailing list archives

From "Tathagata Das (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-22187) Update unsaferow format for saved state such that we can set timeouts when state is null
Date Thu, 07 Dec 2017 22:34:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282628#comment-16282628 ]

Tathagata Das edited comment on SPARK-22187 at 12/7/17 10:33 PM:
-----------------------------------------------------------------

I am reverting this because it will break existing streaming pipelines that already use mapGroupsWithState.
This will be re-applied in the future, once we start saving more metadata in checkpoints to
signify which version of the state row format an existing streaming query is running. Then we
can decode old and new formats accordingly.
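
The version-based decoding described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the metadata key `state_format_version`, the version constants, and the decoder functions are all hypothetical and are not Spark APIs, and rows are modeled as plain tuples rather than UnsafeRows.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical version numbers, chosen for illustration only.
FLAT_V1 = 1    # state fields as top-level columns, timeout last
NESTED_V2 = 2  # state as a single nullable nested column, timeout last

Decoded = Tuple[Optional[Tuple[Any, ...]], int]


def decode_flat(row: Tuple[Any, ...]) -> Decoded:
    """Old layout: everything except the last column belongs to the state."""
    *fields, timeout_ms = row
    return tuple(fields), timeout_ms


def decode_nested(row: Tuple[Any, ...]) -> Decoded:
    """New layout: column 0 is the (possibly null) state, column 1 the timeout."""
    state, timeout_ms = row
    return state, timeout_ms


DECODERS: Dict[int, Callable[[Tuple[Any, ...]], Decoded]] = {
    FLAT_V1: decode_flat,
    NESTED_V2: decode_nested,
}


def decode_state_row(row: Tuple[Any, ...], checkpoint_metadata: Dict[str, int]) -> Decoded:
    # Assumption: checkpoints written before the version field existed
    # carry no key, so they are treated as the old flat format.
    version = checkpoint_metadata.get("state_format_version", FLAT_V1)
    return DECODERS[version](row)
```

With this shape, a query restarted from an old checkpoint keeps reading flat rows, while only checkpoints that record the newer version use the nested decoder.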


was (Author: tdas):
I am reverting this because this will break existing streaming pipelines already using mapGroupsWithState

> Update unsaferow format for saved state such that we can set timeouts when state is null
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-22187
>                 URL: https://issues.apache.org/jira/browse/SPARK-22187
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>              Labels: release-notes, releasenotes
>
> Currently the group state of a user-defined type is encoded as top-level columns in the
UnsafeRows stored in the state store. The timeout timestamp is also saved (when needed) as
the last top-level column. Since the group state is serialized to top-level columns, you cannot
save "null" as the value of the state (setting null in all the top-level columns is not equivalent).
So we don't let the user set the timeout without initializing the state for a key. Based
on user experience, this leads to confusion.
> This JIRA is to change the row format such that the state is saved as nested columns.
This would allow the state to be set to null, and avoid these confusing corner cases.
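
To make the difference between the two layouts concrete, here is a minimal Python sketch. It models rows as plain tuples instead of Spark's UnsafeRow, and the names `Point`, `encode_flat`, and `encode_nested` are hypothetical, introduced only for illustration; it is not the actual Spark encoding code.

```python
from typing import NamedTuple, Optional, Tuple


class Point(NamedTuple):
    """Hypothetical user-defined state type with nullable fields."""
    x: Optional[int]
    y: Optional[int]


def encode_flat(state: Optional[Point], timeout_ms: int) -> Tuple:
    """Flat layout: state fields are top-level columns, timeout is the last column."""
    if state is None:
        # The only way to express "no state" is to null out every state column.
        return (None, None, timeout_ms)
    return (state.x, state.y, timeout_ms)


def encode_nested(state: Optional[Point], timeout_ms: int) -> Tuple:
    """Nested layout: the whole state object is one nullable column."""
    nested = None if state is None else (state.x, state.y)
    return (nested, timeout_ms)


# Flat layout: a missing state and a state whose fields are all null
# produce the same row, so a decoder cannot tell them apart.
flat_collides = encode_flat(None, 1000) == encode_flat(Point(None, None), 1000)

# Nested layout: the two cases remain distinct.
nested_collides = encode_nested(None, 1000) == encode_nested(Point(None, None), 1000)

print(flat_collides, nested_collides)  # prints: True False
```

In the nested layout a timeout can be stored next to a null state column, which is the case the flat layout cannot represent unambiguously.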



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


