flink-issues mailing list archives

From "Xiaogang Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6178) Allow upgrades to state serializers
Date Tue, 28 Mar 2017 08:15:41 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944743#comment-15944743 ]

Xiaogang Shi commented on FLINK-6178:

The idea of allowing upgrades to state serializers is excellent, but I have some concerns
about the "convertState" phase. Currently, Flink has no knowledge of which serializers to use
until users access their states (via the methods provided in {{RuntimeContext}}). That means
we can only convert the states when users are about to access them. The conversion may be
very costly, and the processing of data streams could be paused for a long time while it runs.
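
To make the concern concrete, here is a minimal sketch in plain Java. It is not Flink code: {{LazyRestoredState}} and its methods are hypothetical names invented for illustration. It assumes restored state sits as raw bytes until the first user access supplies a reader, at which point every entry is converted on the processing path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Flink code: restored state stays as raw bytes
// until the user function first asks for it and thereby supplies a reader.
final class LazyRestoredState<T> {
    private final Map<String, byte[]> rawEntries;          // bytes as restored from the savepoint
    private final Map<String, T> converted = new HashMap<>();
    private boolean isConverted = false;
    private int conversions = 0;                            // counts how many entries were converted

    LazyRestoredState(Map<String, byte[]> rawEntries) {
        this.rawEntries = rawEntries;
    }

    // Called on user access; only at this point is the reader for the old format known.
    T get(String key, Function<byte[], T> oldFormatReader) {
        if (!isConverted) {
            // All restored entries are converted here, while record processing waits.
            for (Map.Entry<String, byte[]> entry : rawEntries.entrySet()) {
                converted.put(entry.getKey(), oldFormatReader.apply(entry.getValue()));
                conversions++;
            }
            isConverted = true;
        }
        return converted.get(key);
    }

    int conversions() {
        return conversions;
    }
}
```

In this sketch a single {{get}} call pays for converting the whole restored state, which is the pause described above; with large state that cost lands in the middle of stream processing.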

Actually, I am very interested in the offline tool that may be provided in the future. A lot of
effort currently goes into the Flink runtime to allow restoring from old savepoints, which makes
the code very complicated and hard to follow. I would prefer to move that logic out of the main
program and into the offline tool.

I think the offline tool would also relieve users of the burden of implementing {{TypeSerializer}}s
that can deserialize data in several different serialization formats. They would only need
to provide the new serializers to access the states stored in the savepoints.
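
As a rough illustration of what such an offline step could look like, the sketch below rewrites every entry from an old format into a new one, so that the running job only ever sees the new format. Everything in it is an assumption for illustration: {{OfflineStateConverter}}, {{Reader}} and {{Writer}} are hypothetical names, and state is modelled as a plain map of byte arrays rather than a real savepoint.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not a Flink API: converts state entries outside the running job.
final class OfflineStateConverter {

    // Reads one value in the old serialization format.
    interface Reader<T> {
        T read(DataInputStream in) throws IOException;
    }

    // Writes one value in the new serialization format.
    interface Writer<T> {
        void write(T value, DataOutputStream out) throws IOException;
    }

    static <T> Map<String, byte[]> convert(
            Map<String, byte[]> oldState, Reader<T> oldReader, Writer<T> newWriter)
            throws IOException {
        Map<String, byte[]> newState = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : oldState.entrySet()) {
            T value;
            try (DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(entry.getValue()))) {
                value = oldReader.read(in);      // decode with the old format
            }
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                newWriter.write(value, out);     // re-encode with the new format
            }
            newState.put(entry.getKey(), buffer.toByteArray());
        }
        return newState;
    }
}
```

For example, a counter that used to be stored as a UTF string could be converted with {{in -> Long.parseLong(in.readUTF())}} as the old reader and {{(v, out) -> out.writeLong(v)}} as the new writer; after that, the job's serializer only needs to understand the 8-byte form.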

> Allow upgrades to state serializers
> -----------------------------------
>                 Key: FLINK-6178
>                 URL: https://issues.apache.org/jira/browse/FLINK-6178
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing, Type Serialization System
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
> Currently, users are locked in with the serializer implementation used to write their state.
> This is suboptimal, as generally for users, it could easily be possible that they wish
to change their serialization formats / state schemas and types in the future.
> This is an umbrella JIRA for the required tasks to make this possible.
> Here's an overview description of what to expect for the overall outcome of this JIRA
(the specific details are outlined in their respective subtasks):
> Ideally, the main user-facing change this would result in is that users implementing
their custom {{TypeSerializer}}s will also need to implement hook methods that identify whether
or not there is a change to the serialized format or even a change to the serialized data
type. It would be the user's responsibility that the {{deserialize}} method can bridge the
change between the old / new formats.
> For Flink's built-in serializers that are automatically built using the user's configuration
(most notably the more complex {{KryoSerializer}} and {{GenericArraySerializer}}), Flink should
be able to automatically "reconfigure" them using the new configuration, so that the reconfigured
versions can be used to de- / serialize previous state. This would require knowledge of the
previous configuration of the serializer, therefore "serializer configuration metadata" will
be added to savepoints.
> Note that for the first version of this, although additional infrastructure (e.g. serializer
reconfigure hooks, serializer configuration metadata in savepoints) will be added to potentially
allow Kryo version upgrade, this JIRA will not cover this. Kryo has breaking binary formats
across major versions, and will most likely need some further changes. Therefore, for the
{{KryoSerializer}}, "upgrading" it simply means changes in the registration of specific /
default serializers, at least for now.
> Finally, we would need to add a "convertState" phase to the task lifecycle, that takes
place after the "open" phase and before checkpointing starts / the task starts running. It
can only happen after "open", because only then can we be certain if any reconfiguration of
state serialization has occurred, and state needs to be converted. Ideally, the code for the
"convertState" is designed so that it can be easily exposed as an offline tool in the future.
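
For readers trying to picture the hook-method idea in the quoted description, here is a minimal sketch in plain Java. It does not use Flink's {{TypeSerializer}} API, and {{PointFormat}}, {{formatChanged}} and the version constants are hypothetical names. It assumes the version that wrote the state is known from outside the record (for instance from metadata stored with the savepoint) and shows a {{deserialize}} method bridging an old format that stored only x to a new one that stores x and y.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch, not Flink's TypeSerializer API.
final class PointFormat {
    static final int V1 = 1;        // old format: x only
    static final int V2 = 2;        // new format: x followed by y
    static final int CURRENT = V2;

    // Hook: reports whether state written with the given version needs bridging.
    static boolean formatChanged(int writtenVersion) {
        return writtenVersion != CURRENT;
    }

    // Always writes the current (V2) format.
    static byte[] serialize(int x, int y) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(x);
            out.writeInt(y);
        }
        return buffer.toByteArray();
    }

    // Reads either format; a V1 record has no y, so y defaults to 0 (an assumption of this sketch).
    static int[] deserialize(byte[] bytes, int writtenVersion) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int x = in.readInt();
            switch (writtenVersion) {
                case V1:
                    return new int[] {x, 0};
                case V2:
                    return new int[] {x, in.readInt()};
                default:
                    throw new IOException("Unknown format version " + writtenVersion);
            }
        }
    }
}
```

The point of the sketch is only that bridging logic of this kind has to live somewhere: either in the user's serializer, as the description proposes, or in a separate conversion step.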

This message was sent by Atlassian JIRA