flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefan Richter <s.rich...@data-artisans.com>
Subject Re: Evolving serializers and impact on flink managed states
Date Wed, 09 Aug 2017 15:55:33 GMT

> 1. The article in question probably makes use of Flink serialization, what
> if I use Avro serde for the serialization and deserialization part. If I
> create a savepoint of my job, stop my flink, load the new POJO and continue
> from the savepoint, would avro's schema evolution feature perform the
> transition smoothly? 
> For example, a new entity is inserted, all the old values would get a
> default value for which there is no value available and when an entity is
> deleted, that value is simply dropped?

Serializers can provide their own schema evolution. In this case, when the schema changes,
the serializer would simply signal compatibility and deal with the schema versioning internally
and transparently for Flink. However, the serializer should be able to deal with all schema
versions at all time, because right now, e.g. for the RocksDB backend, it is impossible to
tell if and when a state is updated and rewritten in the new schema because an explicit conversion
step is currently still lacking (as described in the documentation). How such a serializer
deals with new and dropped entities is up to the implementation, Flink will simply accept
whatever the serializer delivers. So Avro schema evolution should work.

> 2. If yes, how would this play out in the flink ecosystem, and if not, would
> the flink serialization upgrades in the future handle such cases(forward and
> backward compatibility)?

As I see it, you can now use serializers that have their own schema evolution and Flink will
probably offer an additional, explicit way of schema evolution.

> 3. Are managed state also stored and reloaded, when savepoints are created
> and used for resuming a job?

This depends on the definition of „stored/reloaded“ and on the state backend. If stored/reloaded
means a roundtrip through serde, then the answer might be no for the RocksDB backend. This
backend always contains the state serialized as bytes and goes through serde per access/update.
The checkpoint/savepoint is based on the stored bytes. In contrast to that, heap based backends
will go through a serialization on checkpoint/savepoint and deserialization on recover/restore.

> 4. When can one expect to have the state migration feature in Flink? In
> 1.4.0? 

IIRC this is not part of the 1.4 roadmap. Flink 1.5 might be more realistic.

View raw message