flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tzu-Li (Gordon) Tai" <tzuli...@apache.org>
Subject Re: Clarification of TypesSerializers
Date Fri, 07 Jul 2017 13:23:26 GMT
Hi Dawid,

First of all, one thing to clarify: TypeSerializer#ensureCompatibility is invoked on the new
provided state serializer.
Also, a reconfigured compatible serializer should NOT have different serialization formats
(that would require state migration, i.e. return CompatibilityResult#requiresStateMigration).
A reconfigured serializer will be continued to be used by Flink as if nothing has changed,
can read old AND new data, and still writes in the same format.
In your case, I think you may be interpreting “a reconfigured serializer” incorrectly.

Unfortunately it does not work for HeapStateBackend as during restoring the StateBackend the
method TypeSerializer#ensureCompatibility is not invoked and the state value is eagerly deserialized
with the not reconfigured serializer. 
For the HeapStateBackend, as of now, this is expected. The main reason for this is that currently,
new state serializers are provided lazily (i.e. when the state descriptor is registered).
There is no new serializer available to be confronted / reconfigured with the previous TypeSerializerConfigSnapshot
at restore time.

Therefore, for HeapStateBackend, we are using the old serialized serializer (not invoked with
#ensureCompatibility) to read everything to state objects. This should always work as long
as the old serializer can be deserialized properly.

Cheers,
Gordon

On 7 July 2017 at 5:57:56 PM, Dawid Wysakowicz (wysakowicz.dawid@gmail.com) wrote:

Hi devs,  

Currently I am working on some changes to serializer for NFA class in CEP library. I am trying
to understand how the TypeSerializer#ensureCompatibility feature works.  

What I want to do is in a previous version (e.g. in 1.3.0) some information was serialized
that now shouldn't. In TypeSerializer#ensureCompatibility I am setting a flag based on corresponding
ConfigSnapshot version that tells me if that additional info should be read.  

So let’s get to the point :). Unfortunately it does not work for HeapStateBackend as during
restoring the StateBackend the method TypeSerializer#ensureCompatibility is not invoked and
the state value is eagerly deserialized with the not reconfigured serializer.  
It does work though for RocksDBStateBackend, as while restoring there is no deserialisation
of the value(lazy deserialization). It is first deserialized when accessing (getColumnFamily
etc. I suppose) and then the method ensuringCompatibility is called and the serializer is
properly reconfigured.  

My questions are:  
- is my serialization plan ok, with setting the flag  
- are the different behaviours intended or is it a bug for HeapStateBackend  

If it is a bug, I would be willing to fix it(or at least try), but probably I will need some
guidance.  

Regards  
Dawid  


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message