flink-user mailing list archives

From Padarn Wilson <padarn.wil...@grab.com>
Subject Re: Clarification around versioning and flink's state serialization
Date Sat, 22 Dec 2018 07:53:13 GMT
Thanks for the clarification, that is clear now.

Look forward to seeing your slides, safe travels.

On Sat, Dec 22, 2018 at 8:25 AM Tzu-Li (Gordon) Tai <tzulitai@apache.org>
wrote:

> 1. Correct. Under the hood, evolvability of schema relies on the type's
> serializer implementation to support it.
> In Flink 1.7, this has so far been done only for Flink's built-in Avro
> serializer (i.e. the AvroSerializer class), so you don't need to provide a
> custom serializer for Avro types.
> For any other type, a custom serializer is still required for now; how to
> implement a custom serializer that works for schema evolution is covered in
> the documentation.
>
> 2. Yes, disabling generic types will make the job fail if any data type is
> determined to be serialized by Kryo, whether for on-wire data
> transmission or for state serialization.
>
> I'm currently still traveling because of the recent Flink Forward event;
> will send you a copy of the latest slides I presented about the topic once
> I get back.
>
> Cheers,
> Gordon
>
> On Fri, Dec 21, 2018, 10:42 PM Padarn Wilson <padarn.wilson@grab.com>
> wrote:
>
>> Yes that helps a lot!
>>
>> Just to clarify:
>> - If using Avro types in 1.7, no serializers need to be declared explicitly
>> to get state schema evolution, but all other evolvable types (e.g. Protobuf)
>> still need to be registered and evolved manually?
>> - If I call `disableGenericTypes` on my execution config, anything
>> that falls back to Kryo will cause an error?
>>
>> Would love to see more updated slides if you don't mind.
>>
>> Thanks for taking the time,
>> Padarn
>>
>>
>> On Fri, Dec 21, 2018 at 10:04 PM Tzu-Li (Gordon) Tai <tzulitai@apache.org>
>> wrote:
>>
>>> For the documents I would recommend reading through:
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/schema_evolution.html
>>>
>>>
>>> On Fri, Dec 21, 2018, 9:55 PM Tzu-Li (Gordon) Tai <tzulitai@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Yes, if Flink does not recognize your registered state type, it will by
>>>> default use Kryo for the serialization.
>>>> And generally speaking, Kryo does not have good support for evolvable
>>>> schemas compared to other serialization frameworks such as Avro or Protobuf.
>>>>
>>>> Flink defaults to Kryo for unrecognized types for historical reasons:
>>>> Flink's type serialization stack was originally used on the batch side.
>>>> IMO the short answer is that it would make sense to have a different
>>>> default serializer (perhaps Avro) for snapshotting state in streaming
>>>> programs.
>>>> However, I believe this would be better suited as a separate discussion
>>>> thread.
>>>>
>>>> The good news is that with Flink 1.7, state schema evolution is fully
>>>> supported out of the box for Avro types, such as GenericRecord or code
>>>> generated SpecificRecords.
>>>> If you want to have evolvable schema for your state types, then it is
>>>> recommended to use Avro as state types.
>>>> Support for evolving schema of other data types such as POJOs and Scala
>>>> case classes is also on the radar for future releases.
>>>>
>>>> Does this help answer your question?
>>>>
>>>> By the way, I would consider the slides you are looking at quite
>>>> outdated for the topic, since Flink 1.7 was released with much smoother
>>>> support for state schema evolution.
>>>> An updated version of the slides is not yet publicly available, but if
>>>> you want I can send you one privately.
>>>> Otherwise, the Flink docs for 1.7 would also be equally helpful.
>>>>
>>>> Cheers,
>>>> Gordon
>>>>
>>>>
>>>> On Fri, Dec 21, 2018, 8:11 PM Padarn Wilson <padarn.wilson@grab.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to understand the situation with state serialization in
>>>>> flink. I'm looking at a number of sources, but slide 35 from here
>>>>> crystalizes my confusion:
>>>>>
>>>>>
>>>>> https://www.slideshare.net/FlinkForward/flink-forward-berlin-2017-tzuli-gordon-tai-managing-state-in-apache-flink
>>>>>
>>>>> So, I understand that if 'Flink's own serialization stack' is unable
>>>>> to serialize a type you define, then it will fall back on Kryo generics. In
>>>>> this case, I believe what I'm being told is that state compatibility is
>>>>> difficult to ensure, and schema evolution in your jobs is not possible.
>>>>>
>>>>> However on this slide, they say
>>>>> "
>>>>>   Kryo is generally not recommended ...
>>>>>
>>>>>   Serialization frameworks with schema evolution support is
>>>>>   recommended: Avro, Thrift
>>>>> "
>>>>> So is this implying that Flink's non-default serialization stack does
>>>>> not support schema evolution? In that case, is it best practice to register
>>>>> custom serializers whenever possible?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> *Grab is hiring. Learn more at **https://grab.careers
>>>>> <https://grab.careers/>*
>>>>>
>>>>> By communicating with Grab Inc and/or its subsidiaries, associate
>>>>> companies and jointly controlled entities (“Grab Group”), you are deemed to
>>>>> have consented to processing of your personal data as set out in the
>>>>> Privacy Notice which can be viewed at https://grab.com/privacy/
>>>>>
>>>>> This email contains confidential information and is only for the
>>>>> intended recipient(s). If you are not the intended recipient(s), please do
>>>>> not disseminate, distribute or copy this email and notify Grab Group
>>>>> immediately if you have received this by mistake and delete this email from
>>>>> your system. Email transmission cannot be guaranteed to be secure or
>>>>> error-free as any information therein could be intercepted, corrupted,
>>>>> lost, destroyed, delayed or incomplete, or contain viruses. Grab Group do
>>>>> not accept liability for any errors or omissions in the contents of this
>>>>> email arises as a result of email transmission. All intellectual property
>>>>> rights in this email and attachments therein shall remain vested in Grab
>>>>> Group, unless otherwise provided by law.
>>>>>
>>>>
>
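
To make point 1 in the thread above more concrete (schema evolvability depends on the type's serializer supporting it), here is a minimal sketch of the general idea in plain Java. It does not use Flink's serializer interfaces, which are covered in the linked documentation; the class `VersionedStateSerializer`, the `UserState` type, its `visits` field, and the version-tag byte layout are all hypothetical and chosen only for illustration. The serializer writes a schema version ahead of the data, so a newer reader can still restore bytes written under the older schema and fill in a default for the field that was added later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionedStateSerializer {

    // Hypothetical state type. Schema v1 had only "name"; v2 added "visits".
    public static final class UserState {
        public final String name;
        public final int visits;

        public UserState(String name, int visits) {
            this.name = name;
            this.visits = visits;
        }
    }

    static final int CURRENT_VERSION = 2;

    // Produces bytes in the old v1 layout (version tag + name), standing in
    // for state that was snapshotted before the schema changed.
    public static byte[] writeV1(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(1);
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Writes the current (v2) layout: version tag + name + visits.
    public static byte[] write(UserState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(CURRENT_VERSION);
            out.writeUTF(state.name);
            out.writeInt(state.visits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads either layout. For v1 data the missing field gets a default of 0.
    public static UserState read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            String name = in.readUTF();
            switch (version) {
                case 1:
                    return new UserState(name, 0);
                case 2:
                    return new UserState(name, in.readInt());
                default:
                    throw new IllegalStateException("Unsupported schema version: " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        UserState restored = read(writeV1("alice"));
        System.out.println(restored.name + " " + restored.visits); // prints "alice 0"

        UserState roundTrip = read(write(new UserState("bob", 7)));
        System.out.println(roundTrip.name + " " + roundTrip.visits); // prints "bob 7"
    }
}
```

A serializer like Kryo's default one writes no such version information for you, which is roughly why changing a Kryo-serialized state type tends to break restores, while Avro resolves the writer's schema against the reader's schema and so handles this case without hand-written logic.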
