flink-issues mailing list archives

From "Aljoscha Krettek (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8716) AvroSerializer does not use schema of snapshot
Date Tue, 20 Feb 2018 13:31:03 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370051#comment-16370051 ]

Aljoscha Krettek commented on FLINK-8716:
-----------------------------------------

The idea is that when a {{TypeSerializer}} signals that it requires migration, the backend
would read all existing data with the old serialiser and re-encode it with the new serialiser.
This way, the data in the backend would always be encoded consistently.
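The read-with-old, write-with-new migration described above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Flink's API: `SimpleSerializer`, `migrate`, and the two integer encodings are hypothetical stand-ins for `TypeSerializer` and a state backend, chosen only so the example runs without any dependencies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StateMigrationSketch {

    /** Hypothetical, simplified stand-in for a serializer; NOT Flink's TypeSerializer API. */
    interface SimpleSerializer<T> {
        byte[] serialize(T value);
        T deserialize(byte[] bytes);
    }

    /** Hypothetical "old" format: the integer written as decimal UTF-8 text. */
    static final class TextIntSerializer implements SimpleSerializer<Integer> {
        public byte[] serialize(Integer value) {
            return value.toString().getBytes(StandardCharsets.UTF_8);
        }
        public Integer deserialize(byte[] bytes) {
            return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
        }
    }

    /** Hypothetical "new" format: the integer written as 4 big-endian bytes. */
    static final class BinaryIntSerializer implements SimpleSerializer<Integer> {
        public byte[] serialize(Integer value) {
            return ByteBuffer.allocate(4).putInt(value).array();
        }
        public Integer deserialize(byte[] bytes) {
            return ByteBuffer.wrap(bytes).getInt();
        }
    }

    /** Reads every stored entry with the old serializer and re-encodes it with the new one. */
    static <T> List<byte[]> migrate(List<byte[]> stored,
                                    SimpleSerializer<T> oldSerializer,
                                    SimpleSerializer<T> newSerializer) {
        List<byte[]> migrated = new ArrayList<>(stored.size());
        for (byte[] entry : stored) {
            T value = oldSerializer.deserialize(entry); // decode with the format it was written in
            migrated.add(newSerializer.serialize(value)); // re-encode so all data shares one format
        }
        return migrated;
    }

    public static void main(String[] args) {
        TextIntSerializer oldSer = new TextIntSerializer();
        BinaryIntSerializer newSer = new BinaryIntSerializer();
        List<byte[]> stored = new ArrayList<>();
        for (int v : new int[] {7, 42, -1}) {
            stored.add(oldSer.serialize(v));
        }
        for (byte[] entry : migrate(stored, oldSer, newSer)) {
            System.out.println(newSer.deserialize(entry) + " (" + entry.length + " bytes)");
        }
    }
}
```

After migration every entry is in the new 4-byte format, which is the "consistently encoded" property the comment refers to; the cost is a full pass over the stored data.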

The problem is that this isn't implemented yet, so I think we cannot change how the serialiser
currently works.

> AvroSerializer does not use schema of snapshot
> ----------------------------------------------
>
>                 Key: FLINK-8716
>                 URL: https://issues.apache.org/jira/browse/FLINK-8716
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Arvid Heise
>            Priority: Major
>
> The new AvroSerializer stores the schema in the snapshot and uses it to validate compatibility.
> However, it does not use the snapshot's schema when reading the data. As a result, this version
> fails for any change to the data layout (so, more or less, it currently supports only renaming).
>  [https://github.com/apache/flink/blob/f3a2197a23524048200ae2b4712d6ed833208124/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroSerializer.java#L265]
>  needs to use the schema from
>  [https://github.com/apache/flink/blob/f3a2197a23524048200ae2b4712d6ed833208124/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroSerializer.java#L188]
>  as the first parameter. Accordingly, a readSchema field needs to be set
>  in #ensureCompatibility and relayed in #duplicate. Note that the readSchema is passed
>  as the writer schema parameter to the DatumReader, because it is the schema that was used
>  to write the data.
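Why the snapshot schema must be the writer schema can be shown with a toy model of Avro-style schema resolution. In Avro itself, the described fix amounts to constructing the reader with two schemas, e.g. `new GenericDatumReader<>(writerSchema, readerSchema)`, with the snapshot schema as the first (writer) argument. The code below does not use Avro; `ToySchema`, `write`, and `read` are hypothetical and only loosely mimic Avro's binary encoding (values only, in schema order) and its name-based field resolution.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaResolutionSketch {

    /** Hypothetical toy schema: ordered int field names plus per-field default values. */
    static final class ToySchema {
        final List<String> fieldNames;
        final Map<String, Integer> defaults;

        ToySchema(List<String> fieldNames, Map<String, Integer> defaults) {
            this.fieldNames = fieldNames;
            this.defaults = defaults;
        }
    }

    /** Writes only the values, in schema order, without field names (as Avro binary does). */
    static byte[] write(ToySchema schema, Map<String, Integer> record) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String name : schema.fieldNames) {
            out.writeInt(record.get(name));
        }
        out.flush();
        return bytes.toByteArray();
    }

    /**
     * Decodes with the writer schema (the only way to know the byte layout), then projects
     * onto the reader schema: shared fields are kept, writer-only fields are dropped, and
     * reader-only fields take their default.
     */
    static Map<String, Integer> read(byte[] data, ToySchema writerSchema, ToySchema readerSchema)
            throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Map<String, Integer> written = new LinkedHashMap<>();
        for (String name : writerSchema.fieldNames) {
            written.put(name, in.readInt());
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String name : readerSchema.fieldNames) {
            result.put(name, written.containsKey(name) ? written.get(name) : readerSchema.defaults.get(name));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        ToySchema oldSchema = new ToySchema(Arrays.asList("id", "count"), new LinkedHashMap<>());
        Map<String, Integer> newDefaults = new LinkedHashMap<>();
        newDefaults.put("version", 1);
        ToySchema newSchema = new ToySchema(Arrays.asList("version", "id"), newDefaults);

        Map<String, Integer> record = new LinkedHashMap<>();
        record.put("id", 5);
        record.put("count", 9);
        byte[] stored = write(oldSchema, record);

        // Correct: writer schema from the snapshot, reader schema from the current serializer.
        System.out.println(read(stored, oldSchema, newSchema)); // {version=1, id=5}
        // Buggy: using the current schema as the writer schema misinterprets the bytes.
        System.out.println(read(stored, newSchema, newSchema)); // {version=5, id=9}
    }
}
```

Because the encoded bytes carry no field names, only the schema that actually wrote them can decode them; the current schema is then applied on top to add defaults and drop removed fields.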



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
