kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Development <...@yeralin.net>
Subject Re: [DISCUSS] KIP-466: Add support for List<T> serialization and deserialization
Date Fri, 21 Jun 2019 18:29:57 GMT
I made and pushed necessary commits, so we could review the final version under PR https://github.com/apache/kafka/pull/6592

I also need some advice on writing tests for this new serde. So far I only have two test cases
(roundtrip and empty payload), I’m not sure if it is enough.

Thank y’all for your help in this KIP :)

Best,
Daniyar Yeralin


> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
> 
> Hey Daniyar,
> 
> Looks good to me! Thanks for considering it.
> 
> Thanks,
> -John
> 
> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net <mailto:dev@yeralin.net>>
wrote:
> Hey John and Matthias,
> 
> Yes, now I see it all. I’m storing lots of redundant information.
> Here is my final idea. Yes, now a user should pass a list type. I realized that’s the
type is not really needed in ListSerializer, but only in ListDeserializer:
> 
> 
> In ListSerializer we will start storing sizes only if serializer is not a primitive serializer:
> 
> 
> Then, in deserializer, we persist passed list type, so that during deserialization we
could create an instance of it with predefined listSize for better performance.
> We also try to locate a primitiveSize based on passed deserializer. If it is not there,
then primitiveSize will be null. Which means that each entry’s size was encoded individually.
> 
> 
> This looks much cleaner and more concise.
> 
> What do you think?
> 
> Best,
> Daniyar Yeralin 
> 
>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io <mailto:matthias@confluent.io>>
wrote:
>> 
>> For encoding the list-type: I see John's point about re-encoding the
>> list-type redundantly. However, I also don't like the idea that the
>> Deserializer returns a fixed type...
>> 
>> Maybe it's best allow users to specify the target list type on
>> deserialization via config?
>> 
>> Similar for the primitive types: I don't think we need to encode the
>> type size, but users could specify the type on the deserializer (via a
>> config again)?
>> 
>> 
>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>> support this and a cast will be necessary at some point in the user code.
>> 
>> 
>> 
>> -Matthias
>> 
>> 
>> 
>> On 6/20/19 1:21 PM, John Roesler wrote:
>>> Hey Daniyar,
>>> 
>>> Thanks for looking at it!
>>> 
>>> Something like your screenshot is more along the lines of what I was
>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>> be "vanilla java"?
>>> 
>>> Unfortunately the deserializer needs more information, though. For
>>> example, what if the inner type is a Map<String,String>? The serde could
>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>> 
>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>   /**list type**/ LinkedList.class,
>>>   /**inner serde**/ new MyRecordSerde()
>>> )
>>> 
>>> And in configuration, it's something like:
>>> default.key.serde: org...ListSerde
>>> default.key.list.serde.type: java.util.LinkedList
>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>> 
>>> 
>>> What do you think?
>>> Thanks,
>>> -John
>>> 
>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net <mailto:dev@yeralin.net>
>>> <mailto:dev@yeralin.net <mailto:dev@yeralin.net>>> wrote:
>>> 
>>>    Hey John,
>>> 
>>>    I gave read about TypeReference. It could work for the list serde.
>>>    However, it is not directly
>>>    supported: https://github.com/FasterXML/jackson-databind/issues/1490 <https://github.com/FasterXML/jackson-databind/issues/1490>
>>>    The only way is to pass an actual class object into the constructor,
>>>    something like:
>>> 
>>>    It could be an option, but not a pretty one. What do you think of my
>>>    approach to use vanilla java and canonical class name? (As described
>>>    previously)
>>> 
>>>    Best,
>>>    Daniyar Yeralin
>>> 
>>>>    On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net <mailto:dev@yeralin.net>
>>>>    <mailto:dev@yeralin.net <mailto:dev@yeralin.net>>> wrote:
>>>> 
>>>>    Hi John,
>>>> 
>>>>    Thank you for your input! Yes, my idea looks a little bit over
>>>>    engineered :)
>>>> 
>>>>    I also wanted to see a feedback from Mathias as well since he gave
>>>>    me an idea about storing fixed/variable size entries.
>>>> 
>>>>    Best,
>>>>    Daniyar Yeralin
>>>> 
>>>>>    On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io <mailto:john@confluent.io>
>>>>>    <mailto:john@confluent.io <mailto:john@confluent.io>>>
wrote:
>>>>> 
>>>>>    Hi Daniyar,
>>>>> 
>>>>>    That's a very clever solution!
>>>>> 
>>>>>    One observation is that, now, this is what we might call a
>>>>>    polymorphic
>>>>>    serde. That is, you're detecting the actual concrete type and then
>>>>>    promising to produce the exact same concrete type on read. There are
>>>>>    some inherent problems with this approach, which in general require
>>>>>    some kind of  schema registry (not necessarily Schema Registry, just
>>>>>    any registry for schemas) to solve.
>>>>> 
>>>>>    Notice that every serialized record has quite a bit of duplicated
>>>>>    information: the concrete type as well as a byte to indicate whether
>>>>>    the value type is a fixed size, and, if so, an integer to
>>>>>    indicate the
>>>>>    actual size. These constitute a schema, of sorts, because they
>>>>>    tell us
>>>>>    later how exactly to deserialize the data. Unfortunately, this
>>>>>    information is completely redundant. In all likelihood, the
>>>>>    information will be exactly the same for every record in the topic.
>>>>>    This problem is essentially the core motivation for serializations
>>>>>    like Avro: to move the schema outside of the serialization itself,
so
>>>>>    that the records won't contain so much redundant information.
>>>>> 
>>>>>    In this light, I'm wondering if it makes sense to go back to
>>>>>    something
>>>>>    like what you had earlier in which you don't support perfectly
>>>>>    preserving the concrete type for _this_ serde, but instead just
>>>>>    support deserializing to _some_ List. Then, you could defer full,
>>>>>    perfect, type preservation to serdes that have an external system
in
>>>>>    which to register their type information.
>>>>> 
>>>>>    There does exist an alternative, if we really do want to preserve
the
>>>>>    concrete type (which does seem kind of nice). You can add a
>>>>>    configuration option specifically for the serde to configure what
the
>>>>>    list type will be, and maybe what the element type is, as well.
>>>>> 
>>>>>    As far as "related work" goes, you might be interested to take a look
>>>>>    at how Jackson can be configured to deserialize into a specific,
>>>>>    arbitrarily nested, generically parameterized class structure.
>>>>>    Specifically, you might find
>>>>>    https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
<https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
>>>>>    interesting.
>>>>> 
>>>>>    Thanks,
>>>>>    -John
>>>>> 
>>>>>    On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net <mailto:dev@yeralin.net>
>>>>>    <mailto:dev@yeralin.net <mailto:dev@yeralin.net>>>
wrote:
>>>>>> 
>>>>>>    bump
> 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message