kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Roesler <j...@confluent.io>
Subject Re: [DISCUSS] KIP-466: Add support for List<T> serialization and deserialization
Date Tue, 23 Jul 2019 20:30:25 GMT
Hmm, that's a tricky situation.

I think Daniyar was on the right track... Producer only cares about
serializer configs, and Consumer only cares about deserializer configs.

I didn't see the problem with your proposal:

ProducerConfig:
> list.key/value.serializer.type
> list.key/value.serializer.inner
> ConsumerConfig:
> list.key/value.deserializer.type
> list.key/value.deserializer.inner
> StreamsConfig:
> default.list.key/value.serde.type
> default.list.key/value.serde.inner


It seems like the key/value serde configs are a better analogy than the
windowed serde.
ProducerConfig: key.serializer
ConsumerConfig: key.deserializer
StreamsConfig: default.key.serde

Just to make sure I understand the problem you're highlighting:
I guess the difference is that the serializer and deserializer that are
nested inside the serde also need to be configured? So, by default I'd have
to specify all six configs when I'm using Streams?

I guess in the Serde, it could make use of a package-protected constructor
for the serializer and deserializer that fixes the list type and inner type
to the serde-configured ones. Then, when you're configuring Streams, you
only need to specify the StreamsConfigs.

Does that work?
-John


On Tue, Jul 23, 2019 at 11:39 AM Development <dev@yeralin.net> wrote:

> Bump
>
> > On Jul 22, 2019, at 11:26 AM, Development <dev@yeralin.net> wrote:
> >
> > Hey Matthias,
> >
> > It looks a little confusing, but I don’t have enough expertise to judge
> on the configuration placement.
> >
> > If you think, it is fine I’ll go ahead with this approach.
> >
> > Best,
> > Daniyar Yeralin
> >
> >> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <matthias@confluent.io>
> wrote:
> >>
> >> Good point.
> >>
> >> I guess the simplest solution is, to actually add
> >>
> >>>> default.list.key/value.serde.type
> >>>> default.list.key/value.serde.inner
> >>
> >> to both `CommonClientConfigs` and `StreamsConfig`.
> >>
> >> It's not super clean, but I think it's the best we can do. Thoughts?
> >>
> >>
> >> -Matthias
> >>
> >> On 7/19/19 1:23 PM, Development wrote:
> >>> Hi Matthias,
> >>>
> >>> I agree, ConsumerConfig did not seem like a right place for these
> configurations.
> >>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
> >>>
> >>> However, I have a question. What should I do in "configure(Map<String,
> ?> configs, boolean isKey)” methods? Which configurations should I try to
> locate? I was comparing my (de)serializer implementations with
> SessionWindows(De)serializer classes, and they use StreamsConfig class to
> get  either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS :
> StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS
> >>>
> >>> In my case, as I mentioned earlier, StreamsConfig class is not
> accessible from org.apache.kafka.common.serialization package. So, I can’t
> utilize it. Any suggestions here?
> >>>
> >>> Best,
> >>> Daniyar Yeralin
> >>>
> >>>
> >>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <matthias@confluent.io>
> wrote:
> >>>>
> >>>> Thanks!
> >>>>
> >>>> One minor question about the configs. The KIP adds three classes, a
> >>>> Serializer, a Deserializer, and a Serde.
> >>>>
> >>>> Hence, would it make sense to add the corresponding configs to
> >>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
> >>>> different names each time?
> >>>>
> >>>>
> >>>> Somethin like this:
> >>>>
> >>>> ProducerConfig:
> >>>>
> >>>> list.key/value.serializer.type
> >>>> list.key/value.serializer.inner
> >>>>
> >>>> ConsumerConfig:
> >>>>
> >>>> list.key/value.deserializer.type
> >>>> list.key/value.deserializer.inner
> >>>>
> >>>> StreamsConfig:
> >>>>
> >>>> default.list.key/value.serde.type
> >>>> default.list.key/value.serde.inner
> >>>>
> >>>>
> >>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs does not sound
> right
> >>>> to me. Also note, that it seems better to avoid the `default.` prefix
> >>>> for consumers and producers because there is only one Serializer or
> >>>> Deserializer anyway. Only for Streams, there are multiple and
> >>>> StreamsConfig specifies the default one of an operator does not
> >>>> overwrite it.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>>
> >>>> Also, the KIP should explicitly mention to what classed certain
> configs
> >>>> are added. Atm, the KIP only list parameter names, but does not state
> >>>> where those are added.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 7/16/19 1:11 PM, Development wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Yes, totally forgot about the statement. KIP-466 is updated.
> >>>>>
> >>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman
> for your valuable input!
> >>>>>
> >>>>> I hope I did not cause too much trouble :)
> >>>>>
> >>>>> I’ll start the vote now.
> >>>>>
> >>>>> Best,
> >>>>> Daniyar Yeralin
> >>>>>
> >>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <john@confluent.io>
> wrote:
> >>>>>>
> >>>>>> Hi Daniyar,
> >>>>>>
> >>>>>> Thanks for that update. I took a look, and I think this is in good
> shape.
> >>>>>>
> >>>>>> One note, the statement "New method public static <T> Serde<List<T>>
> >>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
> >>>>>> (infers list implementation and inner serde from config file)" is
> >>>>>> still present in the KIP, although I do it is was removed from the
> PR.
> >>>>>>
> >>>>>> Once you remove that statement from the KIP, then I think this KIP
> is
> >>>>>> ready to go up for a vote! Then, we can really review the PR in
> >>>>>> earnest and get this thing merged.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -john
> >>>>>>
> >>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <dev@yeralin.net>
> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Pushed new changes under my PR:
> https://github.com/apache/kafka/pull/6592 <
> https://github.com/apache/kafka/pull/6592>
> >>>>>>>
> >>>>>>> Feel free to put any comments in there.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <dev@yeralin.net> wrote:
> >>>>>>>>
> >>>>>>>> Hi John,
> >>>>>>>>
> >>>>>>>> I knew I was missing something. Yes, that makes sense now, I
> removed all `listSerde()` methods, and left empty constructors instead.
> >>>>>>>>
> >>>>>>>> As per `CommonClientConfigs` I looked at the class, it doesn’t
> have any properties related to serdes, and that bothers me a little.
> >>>>>>>>
> >>>>>>>> All properties like `default.key.serde`
> `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want
> to create a confusion.
> >>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and
> its (de)serializers are not located in
> org.apache.kafka.common.serialization. I guess it kind of makes sense since
> windowed serdes are only available for kafka streams, not vice versa.
> >>>>>>>>
> >>>>>>>> If everyone is okay to put list properties in
> `CommonClientConfigs` class, I’ll go ahead and do that then.
> >>>>>>>>
> >>>>>>>> Thank you for your input!
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Daniyar Yeralin
> >>>>>>>>
> >>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <john@confluent.io>
> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> Regarding the placement, you might as well move the constants to
> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and
> the configs and the code are in the same module.
> >>>>>>>>>
> >>>>>>>>> Regarding the constructor... What Matthias said is correct: The
> serde, serializer, and deserializer all need to have zero-arg constructors
> so they can be instantiated reflectively by Kafka. However, the factory
> method you proposed "New method public static <T> Serde<List<T>>
> ListSerde()" is not a constructor, and is not required. It would be used
> purely from the Java interface, but has the drawbacks I listed above. This
> method, not the constructor, is what I proposed to remove from the KIP.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> -John
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net
> <mailto:dev@yeralin.net>> wrote:
> >>>>>>>>> One problem though.
> >>>>>>>>>
> >>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m
> trying to mimic the implementation of my ListSerde accordingly.
> >>>>>>>>>
> >>>>>>>>> I created couple constants under StreamsConfig:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And trying to do similar construct:
> >>>>>>>>> final String propertyName = isKey ?
> StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS :
> StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
> >>>>>>>>> But then found out that StreamsConfig is not accessible from
> org.apache.kafka.common.serialization package while window serde
> (de)serializers are located under org.apache.kafka.streams.kstream package.
> >>>>>>>>>
> >>>>>>>>> What should I do? Should I move my classes under
> org.apache.kafka.streams.kstream package instead?
> >>>>>>>>>
> >>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net
> <mailto:dev@yeralin.net>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Matthias,
> >>>>>>>>>>
> >>>>>>>>>> Thank you for your input.
> >>>>>>>>>>
> >>>>>>>>>> I updated the KIP, made it a little more readable.
> >>>>>>>>>>
> >>>>>>>>>> I think the configuration parameters strategy is finalized then.
> >>>>>>>>>>
> >>>>>>>>>> Do you have any other questions/concerns regarding this KIP?
> >>>>>>>>>>
> >>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit
> them under my PR.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>
> >>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <
> matthias@confluent.io <mailto:matthias@confluent.io>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Daniyar,
> >>>>>>>>>>>
> >>>>>>>>>>> thanks for the update to the KIP. It's in really good shape
> and well
> >>>>>>>>>>> written.
> >>>>>>>>>>>
> >>>>>>>>>>> About the default constructor question:
> >>>>>>>>>>>
> >>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default
> constructor to
> >>>>>>>>>>> create them easily via reflections when specifies in a config.
> I
> >>>>>>>>>>> understand that it is not super user friendly, but all
> existing code
> >>>>>>>>>>> works this way. Hence, it seems best to stick with the
> established pattern.
> >>>>>>>>>>>
> >>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
> >>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to
> improve user
> >>>>>>>>>>> experience that address the exact issue John raised. (cf
> >>>>>>>>>>> https://github.com/apache/kafka/pull/7067 <
> https://github.com/apache/kafka/pull/7067>)
> >>>>>>>>>>>
> >>>>>>>>>>> Note, that if a user would instantiate the Serde manually, the
> user
> >>>>>>>>>>> would also need to call `configure()` to setup the inner
> serdes. Kafka
> >>>>>>>>>>> Streams would not setup those automatically and one might most
> likely
> >>>>>>>>>>> end-up with an NPE.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Coming back the KIP, and the parameter names. `WindowedSerdes`
> are
> >>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For
> `WindowedSerdes`,
> >>>>>>>>>>> we use the following parameter names:
> >>>>>>>>>>>
> >>>>>>>>>>> - default.windowed.key.serde.inner
> >>>>>>>>>>> - default.windowed.value.serde.inner
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> It might be good to align the naming pattern. I would also
> suggest to
> >>>>>>>>>>> use `type` instead of `impl`?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
> >>>>>>>>>>> default.value.list.serde.impl  ->
> default.list.value.serde.type
> >>>>>>>>>>> default.key.list.serde.element  ->
> default.list.key.serde.inner
> >>>>>>>>>>> default.value.list.serde.element  ->
> default.list.value.serde.inner
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
> >>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is
> that it goes against what Matthias suggested earlier:
> >>>>>>>>>>>> "I think that ... `ListSerde` should have an default
> constructor and it should be possible to pass in the `Class listClass`
> information via a configuration. Otherwise, KafkaStreams cannot use it as
> default serde.”
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you think about that? I hope I’m not confusing
> anything.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io
> <mailto:john@confluent.io>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for
> the update, too.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Just one more super-small question, do we need this variant:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in
> org.apache.kafka.common.serialization.Serdes class (infers list
> implementation and inner serde from config file)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It seems like this situation implies my config file is
> already set up for the list serde, so passing this serde (e.g., in
> Produced) would have the same effect as not specifying it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I guess that it could be the case that you have the
> `default.key/value.serde` set to something else, like StringSerde, but you
> still have the `default.key/value.list.serde.impl/element` set. This seems
> like it would result in more confusion than convenience, so my gut instinct
> is maybe we shouldn't introduce the `ListSerde()` variant until people
> actually request it later on.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully
> source-code-driven, not half/half.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> -John
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net
> <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:dev@yeralin.net>>>
> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I hope everyone had a great long weekend.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you
> correctly, but I think I already listed them:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So for Produced, you would use it in the following fashion,
> for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class,
> Serdes.Integer()))
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization
> Strategy” where I describe our logic of conditional serialization based on
> the type of an inner serde.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank you!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <
> john@confluent.io <mailto:john@confluent.io> <mailto:john@confluent.io
> <mailto:john@confluent.io>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for the update, Daniyar!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In addition to specifying the config interface, can you
> also specify
> >>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance
> of this
> >>>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized,
> etc., what
> >>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the
> Serializer
> >>>>>>>>>>>>>> and Deserailizer. I don't think you need to specify the
> implementation
> >>>>>>>>>>>>>> logic, since we've already discussed it here.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If you also want to specify the serialized format of the
> data records
> >>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as
> letting us
> >>>>>>>>>>>>>> verify the schema for forward/backward compatibility
> concerns, etc.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <
> dev@yeralin.net <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:
> dev@yeralin.net>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Finally made updates to the KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization>
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization>>
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization>
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>
> >>>>>>>>>>>>>> Sorry for the delay :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank You!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <
> matthias@confluent.io <mailto:matthias@confluent.io> <mailto:
> matthias@confluent.io <mailto:matthias@confluent.io>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, something like this. I did not think about good
> configuration
> >>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all
> proposed
> >>>>>>>>>>>>>> configs atm. But all configs should be listed and explained
> in the KIP
> >>>>>>>>>>>>>> anyway, and we can discuss further after you have updated
> the KIP (I can
> >>>>>>>>>>>>>> ask more detailed question if I have any).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to
> have in a list
> >>>>>>>>>>>>>> of primitives.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As for the default constructor and configurability, just
> want to make
> >>>>>>>>>>>>>> sure. Is this what you have on your mind?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <
> matthias@confluent.io <mailto:matthias@confluent.io> <mailto:
> matthias@confluent.io <mailto:matthias@confluent.io>>
> >>>>>>>>>>>>>> <mailto:matthias@confluent.io <mailto:matthias@confluent.io>
> <mailto:matthias@confluent.io <mailto:matthias@confluent.io>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for the update!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and
> `ListSerde`
> >>>>>>>>>>>>>> should have an default constructor and it should be
> possible to pass in
> >>>>>>>>>>>>>> the `Class listClass` information via a configuration.
> Otherwise,
> >>>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not
> primitive IMHO,
> >>>>>>>>>>>>>> as is it for `byte[]` with variable length -- it's for
> arrays, not for
> >>>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping
> `byte[]`).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in
> the KIP
> >>>>>>>>>>>>>> discussion.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you also update the KIP?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the
> final
> >>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
> <https://github.com/apache/kafka/pull/6592> <
> https://github.com/apache/kafka/pull/6592 <
> https://github.com/apache/kafka/pull/6592>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I also need some advice on writing tests for this new
> serde. So far I
> >>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m
> not sure
> >>>>>>>>>>>>>> if it is enough.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank y’all for your help in this KIP :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <
> john@confluent.io <mailto:john@confluent.io> <mailto:john@confluent.io
> <mailto:john@confluent.io>>
> >>>>>>>>>>>>>> <mailto:john@confluent.io <mailto:john@confluent.io>
> <mailto:john@confluent.io <mailto:john@confluent.io>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey Daniyar,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Looks good to me! Thanks for considering it.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <
> dev@yeralin.net <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:
> dev@yeralin.net>>
> >>>>>>>>>>>>>> <mailto:dev@yeralin.net <mailto:dev@yeralin.net> <mailto:
> dev@yeralin.net <mailto:dev@yeralin.net>>> <mailto:dev@yeralin.net
> <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:dev@yeralin.net>>>>
> wrote:
> >>>>>>>>>>>>>> Hey John and Matthias,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant
> information.
> >>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list
> type. I
> >>>>>>>>>>>>>> realized that’s the type is not really needed in
> ListSerializer, but
> >>>>>>>>>>>>>> only in ListDeserializer:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if
> serializer is
> >>>>>>>>>>>>>> not a primitive serializer:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Then, in deserializer, we persist passed list type, so that
> during
> >>>>>>>>>>>>>> deserialization we could create an instance of it with
> predefined
> >>>>>>>>>>>>>> listSize for better performance.
> >>>>>>>>>>>>>> We also try to locate a primitiveSize based on passed
> deserializer.
> >>>>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which
> means
> >>>>>>>>>>>>>> that each entry’s size was encoded individually.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This looks much cleaner and more concise.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <
> matthias@confluent.io <mailto:matthias@confluent.io> <mailto:
> matthias@confluent.io <mailto:matthias@confluent.io>>
> >>>>>>>>>>>>>> <mailto:matthias@confluent.io <mailto:matthias@confluent.io>
> <mailto:matthias@confluent.io <mailto:matthias@confluent.io>>> <mailto:
> matthias@confluent.io <mailto:matthias@confluent.io> <mailto:
> matthias@confluent.io <mailto:matthias@confluent.io>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For encoding the list-type: I see John's point about
> re-encoding the
> >>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea
> that the
> >>>>>>>>>>>>>> Deserializer returns a fixed type...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Maybe it's best allow users to specify the target list type
> on
> >>>>>>>>>>>>>> deserialization via config?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to
> encode the
> >>>>>>>>>>>>>> type size, but users could specify the type on the
> deserializer (via a
> >>>>>>>>>>>>>> config again)?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I
> doubt
> >>>>>>>>>>>>>> we can
> >>>>>>>>>>>>>> support this and a cast will be necessary at some point in
> the user
> >>>>>>>>>>>>>> code.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey Daniyar,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for looking at it!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Something like your screenshot is more along the lines of
> what I was
> >>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how
> would that not
> >>>>>>>>>>>>>> be "vanilla java"?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Unfortunately the deserializer needs more information,
> though. For
> >>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>?
> The serde
> >>>>>>>>>>>>>> could
> >>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still
> need an
> >>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> =
> Serdes.listSerde(
> >>>>>>>>>>>>>> /**list type**/ LinkedList.class,
> >>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
> >>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> And in configuration, it's something like:
> >>>>>>>>>>>>>> default.key.serde: org...ListSerde
> >>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
> >>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <
> dev@yeralin.net <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:
> dev@yeralin.net>>
> >>>>>>>>>>>>>> <mailto:dev@yeralin.net <mailto:dev@yeralin.net> <mailto:
> dev@yeralin.net <mailto:dev@yeralin.net>>> <mailto:dev@yeralin.net
> <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:dev@yeralin.net
> >>>
> >>>>>>>>>>>>>> <mailto:dev@yeralin.net <mailto:dev@yeralin.net> <mailto:
> dev@yeralin.net <mailto:dev@yeralin.net>> <mailto:dev@yeralin.net <mailto:
> dev@yeralin.net> <mailto:dev@yeralin.net <mailto:dev@yeralin.net>>>>>
> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey John,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I gave read about TypeReference. It could work for the list
> serde.
> >>>>>>>>>>>>>> However, it is not directly
> >>>>>>>>>>>>>> supported:
> >>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490 <
> https://github.com/FasterXML/jackson-databind/issues/1490> <
> https://github.com/FasterXML/jackson-databind/issues/1490 <
> https://github.com/FasterXML/jackson-databind/issues/1490>>
> >>>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490
> <https://github.com/FasterXML/jackson-databind/issues/1490> <
> https://github.com/FasterXML/jackson-databind/issues/1490 <
> https://github.com/FasterXML/jackson-databind/issues/1490>>>
> >>>>>>>>>>>>>> The only way is to pass an actual class object into the
> constructor,
> >>>>>>>>>>>>>> something like:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you
> think of my
> >>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As
> described
> >>>>>>>>>>>>>> previously)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net
> <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:dev@yeralin.net>>
> >>>>>>>>>>>>>> <mailto:dev@yeralin.net <mailto:dev@yeralin.net> <mailto:
> dev@yeralin.net <mailto:dev@yeralin.net>>> <mailto:dev@yeralin.net
> <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:dev@yeralin.net
> >>>
> >>>>>>>>>>>>>> <mailto:dev@yeralin.net <mailto:dev@yeralin.net> <mailto:
> dev@yeralin.net <mailto:dev@yeralin.net>> <mailto:dev@yeralin.net <mailto:
> dev@yeralin.net> <mailto:dev@yeralin.net <mailto:dev@yeralin.net>>>>>
> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
> over
> >>>>>>>>>>>>>> engineered :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I also wanted to see a feedback from Mathias as well since
> he gave
> >>>>>>>>>>>>>> me an idea about storing fixed/variable size entries.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <
> john@confluent.io <mailto:john@confluent.io> <mailto:john@confluent.io
> <mailto:john@confluent.io>>
> >>>>>>>>>>>>>> <mailto:john@confluent.io <mailto:john@confluent.io>
> <mailto:john@confluent.io <mailto:john@confluent.io>>> <mailto:
> john@confluent.io <mailto:john@confluent.io> <mailto:john@confluent.io
> <mailto:john@confluent.io>>>
> >>>>>>>>>>>>>> <mailto:john@confluent.io <mailto:john@confluent.io>
> <mailto:john@confluent.io <mailto:john@confluent.io>> <mailto:
> john@confluent.io <mailto:john@confluent.io> <mailto:john@confluent.io
> <mailto:john@confluent.io>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Daniyar,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> That's a very clever solution!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> One observation is that, now, this is what we might call a
> >>>>>>>>>>>>>> polymorphic
> >>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type
> and then
> >>>>>>>>>>>>>> promising to produce the exact same concrete type on read.
> >>>>>>>>>>>>>> There are
> >>>>>>>>>>>>>> some inherent problems with this approach, which in general
> >>>>>>>>>>>>>> require
> >>>>>>>>>>>>>> some kind of  schema registry (not necessarily Schema
> >>>>>>>>>>>>>> Registry, just
> >>>>>>>>>>>>>> any registry for schemas) to solve.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Notice that every serialized record has quite a bit of
> duplicated
> >>>>>>>>>>>>>> information: the concrete type as well as a byte to indicate
> >>>>>>>>>>>>>> whether
> >>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to
> >>>>>>>>>>>>>> indicate the
> >>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because
> they
> >>>>>>>>>>>>>> tell us
> >>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately,
> this
> >>>>>>>>>>>>>> information is completely redundant. In all likelihood, the
> >>>>>>>>>>>>>> information will be exactly the same for every record in the
> >>>>>>>>>>>>>> topic.
> >>>>>>>>>>>>>> This problem is essentially the core motivation for
> serializations
> >>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization
> >>>>>>>>>>>>>> itself, so
> >>>>>>>>>>>>>> that the records won't contain so much redundant
> information.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to
> >>>>>>>>>>>>>> something
> >>>>>>>>>>>>>> like what you had earlier in which you don't support
> perfectly
> >>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead
> just
> >>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer
> full,
> >>>>>>>>>>>>>> perfect, type preservation to serdes that have an external
> >>>>>>>>>>>>>> system in
> >>>>>>>>>>>>>> which to register their type information.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> There does exist an alternative, if we really do want to
> >>>>>>>>>>>>>> preserve the
> >>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
> >>>>>>>>>>>>>> configuration option specifically for the serde to configure
> >>>>>>>>>>>>>> what the
> >>>>>>>>>>>>>> list type will be, and maybe what the element type is, as
> well.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As far as "related work" goes, you might be interested to
> take
> >>>>>>>>>>>>>> a look
> >>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a
> specific,
> >>>>>>>>>>>>>> arbitrarily nested, generically parameterized class
> structure.
> >>>>>>>>>>>>>> Specifically, you might find
> >>>>>>>>>>>>>>
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >>
> >>>>>>>>>>>>>> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >>>
> >>>>>>>>>>>>>> interesting.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <
> dev@yeralin.net <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:
> dev@yeralin.net>>
> >>>>>>>>>>>>>> <mailto:dev@yeralin.net <mailto:dev@yeralin.net> <mailto:
> dev@yeralin.net <mailto:dev@yeralin.net>>> <mailto:dev@yeralin.net
> <mailto:dev@yeralin.net> <mailto:dev@yeralin.net <mailto:dev@yeralin.net
> >>>
> >>>>>>>>>>>>>> <mailto:dev@yeralin.net <mailto:dev@yeralin.net> <mailto:
> dev@yeralin.net <mailto:dev@yeralin.net>> <mailto:dev@yeralin.net <mailto:
> dev@yeralin.net> <mailto:dev@yeralin.net <mailto:dev@yeralin.net>>>>>
> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> bump
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message