kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Bellemare <adam.bellem...@gmail.com>
Subject Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.
Date Mon, 30 Jul 2018 13:38:43 GMT
Hi Guozhang et al

I was just reading the 2.0 release notes and noticed a section on Record
Headers.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-244%3A+Add+Record+Header+support+to+Kafka+Streams+Processor+API

I am not yet sure if the contents of a RecordHeader is propagated all the
way through the Sinks and Sources, but if it is, and if it remains attached
to the record (including null records) I may be able to ditch the
propagationWrapper for an implementation using RecordHeader. I am not yet
sure if this is doable, so if anyone understands RecordHeader impl better
than I, I would be happy to hear from you.

In the meantime, let me know of any questions. I believe this PR has a lot
of potential to solve problems for other people, as I have encountered a
number of other companies in the wild all home-brewing their own solutions
to come up with a method of handling relational data in streams.

Adam


On Fri, Jul 27, 2018 at 1:45 AM, Guozhang Wang <wangguoz@gmail.com> wrote:

> Hello Adam,
>
> Thanks for rebooting the discussion of this KIP ! Let me finish my pass on
> the wiki and get back to you soon. Sorry for the delays..
>
> Guozhang
>
> On Tue, Jul 24, 2018 at 6:08 AM, Adam Bellemare <adam.bellemare@gmail.com>
> wrote:
>
>> Let me kick this off with a few starting points that I would like to
>> generate some discussion on.
>>
>> 1) It seems to me that I will need to repartition the data twice - once on
>> the foreign key, and once back to the primary key. Is there anything I am
>> missing here?
>>
>> 2) I believe I will also need to materialize 3 state stores: the
>> prefixScan
>> SS, the highwater mark SS (for out-of-order resolution) and the final
>> state
>> store, due to the workflow I have laid out. I have not thought of a better
>> way yet, but would appreciate any input on this matter. I have gone back
>> through the mailing list for the previous discussions on this KIP, and I
>> did not see anything relating to resolving out-of-order compute. I cannot
>> see a way around the current three-SS structure that I have.
>>
>> 3) Caching is disabled on the prefixScan SS, as I do not know how to
>> resolve the iterator obtained from rocksDB with that of the cache. In
>> addition, I must ensure everything is flushed before scanning. Since the
>> materialized prefixScan SS is under "control" of the function, I do not
>> anticipate this to be a problem. Performance throughput will need to be
>> tested, but as Jan observed in his initial overview of this issue, it is
>> generally a surge of output events which affect performance moreso than
>> the
>> flush or prefixScan itself.
>>
>> Thoughts on any of these are greatly appreciated, since these elements are
>> really the cornerstone of the whole design. I can put up the code I have
>> written against 1.0.2 if we so desire, but first I was hoping to just
>> tackle some of the fundamental design proposals.
>>
>> Thanks,
>> Adam
>>
>>
>>
>> On Mon, Jul 23, 2018 at 10:05 AM, Adam Bellemare <
>> adam.bellemare@gmail.com>
>> wrote:
>>
>> > Here is the new discussion thread for KIP-213. I picked back up on the
>> KIP
>> > as this is something that we too at Flipp are now running in production.
>> > Jan started this last year, and I know that Trivago is also using
>> something
>> > similar in production, at least in terms of APIs and functionality.
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 213+Support+non-key+joining+in+KTable
>> >
>> > I do have an implementation of the code for Kafka 1.0.2 (our local
>> > production version) but I won't post it yet as I would like to focus on
>> the
>> > workflow and design first. That being said, I also need to add some
>> clearer
>> > integration tests (I did a lot of testing using a non-Kafka Streams
>> > framework) and clean up the code a bit more before putting it in a PR
>> > against trunk (I can do so later this week likely).
>> >
>> > Please take a look,
>> >
>> > Thanks
>> >
>> > Adam Bellemare
>> >
>> >
>>
>
>
>
> --
> -- Guozhang
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message