kafka-jira mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6049) Kafka Streams: Add Cogroup in the DSL
Date Wed, 11 Oct 2017 00:58:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199642#comment-16199642

Guozhang Wang commented on KAFKA-6049:

WIP PR ready at https://github.com/apache/kafka/pull/2975#issuecomment-331275009.

Needs someone to pick it up, address the left comments, rebase on trunk and push a new PR
to continue.

> Kafka Streams: Add Cogroup in the DSL
> -------------------------------------
>                 Key: KAFKA-6049
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6049
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Kyle Winkelman
>              Labels: api, needs-kip, user-experience
> When multiple streams aggregate together to form a single larger object (eg. A shopping
website may have a cart stream, a wish list stream, and a purchases stream. Together they
make up a Customer.), it is very difficult to accommodate this in the Kafka-Streams DSL. It
generally requires you to group and aggregate all of the streams to KTables then make multiple
outerjoin calls to end up with a KTable with your desired object. This will create a state
store for each stream and a long chain of ValueJoiners that each new record must go through
to get to the final object.
> Creating a cogroup method where you use a single state store will:
>  Reduce the number of gets from state stores. With the multiple joins when a new value
comes into any of the streams a chain reaction happens where ValueGetters keep calling ValueGetters
until we have accessed all state stores.
> Slight performance increase. As described above all ValueGetters are called also causing
all ValueJoiners to be called forcing a recalculation of the current joined value of all other
streams, impacting performance.

This message was sent by Atlassian JIRA

View raw message