Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 16A74200AC0 for ; Wed, 1 Jun 2016 15:26:01 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 15169160A16; Wed, 1 Jun 2016 13:26:01 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 5CFE9160A4C for ; Wed, 1 Jun 2016 15:26:00 +0200 (CEST) Received: (qmail 11153 invoked by uid 500); 1 Jun 2016 13:25:59 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 11129 invoked by uid 99); 1 Jun 2016 13:25:59 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 01 Jun 2016 13:25:59 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 401E12C1F62 for ; Wed, 1 Jun 2016 13:25:59 +0000 (UTC) Date: Wed, 1 Jun 2016 13:25:59 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: dev@kafka.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (KAFKA-3561) Auto create through topic for KStream aggregation and join MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 01 Jun 2016 13:26:01 -0000 [ https://issues.apache.org/jira/browse/KAFKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310298#comment-15310298 ] ASF GitHub Bot commented on KAFKA-3561: --------------------------------------- GitHub user dguy opened a pull request: https://github.com/apache/kafka/pull/1453 KAFKA-3561: Auto create through topic for KStream aggregation and join [WIP] @guozhangwang can you please take a look at this? It is not close to being done but i'd like some feedback to ensure i'm heading down the right path and understand what this JIRA is asking. The main things to look at are the `KStreamImpl.map(...)` method I added and the change made to `TopologyBuilder.copartitionGroups` There is also a test, `KStreamRepartitionMappedKeyTest`, that passes with these changes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dguy/kafka KAFKA-3561 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/1453.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1453 ---- commit 053c809b5aa322327988909d53027d4682df6825 Author: Damian Guy Date: 2016-06-01T13:03:23Z repartition through internal topic on KStream.map() ---- > Auto create through topic for KStream aggregation and join > ---------------------------------------------------------- > > Key: KAFKA-3561 > URL: https://issues.apache.org/jira/browse/KAFKA-3561 > Project: Kafka > Issue Type: Bug > Components: streams > Reporter: Guozhang Wang > Assignee: Damian Guy > Labels: api > Fix For: 0.10.1.0 > > > For KStream.join / aggregateByKey operations that requires the streams to be partitioned on the record key, today users should repartition themselves through the "through" call: > {code} > stream1 = builder.stream("topic1"); > stream2 = builder.stream("topic2"); > stream3 = stream1.map(/* set the right key for join*/).through("topic3"); > stream4 = stream2.map(/* set the right key for join*/).through("topic4"); > stream3.join(stream4, ..) > {code} > This pattern can actually be done by the Streams DSL itself instead of requiring users to specify themselves, i.e. users can just set the right key like (see KAFKA-3430) and then call join, which will be translated by adding the "internal topic for repartition". > Another thing is that today if user do not call "through" after setting a new key, the aggregation result would not be correct as the aggregation is based on key B while the source partitions is partitioned by key A and hence each task will only get a partial aggregation for all keys. But this is not validated in the DSL today. We should do both the auto-translation and validation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)