kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-7293) Merge followed by groupByKey/join might violate co-partioning
Date Tue, 14 Aug 2018 20:32:00 GMT
Matthias J. Sax created KAFKA-7293:
--------------------------------------

             Summary: Merge followed by groupByKey/join might violate co-partioning
                 Key: KAFKA-7293
                 URL: https://issues.apache.org/jira/browse/KAFKA-7293
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Matthias J. Sax


The merge() operations can be applied to input KStreams that have a different number of tasks
(ie, input topic partitions). For this case, the input topics are not co-partitioned and thus
the result KStream is not partitioned even if each input KStream is partitioned by its own.

Because, no "repartitionRequired" flag is set on the input KStreams, the flag is also not
set on the output KStream. Hence, if a groupByKey() or join() operation is applied the output
KStream, we don't insert a repartition topic. However, repartitioning would be required because the
KStream is not partitioned.

We cannot detect this during compile time, because the number or partitions is unknown, and
thus, we cannot decide if repartitioning is required or not. However, we can add a runtime
check similar to joins() that checks if data is correctly (co-)partitioned and if not, we
can raise a runtime exception.

Note, for merge() in contrast to join(), we should only check for co-partitioning, if the
merge() is followed by a groupByKey() or join() operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message