kafka-dev mailing list archives

From "Matthias J. Sax" <matth...@confluent.io>
Subject Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
Date Thu, 16 Feb 2017 00:04:54 GMT

Based on the feedback, I updated the KIP and limited its scope to
some extent:

- Instead of changing the creation of KafkaStreams instances, we keep
the current pattern (we might do a follow-up KIP on this though).

- We also added a new method #describe() that returns a
TopologyDescription that can be used for any "pretty print"
functionality or similar.

- We also removed the method #topologyBuilder() from KStreamBuilder because
we think #transform() should provide all the functionality you need to
mix and match Processor API and DSL. If there are any further concerns
about this, please let us know.
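
To illustrate the last point, here is a minimal, self-contained sketch of
the pattern: the transformer keeps the context it receives in init() and
forwards as many records as it likes from transform(). The Context and
Transformer interfaces and the WordSplitter class below are hypothetical
stand-ins written for this example, not the Kafka Streams types; in Kafka
Streams the corresponding call is ProcessorContext#forward().

```java
import java.util.ArrayList;
import java.util.List;

public class TransformForwardSketch {

    /** Hypothetical stand-in for the context passed to init(); not the Kafka Streams interface. */
    public interface Context {
        void forward(String key, String value);
    }

    /** Hypothetical stand-in for a transformer with the init()/transform() shape. */
    public interface Transformer {
        void init(Context context);
        void transform(String key, String value);
    }

    /** Splits the value on spaces and forwards one record per word. */
    public static class WordSplitter implements Transformer {
        private Context context;

        @Override
        public void init(Context context) {
            // Keep the context so transform() can emit records later.
            this.context = context;
        }

        @Override
        public void transform(String key, String value) {
            for (String word : value.split(" ")) {
                // One input record may produce several output records.
                context.forward(key, word);
            }
        }
    }

    /** Runs the splitter over a single input record and collects everything it forwarded. */
    public static List<String> run(String key, String value) {
        List<String> out = new ArrayList<>();
        WordSplitter splitter = new WordSplitter();
        splitter.init((k, v) -> out.add(k + "=" + v));
        splitter.transform(key, value);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("k", "a b c"));
    }
}
```

The point of the sketch is only that output happens through the context
rather than through the return value, so the number of emitted records is
not tied to the number of inputs.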


On 2/14/17 9:59 AM, Matthias J. Sax wrote:
> You can already output any number of records within .transform() using
> the provided Context object from init()...
> -Matthias
> On 2/14/17 9:16 AM, Guozhang Wang wrote:
>>> and you can't output multiple records or branching logic from a
>>> transform();
>> For outputting multiple records in transform, we are currently working on
>> https://issues.apache.org/jira/browse/KAFKA-4217, I think that should cover
>> this use case.
>> For branching the output in transform, I agree this is not perfect but I
>> think users can follow some patterns like "stream.transform().branch()",
>> would that work for you?
>> Guozhang
>> On Tue, Feb 14, 2017 at 8:29 AM, Mathieu Fenniak <
>> mathieu.fenniak@replicon.com> wrote:
>>> On Tue, Feb 14, 2017 at 1:14 AM, Guozhang Wang <wangguoz@gmail.com> wrote:
>>>> Some thoughts on the mixture usage of DSL / PAPI:
>>>> There were some suggestions on mixing the usage of DSL and PAPI:
>>>> https://issues.apache.org/jira/browse/KAFKA-3455, and after thinking it a
>>>> bit more carefully, I'd rather not recommend users following this pattern,
>>>> since in DSL this can always be achieved in process() / transform(). Hence
>>>> I think it is okay to prevent such patterns in the new APIs. And for the
>>>> same reasons, I think we can remove KStreamBuilder#newName() from the
>>>> public APIs.
>>> I'm not sure that things can always be achieved by process() /
>>> transform()... there are some limitations to these APIs.  You can't output
>>> from a process(), and you can't output multiple records or branching logic
>>> from a transform(); these are things that can be done in the PAPI quite
>>> easily.
>>> I definitely understand a preference for using process()/transform() where
>>> possible, but, they don't seem to replace the PAPI.
>>> I would love to be operating in a world that was entirely DSL.  But the DSL
>>> is limited, and it isn't extensible (... by any stable API).  I don't mind
>>> reaching into internals today and making my own life difficult to extend
>>> it, and I'd continue to find a way to do that if you made the APIs distinct
>>> and split, but I'm just expressing my preference that you not do that. :-)
>>>> And about printing the topology for debuggability: I agree this is a
>>>> potential drawback, and I'd suggest maintaining some functionality to build a
>>>> "dry topology" as Mathieu suggested; the difficulty is that, internally we
>>>> need a different "copy" of the topology for each thread so that they will
>>>> not share any states, so we cannot directly pass in the topology into
>>>> KafkaStreams instead of the topology builder. So how about adding a
>>>> `TopologyBuilder#toString` function which calls `build()` internally then
>>>> prints the built dry topology?
>>> Well, this sounds better than KafkaStreams#toString() in that it doesn't
>>> require a running processor.  But I'd really love to have a simple object
>>> model for the topology, not a string output, so that I can output my own
>>> debug format.  I currently have that in the form of
>>> TopologyBuilder#nodeGroups() & TopologyBuilder#build(Integer).
>>> Mathieu
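
Regarding the request above for a simple object model instead of string
output, the idea can be sketched roughly as follows. The Description class
and its addNode()/successors() methods are hypothetical names invented for
this example; they are not the TopologyDescription API from the KIP. The
sketch only shows that once the topology is exposed as plain data, callers
can write whatever debug format they want.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopologyDescriptionSketch {

    /** Hypothetical description object: node names mapped to the names of their successors. */
    public static final class Description {
        private final Map<String, List<String>> successors = new LinkedHashMap<>();

        /** Registers a node and the nodes it forwards to; returns this for chaining. */
        public Description addNode(String name, String... next) {
            successors.put(name, Arrays.asList(next));
            return this;
        }

        public Map<String, List<String>> successors() {
            return successors;
        }
    }

    /** One possible caller-defined format: a line per node, in insertion order. */
    public static String render(Description description) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : description.successors().entrySet()) {
            sb.append(e.getKey());
            if (!e.getValue().isEmpty()) {
                sb.append(" -> ").append(String.join(", ", e.getValue()));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Description d = new Description()
                .addNode("source", "filter")
                .addNode("filter", "sink")
                .addNode("sink");
        System.out.print(render(d));
    }
}
```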
