kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3478) Finer Stream Flow Control
Date Thu, 01 Sep 2016 18:57:20 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456310#comment-15456310

Matthias J. Sax commented on KAFKA-3478:

[~bbejeck] I just created https://issues.apache.org/jira/browse/KAFKA-4113 and https://issues.apache.org/jira/browse/KAFKA-4114.
Feel free to grab one :)

> Finer Stream Flow Control
> -------------------------
>                 Key: KAFKA-3478
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3478
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Guozhang Wang
>              Labels: user-experience
>             Fix For:
> Today we have a event-time based flow control mechanism in order to synchronize multiple
input streams in a best effort manner:
> http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like to have finer control of the
input streams, for example, with two input streams, one of them always reading from offset
0 upon (re)-starting, and the other reading for log end offset.
> Today we only have one consumer config "offset.auto.reset" to control that behavior,
which means all streams are read either from "earliest" or "latest".
> We should consider how to improve this settings to allow users have finer control over
these frameworks.
> =====
> A finer flow control could also be used to allow for populating a {{KTable}} (with an
"initial" state) before starting the actual processing (this feature was ask for in the mailing
list multiple times already). Even if it is quite hard to define, *when* the initial populating
phase should end, this might still be useful. There would be the following possibilities:
>  1) an initial fixed time period for populating
>    (it might be hard for a user to estimate the correct value)
>  2) an "idle" period, ie, if no update to a KTable for a certain time is
> done, we consider it as populated
>  3) a timestamp cut off point, ie, all records with an older timestamp
> belong to the initial populating phase
>  4) a throughput threshold, ie, if the populating frequency falls below
> the threshold, the KTable is considered "finished"
>  5) maybe something else ??
> The API might look something like this
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without reading any
other topics until see one record with timestamp 1000.
> {noformat}

This message was sent by Atlassian JIRA

View raw message