flink-dev mailing list archives

From "Tzu-Li (Gordon) Tai" <tzuli...@apache.org>
Subject Re: Timestamp/watermark support in Kinesis consumer
Date Thu, 08 Feb 2018 10:16:51 GMT
Regarding the two hooks you would like to be available:

Provide hook to override discovery (not to hit Kinesis from every subtask)
Yes, I think we can easily provide a way to do that, for example by setting
SHARD_DISCOVERY_INTERVAL_MILLIS to -1 to disable shard discovery.
The user would then have to take a savepoint and restore from it in order to pick up new
shards after a Kinesis stream reshard (which is in practice the best way to bypass the
Kinesis API rate limitations).
+1 to provide that.
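
A minimal sketch of how such a sentinel could be interpreted, assuming -1 is adopted as the
"disabled" value. The class and method names are hypothetical, the key string is assumed to
match ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS, and the default interval is
an assumption as well; none of this is existing connector code.

```java
import java.util.Properties;

final class ShardDiscoveryConfigSketch {

    // Assumed to match ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS.
    static final String SHARD_DISCOVERY_INTERVAL_MILLIS = "flink.shard.discovery.intervalmillis";

    // Assumed default interval, used when the property is not set.
    static final long DEFAULT_INTERVAL_MILLIS = 10_000L;

    // Proposed sentinel from this thread: -1 turns discovery off.
    static final long DISCOVERY_DISABLED = -1L;

    /** Returns false only when the interval is set to the -1 sentinel. */
    static boolean isDiscoveryEnabled(Properties config) {
        long interval = Long.parseLong(config.getProperty(
                SHARD_DISCOVERY_INTERVAL_MILLIS, Long.toString(DEFAULT_INTERVAL_MILLIS)));
        if (interval == DISCOVERY_DISABLED) {
            return false;
        }
        if (interval < 0) {
            // Any other negative value is treated as a configuration error.
            throw new IllegalArgumentException("Invalid discovery interval: " + interval);
        }
        return true;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(SHARD_DISCOVERY_INTERVAL_MILLIS, "-1");
        System.out.println(isDiscoveryEnabled(config)); // prints "false"
    }
}
```

With discovery disabled this way, a subtask would list shards once at startup and never
again, which is why a savepoint and restore is needed after a reshard.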

Provide hook to support custom watermark generation (somewhere around KinesisDataFetcher.emitRecordAndUpdateState)
Per-partition watermark generation on the Kinesis side is slightly more complex than Kafka,
due to how Kinesis’s dynamic resharding works.
I think we additionally need to allow a new shard to be consumed only after its parent shard
is fully read; otherwise “per-shard time characteristics” can be broken by out-of-order
consumption across the boundary between a closed parent shard and its child.
These JIRAs [1][2] have a bit more detail on the topic.
Otherwise, in general I’m +1 to providing this in the Kinesis consumer as well.
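
To illustrate the parent-before-child constraint, here is a rough sketch of the eligibility
check. The Shard class and the consumableShards method are hypothetical and not part of
KinesisDataFetcher; the sketch also assumes that a parent which is no longer listed (for
example because it has expired) should not block its child.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ParentFirstShardSketch {

    /** Hypothetical shard descriptor; parent ids are null when absent. */
    static final class Shard {
        final String id;
        final String parentId;         // set after a split or a merge
        final String adjacentParentId; // second parent, set only after a merge

        Shard(String id, String parentId, String adjacentParentId) {
            this.id = id;
            this.parentId = parentId;
            this.adjacentParentId = adjacentParentId;
        }
    }

    /** Returns ids of unread shards whose parents have all been fully read. */
    static List<String> consumableShards(List<Shard> shards, Set<String> fullyRead) {
        Set<String> known = new HashSet<>();
        for (Shard s : shards) {
            known.add(s.id);
        }
        List<String> result = new ArrayList<>();
        for (Shard s : shards) {
            if (fullyRead.contains(s.id)) {
                continue; // nothing left to consume
            }
            if (parentDone(s.parentId, known, fullyRead)
                    && parentDone(s.adjacentParentId, known, fullyRead)) {
                result.add(s.id);
            }
        }
        return result;
    }

    // A parent that is absent or no longer listed cannot block its child.
    private static boolean parentDone(String parentId, Set<String> known, Set<String> fullyRead) {
        return parentId == null || !known.contains(parentId) || fullyRead.contains(parentId);
    }

    public static void main(String[] args) {
        List<Shard> shards = Arrays.asList(
                new Shard("shard-0", null, null),
                new Shard("shard-1", "shard-0", null),
                new Shard("shard-2", "shard-0", null));
        // Before the parent is drained, only the parent may be consumed.
        System.out.println(consumableShards(shards, new HashSet<String>()));
        // Once the parent is fully read, both children become consumable.
        System.out.println(consumableShards(shards, new HashSet<>(Arrays.asList("shard-0"))));
    }
}
```

Holding children back like this keeps record timestamps within one shard lineage in order,
which is what a per-shard watermark generator would rely on.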

Cheers,
Gordon

[1] https://issues.apache.org/jira/browse/FLINK-5697
[2] https://issues.apache.org/jira/browse/FLINK-6349

On 8 February 2018 at 1:48:23 AM, Thomas Weise (thw@apache.org) wrote:

Provide hook to override discovery (not to hit Kinesis from every subtask)
Provide hook to support custom watermark generation (somewhere around KinesisDataFetcher.emitRecordAndUpdateState)