beam-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Beam JIRA Bot (Jira)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-11366) Dataflow and KafkaIO or KinesisIO has erratic watermark progress due to ReaderCache timeouts
Date Sat, 23 Jan 2021 17:14:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-11366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270711#comment-17270711
] 

Beam JIRA Bot commented on BEAM-11366:
--------------------------------------

This issue was marked "stale-assigned" and has not received a public comment in 7 days. It
is now automatically unassigned. If you are still working on it, you can assign it to yourself
again. Please also give an update about the status of the work.

> Dataflow and KafkaIO or KinesisIO has erratic watermark progress due to ReaderCache timeouts
> --------------------------------------------------------------------------------------------
>
>                 Key: BEAM-11366
>                 URL: https://issues.apache.org/jira/browse/BEAM-11366
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Priority: P2
>
> The ReaderCache has a default timeout of 1 minute. If the source is not polled within
that time, the UnboundedReader is destroyed and will be reinitilized from the checkpoint when
next polled. This is too short for runs where the source is not polled for a while due to
other keys to process. In particular this seems to affect Streaming engine pipelines with
backlog.
> From an initial look, recovery from the Kafka UnboundedReader checkpoint does not seem
to use the previous watermark for initializing the timestamp policy and perhaps the timestamp
policy will return unknown watermark if it has not yet observed a record.
> So possible fixes would be:
> - make cache timeout configurable, or maybe dynamic, and increase it
> - ensure that KafkaIO and KinesisIO recovery from checkpoints initializes the watermark
properly so that cache eviction does not matter for watermark smoothness
> - ensure that cache eviction happens in the background instead of blocking threads acquiring
readers which might not expect blocking



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message