beam-user mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: Acknowledging Pubsub messages in Flink Runner
Date Tue, 11 Sep 2018 10:45:01 GMT
Hey Encho,

The Flink Runner acknowledges messages through PubsubIO's 
`CheckpointMark#finalizeCheckpoint()` method.

The Flink Runner wraps the PubsubIO source in the 
UnboundedSourceWrapper. When Flink takes a checkpoint of the running 
Beam streaming job, the wrapper retrieves the CheckpointMarks from 
the PubsubIO source.

When the checkpoint is completed, Flink invokes a callback on the 
wrapper (`notifyCheckpointComplete()`), which then calls 
`finalizeCheckpoint()` on all the generated CheckpointMarks.
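
To make the ordering concrete, here is a rough sketch of that flow in 
plain Java. It is a simplified model under my own assumptions, not the 
actual runner code: `SketchCheckpointMark`, `SketchSourceWrapper` and 
the `acked` list are hypothetical stand-ins, and only the method names 
`snapshotState`, `notifyCheckpointComplete` and `finalizeCheckpoint` 
are chosen to echo the real callbacks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the flow described above.
// None of these classes are the real Beam or Flink types.
class AckOnCheckpointSketch {

    /** Stand-in for a checkpoint mark: holds the ack IDs read before a snapshot. */
    static final class SketchCheckpointMark {
        private final List<String> ackIds;
        private final List<String> ackedLog;

        SketchCheckpointMark(List<String> ackIds, List<String> ackedLog) {
            this.ackIds = new ArrayList<>(ackIds); // copy: the caller clears its buffer
            this.ackedLog = ackedLog;
        }

        /** Plays the role of CheckpointMark#finalizeCheckpoint(): the ack happens here. */
        void finalizeCheckpoint() {
            ackedLog.addAll(ackIds);
        }
    }

    /** Stand-in for the wrapper that sits between Flink and the source. */
    static final class SketchSourceWrapper {
        final List<String> acked = new ArrayList<>();
        private final List<String> readSinceLastSnapshot = new ArrayList<>();
        private final Map<Long, SketchCheckpointMark> pendingMarks = new LinkedHashMap<>();

        /** A message was read and emitted downstream; it is not acked yet. */
        void read(String ackId) {
            readSinceLastSnapshot.add(ackId);
        }

        /** A checkpoint is taken: create a mark for what was read and remember it. */
        void snapshotState(long checkpointId) {
            pendingMarks.put(checkpointId,
                    new SketchCheckpointMark(readSinceLastSnapshot, acked));
            readSinceLastSnapshot.clear();
        }

        /** The checkpoint is reported complete: finalize the stored mark. */
        void notifyCheckpointComplete(long checkpointId) {
            SketchCheckpointMark mark = pendingMarks.remove(checkpointId);
            if (mark != null) {
                mark.finalizeCheckpoint();
            }
        }
    }

    public static void main(String[] args) {
        SketchSourceWrapper wrapper = new SketchSourceWrapper();
        wrapper.read("m1");
        wrapper.read("m2");
        wrapper.snapshotState(1L);
        wrapper.read("m3");                    // read after the snapshot
        System.out.println(wrapper.acked);     // prints []
        wrapper.notifyCheckpointComplete(1L);
        System.out.println(wrapper.acked);     // prints [m1, m2]
    }
}
```

In this model nothing is acknowledged when a message is read, only 
when a checkpoint that covers it completes, so `m3` stays 
unacknowledged until a later checkpoint is taken and completed.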

Hope that helps with debugging your problem. I don't have an explanation 
for why this doesn't work for the last records in your PubSub queue. It 
shouldn't make a difference to how the Flink Runner does checkpointing.

Best,
Max

On 10.09.18 18:17, Encho Mishinev wrote:
> Hello,
> 
> I am using Flink runner with Apache Beam 2.6.0. I was wondering if there 
> is information on when exactly the runner acknowledges a pubsub message 
> when reading from PubsubIO?
> 
> My problem is that whenever there are a few messages left in a 
> subscription my streaming job never really seems to acknowledge them 
> all. For example, if a subscription has 100,000,000 messages in total, 
> the job will go through about 99,990,000 and then keep reading the last 
> few thousand and seemingly never acknowledge them.
> 
> Some clarity on when the acknowledgement happens in the pipeline might 
> help me debug this problem.
> 
> Thanks!
