beam-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vilhelm von Ehrenheim <vonehrenh...@gmail.com>
Subject Re: TextIO.watchForNewFiles and Watermarks
Date Thu, 22 Feb 2018 08:16:22 GMT
Ok thanks! I think that makes sense. I just got into some strange problems
when mixing this with KafkaIO. I'll checkout the Watch transform.

// Vilhelm

On Wed, Feb 21, 2018 at 6:12 PM, Eugene Kirpichov <kirpichov@google.com>
wrote:

> Hi! Currently it's not configurable - the watermark is min(timestamp of
> pending files) where timestamp is simply time when the file was seen.
> However, it's implemented as a very thin wrapper on top of the Watch
> transform, see implementation of FileIO.matchAll(). You can use the Watch
> transform directly and write a PollFn that uses FileSystems.match() to list
> the files and computes file timestamps and watermark in the way appropriate
> to your use case.
>
> On Wed, Feb 21, 2018 at 7:29 AM Vilhelm von Ehrenheim <
> vonehrenheim@gmail.com> wrote:
>
>> Hi!
>> I have some problems with watermarks using the watchForNewFiles feature
>> in TextIO to read . The watermark is not progressing even though no new
>> files are added to the location. Does anyone know what the logic is behind
>> this watermark, and if it is possible to supply your own heuristic? I can't
>> find it anywhere in the source.
>>
>> Regards,
>> Vilhelm von Ehrenheim
>>
>>
>>

Mime
View raw message