beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
Date Tue, 27 Jun 2017 22:49:00 GMT


Eugene Kirpichov commented on BEAM-2140:

_we can't advance the watermark as though it was non-splittable in the unbounded case_ - why
is that / why is it a bad thing that the watermark of the PCollection being fed into the SDF
would not advance? E.g. imagine it's a Create.of(pubsub topic name) + ParDo(read pubsub forever)
- is it important to advance the watermark of the Create.of()?

Alternatively, imagine it's: read filepatterns from pubsub + TextIO.readAll().watchForNewFiles().watchFilesForNewEntries(),
which has several SDFs in this. Would there be a problem with advancing the watermark of the
PCollection of filepatterns only after the watch termination conditions of TextIO.readAll()
are hit and this filepattern is no longer watched?

Alternatively - worst case I guess: read Pubsub topic names from Kafka, and read each topic
forever. I'd assume that the user would be interested in advancement of the watermark of the
PCollection of pubsub records rather than the PCollection of Pubsub topic names? I'm not sure
the Pubsub topic names in Kafka would even need to have meaningful timestamps (rather than
infinite past).

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> -------------------------------------------------------
>                 Key: BEAM-2140
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled the tests
to unblock the open PR for BEAM-1763.

This message was sent by Atlassian JIRA

View raw message