beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-2447) Reintroduce DoFn.ProcessContinuation
Date Tue, 27 Jun 2017 00:53:00 GMT


Eugene Kirpichov commented on BEAM-2447:

If the runner did not already take a checkpoint, and the DoFn returns resume(), and the runner
treats it as stop(), then the DoFn won't be resumed even though it should.

Perhaps you meant treat everything as resume()? In that case, the DoFn will never terminate,
even if it's done with the restriction, because it'll constantly be resumed and will keep
producing empty checkpoints.

> Reintroduce DoFn.ProcessContinuation
> ------------------------------------
>                 Key: BEAM-2447
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
> ProcessContinuation.resume() is useful for tailing files - when we reach current EOF,
we want to voluntarily suspend the process() call rather than wait for runner to checkpoint
> In BEAM-1903, DoFn.ProcessContinuation was removed because there was ambiguity about
the semantics of resume() especially w.r.t. the following situation described in
: the runner has taken a checkpoint on the tracker, and then the ProcessElement call returns
resume() signaling that the work is still not done - then there's 2 checkpoints to deal with.
> Instead, the proper way to refine this semantics is:
> - After checkpoint() on a RestrictionTracker, the tracker MUST fail all subsequent tryClaim()
calls, and MUST succeed in checkDone().
> - After a failed tryClaim() call, the ProcessElement method MUST return stop()
> - So ProcessElement can return resume() only *instead* of doing tryClaim()
> - Then, if the runner has already taken a checkpoint but tracker has returned resume(),
we do not need to take a new checkpoint - the one already taken already accurately describes
the remainder of the work.

This message was sent by Atlassian JIRA

View raw message