beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-1283) DoFn.Context.output spec for startBundle/finishBundle is a mess
Date Thu, 19 Jan 2017 21:02:26 GMT
Kenneth Knowles created BEAM-1283:
-------------------------------------

             Summary: DoFn.Context.output spec for startBundle/finishBundle is a mess
                 Key: BEAM-1283
                 URL: https://issues.apache.org/jira/browse/BEAM-1283
             Project: Beam
          Issue Type: Bug
          Components: beam-model, sdk-java-core
            Reporter: Kenneth Knowles
            Assignee: Kenneth Knowles


The spec is here in Javadoc: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L128

"If invoked from {{@StartBundle}} or {{@FinishBundle}}, this will attempt to use the {{WindowFn}}
of the input {{PCollection}} to determine what windows the element should be in, throwing
an exception if the {{WindowFn}} attempts to access any information about the input element.
The output element will have a timestamp of negative infinity."

This is a collection of caveats that keep the method from being technically wrong, but leave
it quite a mess. Problems that reasonable folks have pointed out lately:

 - The {{WindowFn}} cannot actually be applied because {{WindowFn}} is allowed to see the
element type. The spec just avoids this by limiting which {{WindowFn}} can be used.
 - There is no natural output timestamp, so it should always be provided. The spec avoids
this by specifying an arbitrary and fairly useless timestamp.
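
To make the two points above concrete, here is a minimal, self-contained Java sketch. It does
not use Beam's real classes: {{AssignContext}}, {{bundleContext}}, {{assignFromBundle}} and
{{MIN_TIMESTAMP}} are hypothetical stand-ins, and window functions are modeled as plain
{{Function}} values. It only imitates the behavior the Javadoc describes: a window function
that ignores the element works, one that reads the element fails, and one that reads the
timestamp gets the "negative infinity" placeholder.

```java
import java.util.function.Function;

public class BundleOutputSketch {
    // Hypothetical stand-in for what a window function may inspect; not Beam's API.
    interface AssignContext<T> {
        T element();
        long timestamp();
    }

    // Assumed stand-in for the "timestamp of negative infinity" in the spec.
    static final long MIN_TIMESTAMP = Long.MIN_VALUE;

    // Context for output from startBundle/finishBundle: there is no input
    // element, so any attempt to read it throws.
    static <T> AssignContext<T> bundleContext() {
        return new AssignContext<T>() {
            @Override public T element() {
                throw new UnsupportedOperationException(
                    "no input element in startBundle/finishBundle");
            }
            @Override public long timestamp() {
                return MIN_TIMESTAMP;
            }
        };
    }

    // Applies a window function with the bundle context and reports either
    // the assigned window label or the failure message.
    static <T> String assignFromBundle(Function<AssignContext<T>, String> windowFn) {
        try {
            return windowFn.apply(BundleOutputSketch.<T>bundleContext());
        } catch (UnsupportedOperationException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Ignores the element entirely: works from a bundle method.
        Function<AssignContext<String>, String> global = c -> "global";
        // Reads the element: fails, which is the case the spec restricts away.
        Function<AssignContext<String>, String> byElement =
            c -> "window-" + c.element().length();
        // Reads only the timestamp: succeeds, but the one-minute window is
        // derived from the placeholder minimum timestamp, not from real data.
        Function<AssignContext<String>, String> byTimestamp =
            c -> "window@" + Math.floorDiv(c.timestamp(), 60_000L);

        System.out.println(assignFromBundle(global));
        System.out.println(assignFromBundle(byElement));
        System.out.println(assignFromBundle(byTimestamp));
    }
}
```

The third case is the one the second bullet is about: nothing throws, yet the resulting window
carries no useful meaning, which is an argument for requiring an explicit timestamp.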

The use cases for these methods are best addressed by state plus window expiry callback, so
we should revisit this spec and probably just wipe it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
