beam-commits mailing list archives

From "Scott Wegner (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-2450) Transform names and named applications should not be null or empty
Date Wed, 14 Jun 2017 20:48:00 GMT
Scott Wegner created BEAM-2450:
----------------------------------

             Summary: Transform names and named applications should not be null or empty
                 Key: BEAM-2450
                 URL: https://issues.apache.org/jira/browse/BEAM-2450
             Project: Beam
          Issue Type: Bug
          Components: beam-model, sdk-java-core, sdk-py
            Reporter: Scott Wegner
            Assignee: Frances Perry
            Priority: Minor


The Beam SDK allows setting the name of a transform [1] and also naming the transform application
[2]. If no name is specified for the application, the name of the transform is used. If no name
is specified for the transform, the class name is used.
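
The fallback order described above could be sketched roughly as follows. This is an
illustration only, not the SDK's actual code: the class NameFallbackSketch and the method
resolveName are hypothetical names, and using the simple class name as the default is an
assumption.

```java
// Hypothetical sketch of the name fallback order; not Beam's real implementation.
final class NameFallbackSketch {

    // Returns the application name if given, else the transform name,
    // else the simple class name of the transform (assumed default).
    static String resolveName(String applicationName, String transformName, Class<?> transformClass) {
        if (applicationName != null) {
            return applicationName;
        }
        if (transformName != null) {
            return transformName;
        }
        return transformClass.getSimpleName();
    }

    public static void main(String[] args) {
        // String.class stands in for a transform class here.
        System.out.println(resolveName("ReadInput", "Read", String.class));
        System.out.println(resolveName(null, "Read", String.class));
        System.out.println(resolveName(null, null, String.class));
    }
}
```

Note that under this sketch an empty string counts as a specified name and is passed through
unchanged, which is the under-specified case this issue is about.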

The application name serves as metadata for the applied PTransforms in the constructed graph.
They are effectively extra display data (historically, PTransform names predate display data).
The names are used by runners for UI and monitoring applications, such as the displayed pipeline
graph in the Dataflow Monitoring UI [3].

Currently there is no explicit validation on the specified application name. The current behavior
seems to be:
* null application names cause a NullPointerException at construction time.
* Specifying the empty string compiles and succeeds in the DirectRunner, but causes strange
behavior in Dataflow when rendering the graph in the UI. I have not tested the behavior of
other runners.

We should add explicit validation in the model on the specified transform name and application
name. I propose that we disallow null and empty names.
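
A minimal sketch of what such a check might look like, assuming it is acceptable to fail fast
with an IllegalArgumentException. The class TransformNameValidation and the method validateName
are hypothetical and do not exist in the SDK; where the check would be called from is left open.

```java
// Hypothetical validation helper; names and exception type are assumptions.
final class TransformNameValidation {

    // Rejects null and empty names; returns the name unchanged otherwise.
    // 'kind' is only used in the error message, e.g. "transform name".
    static String validateName(String name, String kind) {
        if (name == null) {
            throw new IllegalArgumentException(kind + " must not be null");
        }
        if (name.isEmpty()) {
            throw new IllegalArgumentException(kind + " must not be empty");
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(validateName("ReadInput", "application name"));
        try {
            validateName("", "application name");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, both the null and the empty case would fail at construction time with
the same kind of error, instead of a NullPointerException in one case and runner-specific
behavior in the other.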

This is technically a breaking change as the SDK currently allows the empty string, but only
because it is under-specified. The upgrade path for any pipelines broken by this change is
simple: specify a non-empty name or fall back to the default class name.

[1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L236
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java#L295
[3] https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf#viewing-a-pipeline



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
