beam-commits mailing list archives

From "Amit Sela (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-649) Pipeline "actions" should use foreachRDD via ParDo.
Date Tue, 20 Sep 2016 19:06:21 GMT
Amit Sela created BEAM-649:
------------------------------

             Summary: Pipeline "actions" should use foreachRDD via ParDo.
                 Key: BEAM-649
                 URL: https://issues.apache.org/jira/browse/BEAM-649
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
            Reporter: Amit Sela
            Assignee: Amit Sela


Spark executes a pipeline ONLY when it is triggered by an action (batch) or an output operation
(streaming); see http://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#output-operations-on-dstreams.
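As a rough analogy only (plain Python, not Spark): Python 3's built-in `map` is lazy in a similar way. It builds an iterator without calling the mapped function, and the function runs only when something consumes that iterator. The consumer plays the role of a Spark action. The names `record`, `calls`, and `pending` are illustrative.

```python
calls = []


def record(x: int) -> int:
    calls.append(x)  # observable side effect, standing in for "work done"
    return x * 10


# Like a Spark transformation: map() only builds a lazy iterator.
pending = map(record, [1, 2, 3])
before = list(calls)  # record() has not run yet, so this is []

# Like a Spark action: consuming the iterator finally runs record().
results = list(pending)
after = list(calls)  # now [1, 2, 3]
```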

Currently, such actions in Beam are mostly implemented via ParDo, which the runner translates
into a map transformation (via mapPartitions). A transformation on its own never triggers execution.

The runner currently works around this by "forcing" actions on untranslated leaves.
While this works, in some cases (e.g., Sinks) it would be better to apply the same ParDo translation,
but with an action / output operation (foreach/foreachPartition in batch, foreachRDD in streaming)
instead of mapPartitions.
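The difference between the current workaround and the proposal can be modelled with a toy, stdlib-only stand-in for an RDD. This is a sketch under stated assumptions. `ToyRDD`, `map_partitions`, `count`, `foreach_partition`, and the list-based `sink` are hypothetical names, not the Spark or Beam API. The example shows three things. A sink-like function translated as `map_partitions` writes nothing on its own. Forcing an action (`count`) on that leaf makes the writes happen. Translating the same write logic directly into an action (`foreach_partition`) needs no extra forcing step.

```python
from typing import Callable, Iterable, Iterator, List, Optional

PartitionFn = Callable[[Iterator[int]], Iterable[int]]


class ToyRDD:
    """Hypothetical, greatly simplified stand-in for a Spark RDD (not Spark's API).

    Transformations only queue work; actions evaluate it.
    """

    def __init__(self, partitions: List[List[int]], fns: Optional[List[PartitionFn]] = None):
        self.partitions = partitions
        self.fns = fns or []

    def map_partitions(self, fn: PartitionFn) -> "ToyRDD":
        # Transformation: returns a new ToyRDD with fn queued; nothing runs yet.
        return ToyRDD(self.partitions, self.fns + [fn])

    def _evaluate(self) -> Iterator[Iterator[int]]:
        # Lazily chain the queued functions over each partition.
        for part in self.partitions:
            it: Iterable[int] = iter(part)
            for fn in self.fns:
                it = fn(iter(it))
            yield iter(it)

    def count(self) -> int:
        # Action: forces evaluation of every partition and counts the elements.
        return sum(sum(1 for _ in it) for it in self._evaluate())

    def foreach_partition(self, fn: Callable[[Iterator[int]], None]) -> None:
        # Action: runs fn once per evaluated partition, purely for its side effects.
        for it in self._evaluate():
            fn(it)


sink: List[int] = []  # hypothetical external sink


def write_and_pass_through(records: Iterator[int]) -> Iterator[int]:
    # Sink logic shaped as a transformation: it must yield records to stay lazy.
    for r in records:
        sink.append(r)
        yield r


def write(records: Iterator[int]) -> None:
    # The same sink logic shaped as an action's side effect.
    for r in records:
        sink.append(r)


data = ToyRDD([[1, 2], [3]])

# 1. mapPartitions-style translation: nothing is written yet.
leaf = data.map_partitions(write_and_pass_through)
after_transform = list(sink)  # []

# 2. Current workaround: "force" an action on the untranslated leaf.
forced_count = leaf.count()
after_forced = list(sink)  # [1, 2, 3]

# 3. Proposed: translate the sink directly into an action.
sink.clear()
data.foreach_partition(write)
after_foreach = list(sink)  # [1, 2, 3]
```

In the forced variant the records are also counted, although nobody needs the count. The action-based variant runs only the write.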



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
