beam-commits mailing list archives

From "Amit Sela (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (BEAM-649) Cache any PCollection implementation if accessed more than once
Date Wed, 04 Jan 2017 19:14:58 GMT

     [ https://issues.apache.org/jira/browse/BEAM-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Sela updated BEAM-649:
---------------------------
    Description: 
Currently, the runner caches any {{PCollection}} implementation ({{RDD}} or {{DStream}})
when it is accessed for the second time.
This can be further optimized to cache after the first evaluation, if accessed again, and
would also solve issues in BEAM-1206.
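
The "cache on second access" behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Spark runner's code: {{LazyDataset}} stands in for an {{RDD}}, and {{EvaluationContext}}, {{get}} and {{compute_count}} are hypothetical names invented for this example.

```python
class LazyDataset:
    """Toy stand-in for an RDD: recomputes on every evaluation unless cached."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function producing the data
        self._cached = None       # materialized data, once stored
        self.is_cached = False    # whether caching has been requested
        self.compute_count = 0    # how many times the data was recomputed

    def cache(self):
        # Like a cache request on a lazy dataset: nothing is stored until
        # the next evaluation actually runs.
        self.is_cached = True
        return self

    def collect(self):
        if self.is_cached and self._cached is not None:
            return self._cached
        self.compute_count += 1
        result = list(self._compute())
        if self.is_cached:
            self._cached = result
        return result


class EvaluationContext:
    """Hypothetical bookkeeping: request caching on a dataset's second access."""

    def __init__(self):
        self._access_counts = {}

    def get(self, name, dataset):
        count = self._access_counts.get(name, 0) + 1
        self._access_counts[name] = count
        if count == 2:
            dataset.cache()
        return dataset


ctx = EvaluationContext()
numbers = LazyDataset(lambda: range(3))
ctx.get("numbers", numbers).collect()  # first access: computed, not cached
ctx.get("numbers", numbers).collect()  # second access: recomputed, then stored
ctx.get("numbers", numbers).collect()  # third access: served from the cache
assert numbers.compute_count == 2
```

In this model the first evaluation is thrown away, so the data is computed twice before the cache takes effect. That extra computation is the cost the proposed optimization aims to avoid.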

  was:
Spark executes a pipeline ONLY when it is triggered by an action (batch) or an output operation
(streaming); see http://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#output-operations-on-dstreams.

Currently, such actions in Beam are mostly implemented via ParDo, and translated by the runner
as a Map transformation (via mapPartitions).

The runner overcomes this by "forcing" actions on untranslated leaves.
While this is OK, it would be better in some cases, e.g., Sinks, to apply the same ParDo translation
but with foreach/foreachRDD instead of foreachPartition/mapPartitions.
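
The lazy-execution problem in this earlier description can be sketched with a toy model. It is an illustration only: {{Lazy}} is a simplified stand-in for an {{RDD}}, and {{force_leaves}} is a hypothetical helper, not the runner's actual mechanism.

```python
class Lazy:
    """Toy stand-in for an RDD: map() only records work; foreach() runs it."""

    def __init__(self, source, parent=None, fn=None):
        self._source = source   # input data, set on the root node only
        self._parent = parent
        self._fn = fn
        self.children = []

    def map(self, fn):
        # Transformation: builds a new node, executes nothing.
        child = Lazy(None, parent=self, fn=fn)
        self.children.append(child)
        return child

    def _iterate(self):
        if self._parent is None:
            yield from self._source
        else:
            for item in self._parent._iterate():
                yield self._fn(item)

    def foreach(self, fn):
        # Action: the only call that actually pulls data through the chain.
        for item in self._iterate():
            fn(item)


def force_leaves(root):
    """Run a no-op action on every node that has no children."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            node.foreach(lambda _: None)


written = []

def write(item):
    # A "sink" expressed as a map function: it runs only when an action fires.
    written.append(item)
    return item

root = Lazy([1, 2, 3])
root.map(lambda x: x * 10).map(write)
assert written == []            # nothing has executed yet
force_leaves(root)
assert written == [10, 20, 30]  # the forced action triggered the sink
```

Expressing the sink directly as an action ({{foreach}} in this model) would remove the need for the forcing step, which is the direction the description suggests.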


> Cache any PCollection implementation if accessed more than once
> ---------------------------------------------------------------
>
>                 Key: BEAM-649
>                 URL: https://issues.apache.org/jira/browse/BEAM-649
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Jean-Baptiste Onofré
>
> Currently, the runner caches any {{PCollection}} implementation ({{RDD}} or {{DStream}})
> when it is accessed for the second time.
> This can be further optimized to cache after the first evaluation, if accessed again, and
> would also solve issues in BEAM-1206.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
