beam-commits mailing list archives

From "Amit Sela (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (BEAM-1250) Remove leaf when materializing PCollection to avoid re-evaluation.
Date Sat, 07 Jan 2017 09:09:58 GMT

     [ https://issues.apache.org/jira/browse/BEAM-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Sela closed BEAM-1250.
---------------------------
       Resolution: Fixed
    Fix Version/s: 0.5.0

> Remove leaf when materializing PCollection to avoid re-evaluation.
> ------------------------------------------------------------------
>
>                 Key: BEAM-1250
>                 URL: https://issues.apache.org/jira/browse/BEAM-1250
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>             Fix For: 0.5.0
>
>
> When materializing a {{PCollection}} (implemented as an {{RDD}}), for example to create a {{PCollectionView}}, the runner should remove the materialized {{RDD}} from the "leaves" set.
> The runner keeps track of leaves left unhandled in the DAG so that it can force an action on them. {{Write}}, for one, is implemented as a sequence of ParDos, which the runner implements via {{mapPartitions}}, a lazy transformation, so an action has to be forced.
> Materializing an {{RDD}} is done via the action {{collect()}}, so there is no reason to keep it in the "leaves" set.
> Currently it remains in the "leaves" set, so it is forced again, and unless the {{RDD}} is cached its lineage is evaluated twice.
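
The double evaluation described above can be illustrated with a toy model. This is only a sketch in plain Python, not the Spark runner's actual Java code: ToyRDD, ToyEvaluationContext, and all of their methods are hypothetical names invented for illustration, and the "RDD" here just counts how often its lineage is evaluated, assuming no caching.

```python
class ToyRDD:
    """Hypothetical stand-in for a lazy, uncached RDD."""

    def __init__(self, data):
        self._data = list(data)
        self.evaluations = 0  # how many times the lineage was computed

    def collect(self):
        # An action: evaluates the lineage and returns the values.
        self.evaluations += 1
        return list(self._data)

    def force(self):
        # Another action, used only to trigger evaluation of a leaf.
        self.evaluations += 1


class ToyEvaluationContext:
    """Hypothetical context that tracks unhandled leaves of the DAG."""

    def __init__(self, remove_leaf_on_materialize):
        self.leaves = set()
        self.remove_leaf_on_materialize = remove_leaf_on_materialize

    def register(self, rdd):
        # Every new dataset starts out as an unhandled leaf.
        self.leaves.add(rdd)
        return rdd

    def materialize(self, rdd):
        # collect() is already an action, so the leaf has been handled.
        values = rdd.collect()
        if self.remove_leaf_on_materialize:
            self.leaves.discard(rdd)
        return values

    def compute_outputs(self):
        # Force an action on whatever is still in the leaves set.
        for rdd in list(self.leaves):
            rdd.force()
        self.leaves.clear()


def run(remove_leaf_on_materialize):
    ctx = ToyEvaluationContext(remove_leaf_on_materialize)
    rdd = ctx.register(ToyRDD([1, 2, 3]))
    ctx.materialize(rdd)   # e.g. to build a side-input view
    ctx.compute_outputs()  # end of pipeline: force remaining leaves
    return rdd.evaluations


print(run(remove_leaf_on_materialize=False))  # 2: lineage evaluated twice
print(run(remove_leaf_on_materialize=True))   # 1: leaf removed after collect()
```

In this model the only difference between the two runs is whether materialize() drops the dataset from the leaves set, which is the change the issue asks for.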



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
