beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "holdenk (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-3290) Construct iterators directly if possible to allow spilling to disk
Date Tue, 05 Dec 2017 12:56:00 GMT
holdenk created BEAM-3290:
-----------------------------

             Summary: Construct iterators directly if possible to allow spilling to disk
                 Key: BEAM-3290
                 URL: https://issues.apache.org/jira/browse/BEAM-3290
             Project: Beam
          Issue Type: Improvement
          Components: runner-spark
            Reporter: holdenk
            Assignee: Amit Sela


When you construct a collection first and convert it to an iterator you force Spark to evaluate
the entire input partition before it can get the first element off the output. This breaks
some of the spilling to disk Spark can do otherwise. Instead chain operations on Iterators.

This is only possible in the Java API for Spark 2 and above (and that's my fault from back
in my work in the Spark project).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message